FreeLanguageTools / vocabsieve

Simple sentence mining tool for language learning
GNU General Public License v3.0
343 stars · 25 forks

TTS integration for sentence mining from text #158

Open Mycheze opened 1 month ago

Mycheze commented 1 month ago

Is your feature request related to a problem? Please describe.
When mining from pure text, there is no good way to get audio recordings of full sentences. Being able to enter an API key for a (good) TTS service would make text-based cards come to life.

Describe the solution you'd like
My personal favorite is ElevenLabs, since they have lots of great voices. In theory, I would enter my API key and a voice ID; then, if enabled on import or during card creation, every new card would get an mp3 file generated for its sentence, which would be added to Anki along with the rest of the card.
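A minimal sketch of the "added to Anki with the rest of the card" step, assuming cards reach Anki through the AnkiConnect add-on: the generated mp3 bytes are sent with AnkiConnect's `storeMediaFile` action (file contents as base64), and the card's audio field gets Anki's `[sound:...]` tag pointing at that file. The helper names and the `vocabsieve_tts_` filename scheme are hypothetical; hashing the sentence simply gives each sentence a stable, distinct filename.

```python
import base64
import hashlib


def audio_filename(sentence: str) -> str:
    """Stable mp3 filename derived from the sentence (hypothetical naming scheme)."""
    digest = hashlib.sha1(sentence.encode("utf-8")).hexdigest()[:16]
    return f"vocabsieve_tts_{digest}.mp3"


def store_media_request(sentence: str, mp3_bytes: bytes) -> dict:
    """Build an AnkiConnect 'storeMediaFile' request for the generated audio."""
    return {
        "action": "storeMediaFile",
        "version": 6,
        "params": {
            "filename": audio_filename(sentence),
            # AnkiConnect accepts the file contents as base64-encoded text
            "data": base64.b64encode(mp3_bytes).decode("ascii"),
        },
    }


def sound_tag(sentence: str) -> str:
    """Field value that makes Anki play the stored file on the card."""
    return f"[sound:{audio_filename(sentence)}]"
```

The request dict would be POSTed as JSON to AnkiConnect (which listens on `http://127.0.0.1:8765` by default) before the note is added, and `sound_tag(sentence)` would go into whichever field of the note type holds sentence audio.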

Additional context
I threw together a basic but functional Python script to show how it works:

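import requests

XI_API_KEY = "thisismysecretAPIkey"  # placeholder: your ElevenLabs API key
VOICE_ID = "SAeFbPY3UJ8JdNKiLdtE"  # ID of the ElevenLabs voice to use
TEXT_TO_SPEAK = "This sentence can be ANYTHING in 29 different languages. Sentences only, though."
OUTPUT_PATH = "output.mp3"  # where the generated audio is saved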

tts_url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
headers = {
    "Accept": "audio/mpeg",  # the response body is mp3 audio, not JSON
    "xi-api-key": XI_API_KEY,
    "Content-Type": "application/json"
}
data = {
    "text": TEXT_TO_SPEAK,
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.8,
        "style": 0.0,
        "use_speaker_boost": True
    }
}

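# Stream the audio back; fail loudly on an HTTP error instead of saving the error body as an .mp3
response = requests.post(tts_url, json=data, headers=headers, stream=True, timeout=60)
response.raise_for_status()

with open(OUTPUT_PATH, "wb") as f:
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:  # skip empty keep-alive chunks
            f.write(chunk)

print(f"Finished writing audio file to {OUTPUT_PATH}")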
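
For integration into card creation, the request in the script above might be split into a pure function that only builds the URL, headers, and JSON body, so it can be checked without calling the API. This is a sketch: the function name, the empty-text check, and the `audio/mpeg` Accept header are assumptions, while the endpoint, the `xi-api-key` header, the model ID, and the voice settings are copied from the script.

```python
from __future__ import annotations

from typing import Any


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> tuple[str, dict[str, str], dict[str, Any]]:
    """Return (url, headers, json_body) for an ElevenLabs text-to-speech call (hypothetical helper)."""
    if not text.strip():
        # Nothing to synthesize; avoid spending an API call on an empty sentence
        raise ValueError("text must not be empty")
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream"
    headers = {
        "Accept": "audio/mpeg",  # assumption: we want raw mp3 bytes back
        "xi-api-key": api_key,
        "Content-Type": "application/json",
    }
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.8,
            "style": 0.0,
            "use_speaker_boost": True,
        },
    }
    return url, headers, body
```

Sending it is then the same `requests.post(url, json=body, headers=headers, stream=True)` call as in the script; keeping construction separate lets a settings dialog or a test inspect the request without spending API credits.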

1over137 commented 1 month ago

Any advantages over using something like AwesomeTTS/HyperTTS in Anki?

Mycheze commented 1 month ago

It's integrated into card creation. I don't use those because they're too much of a distraction from my workflow. They're good workarounds, but having audio added card by card as I mine would be much smoother.

1over137 commented 3 weeks ago

I think they let you generate audio on demand through the note type, no? https://www.vocab.ai/tutorials/awesometts-on-the-fly-tts

In general, I think this is difficult to do in a usable way, i.e. one where configuring the options doesn't require pasting in some JSON. If there are a few good services with roughly the same API, I can consider it, but I'll need to decide how to present this in the UI.
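One way to avoid raw JSON, sketched under assumptions: each supported service declares the few settings it needs (API key, voice ID) as typed fields, so the settings dialog can render one labelled input per field and report what is missing before mining starts. `TTSProvider`, `SettingField`, the field keys, and the `audio/mpeg` Accept header are hypothetical, not vocabsieve's existing API; only the ElevenLabs endpoint, the `xi-api-key` header, and the model ID come from the script earlier in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SettingField:
    key: str              # key under which the value is stored in settings
    label: str            # text shown next to the input widget
    secret: bool = False  # render as a masked/password input


@dataclass(frozen=True)
class TTSProvider:
    name: str
    fields: tuple[SettingField, ...]
    # Turns (sentence, user settings) into (url, headers, json_body)
    build_request: Callable[[str, dict[str, str]], tuple[str, dict, dict]]

    def missing_settings(self, settings: dict[str, str]) -> list[str]:
        """Labels of fields the user has not filled in yet."""
        return [f.label for f in self.fields if not settings.get(f.key, "").strip()]


def _elevenlabs_request(text: str, s: dict[str, str]) -> tuple[str, dict, dict]:
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{s['voice_id']}/stream"
    headers = {"Accept": "audio/mpeg", "xi-api-key": s["api_key"],
               "Content-Type": "application/json"}
    return url, headers, {"text": text, "model_id": "eleven_multilingual_v2"}


ELEVENLABS = TTSProvider(
    name="ElevenLabs",
    fields=(SettingField("api_key", "API key", secret=True),
            SettingField("voice_id", "Voice ID")),
    build_request=_elevenlabs_request,
)

# Registry the settings dialog could offer in a drop-down
PROVIDERS = {p.name: p for p in (ELEVENLABS,)}
```

A settings dialog could iterate over `provider.fields` to build its inputs (masking the `secret` ones), so supporting another service with a similar API would mean adding one more `TTSProvider` entry rather than designing new UI.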