AwesomeTTS / awesometts-anki-addon

AwesomeTTS text-to-speech add-on for Anki
GNU General Public License v3.0
484 stars 100 forks

Author suggested OpenAI API key could be added to AwesomeTTS too? Thanks #314

Open ccchan234 opened 9 months ago

ccchan234 commented 9 months ago

I badly need it. Please.

Thank you.

PS: I do not like HyperTTS's UI, in which the sound's source is not added to the resulting sound file, so I cannot tell which sound came from which voice.

Thanks

MarcusXavierr commented 3 months ago

Yeah, I also prefer AwesomeTTS. Sadly, it doesn't support OpenAI, which is the service I have an account with.

@luc-vocab what should be done to add support for the OpenAI service?

Is it just a matter of implementing a file that respects some interface? I could open a PR.
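As a rough illustration of what "a file that respects some interface" could look like, here is a minimal sketch. The `TTSService` base class and its method names are hypothetical and are not AwesomeTTS's actual service interface; the endpoint and JSON fields follow OpenAI's public speech API, with the `tts-1` model and `alloy` voice picked only as example defaults.

```python
import json
import urllib.request
from abc import ABC, abstractmethod


class TTSService(ABC):
    """Hypothetical service interface; the real add-on's base class may differ."""

    @abstractmethod
    def name(self) -> str:
        """Short identifier for the service."""

    @abstractmethod
    def synthesize(self, text: str, path: str) -> None:
        """Generate audio for `text` and write it to `path`."""


class OpenAIService(TTSService):
    ENDPOINT = "https://api.openai.com/v1/audio/speech"

    def __init__(self, api_key: str, model: str = "tts-1", voice: str = "alloy"):
        self.api_key = api_key
        self.model = model
        self.voice = voice

    def name(self) -> str:
        return "openai"

    def build_request(self, text: str) -> urllib.request.Request:
        # Build the HTTP request without sending it, so it can be inspected.
        payload = json.dumps(
            {"model": self.model, "input": text, "voice": self.voice}
        ).encode("utf-8")
        return urllib.request.Request(
            self.ENDPOINT,
            data=payload,
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )

    def synthesize(self, text: str, path: str) -> None:
        # Sends the request and saves the returned audio bytes to disk.
        with urllib.request.urlopen(self.build_request(text)) as response:
            audio = response.read()
        with open(path, "wb") as out:
            out.write(audio)
```

Calling `OpenAIService(api_key).synthesize("Hello", "hello.mp3")` would need a valid API key and network access. Separating `build_request` from `synthesize` keeps the request construction checkable offline.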

luc-vocab commented 3 months ago

I will add OpenAI to AwesomeTTS. But please answer the question first: what feature, if implemented in HyperTTS, will convince you to migrate to HyperTTS? Over the coming years it will be difficult for me to maintain both, and HyperTTS is much easier to evolve because it's more modern.

MarcusXavierr commented 3 months ago

@luc-vocab I'm using HyperTTS right now. My only issue with it is that there's a bug when you generate audio for the flashcard. When I generate the audio for the text 'Hello, my name is Doug,' it only plays 'Doug' on the first attempt. I need to play the sound twice to hear the entire phrase.

luc-vocab commented 3 months ago

@MarcusXavierr are you using Bluetooth headphones?

ccchan234 commented 3 months ago

> I will add OpenAI to AwesomeTTS. But please answer the question first: what feature, if implemented in HyperTTS, will convince you to migrate to HyperTTS? Over the coming years it will be difficult for me to maintain both, and HyperTTS is much easier to evolve because it's more modern.

Hi. For me, the sound file name should state which model generated the sound. E.g., in AwesomeTTS, the sound files are named [sound:azure-886d8fe5-affa78d6-e7f29489-fed9aca8-fcf7c88f.mp3], [sound:googletts-47ced8f2-ed11e2af-67948cb1-da641227-01ce51e8.mp3], and [sound:watson-6b76525a-ed448fb7-09f5f79c-723c4f0d-7f66ccc2.mp3].

That way, when I listen to a sound days later, I still know which model produced it, and I can RATE the models.

With HyperTTS, as far as I recall, the file name didn't state the model (e.g. azure/googletts/watson), so I can't rate the models LATER.

Thank you.

I have always thought that a new version should be better than the old version it replaces. Otherwise it's a downgrade, not an upgrade.
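To make the point about service-prefixed file names concrete, here is a small sketch that groups sound tags by the prefix before the first hyphen. It assumes the tag layout shown in the examples above (`[sound:<service>-<hashes>.mp3]`); the function name `group_by_service` is made up for illustration and is not part of either add-on.

```python
import re
from collections import defaultdict

# Matches tags such as [sound:azure-886d8fe5-...-fcf7c88f.mp3] and
# captures the service prefix before the first hyphen.
SOUND_TAG = re.compile(r"\[sound:([a-z0-9]+)-[^\]]+\.mp3\]")


def group_by_service(field_text: str) -> dict:
    """Return a mapping of service prefix to the sound tags it produced."""
    groups = defaultdict(list)
    for match in SOUND_TAG.finditer(field_text):
        groups[match.group(1)].append(match.group(0))
    return dict(groups)
```

With file names like these, comparing services later only requires reading the field text. A tag without a service prefix would simply not match and would be left out of the grouping.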

MarcusXavierr commented 3 months ago

@luc-vocab Yeah, I'm using Bluetooth headphones. But this issue doesn't happen with AwesomeTTS.

Another issue is that the add-on erases the back text when it generates audio. If you press a shortcut to generate audio for the front of the card and type something into the back while the audio is being generated, then when it finishes it writes the audio tag on the front and erases whatever you typed on the back.

luc-vocab commented 3 months ago

@MarcusXavierr can you confirm you're using batch audio in both cases, and the exact same service in both cases? The problem with Bluetooth is the audio fade-in that many headphones do, which suppresses the beginning of a word. One way to fix that is by introducing a pause at the beginning of the word: https://www.vocab.ai/tutorials/hypertts-tips-and-tricks#add-pauses

For your second issue, if you're simultaneously generating audio and touching the text in the target field, that could indeed lead to undesired behavior. Just curious: why do you do this? And what does your note type look like?
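One possible way to introduce such a pause, for services that accept SSML input, is to prepend a `<break>` element to the text. The sketch below is an assumption about how this could be done and is not necessarily what the linked tutorial describes; `with_leading_pause` is a hypothetical helper, and some services require extra attributes or a `<voice>` element inside `<speak>`.

```python
from xml.sax.saxutils import escape


def with_leading_pause(text: str, pause_ms: int = 300) -> str:
    """Wrap text in SSML with a silent break before the first word."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    # Escape &, < and > so the source text cannot break the SSML markup.
    return f'<speak><break time="{pause_ms}ms"/>{escape(text)}</speak>'
```

The idea is that the headphones' fade-in falls on the silence instead of on the first word. The 300 ms default is only a guess and would need tuning per device.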