Vocab-Apps / anki-hyper-tts

HyperTTS Addon for Anki
GNU General Public License v3.0

[Feature Discussion] Realtime TTS flaws and improvements #99

Open: popyoung opened this issue 1 year ago

popyoung commented 1 year ago
  1. The "field" setting in Realtime TTS has no effect; the field that is actually spoken is determined by the TTS tag. In the code, I noticed that the source text for Realtime TTS comes from card.question_av_tags (or answer_av_tags), which filters out some HTML tags. This behaviour is quite different from normal TTS and confuses users. I tried changing the code: after each call to hypertts.anki_utils.extract_tts_tags(tts_tags), I set the relevant variable to note[source.field_name].
  2. There is no way to manage Realtime TTS. Adding an editor, like the one for normal TTS, might be a good idea.
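To illustrate the difference described in point 1, here is a minimal sketch using only the Python standard library. `strip_html` is a hypothetical helper that only approximates the HTML filtering Anki applies before text reaches a TTS tag; it is not Anki's or HyperTTS's actual implementation.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(field_html: str) -> str:
    """Hypothetical approximation of the filtered text a TTS tag receives."""
    collector = _TextCollector()
    collector.feed(field_html)
    collector.close()  # flush any buffered text
    return "".join(collector.parts)


# Raw field value as stored in the note (what note[field_name] would hold).
raw_field = "to <i>tune</i> a guitar"

# Text after HTML filtering (roughly what question_av_tags would carry).
filtered = strip_html(raw_field)

print(repr(raw_field))  # 'to <i>tune</i> a guitar'
print(repr(filtered))   # 'to tune a guitar'
```

Reading note[source.field_name] directly, as in the code change described above, keeps the markup, so any further processing of the raw text is left to the user.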
luc-vocab commented 1 year ago

Can you explain in detail what you mean by those points? What do you call "normal TTS"?

popyoung commented 1 year ago

I mean the mode where you add TTS from presets in the Anki editor or browser. [screenshot]

luc-vocab commented 1 year ago

I would need a lot more detail in order to fix whatever issue you're having. Can you tell me exactly what you're doing, what's happening, and what you expect to happen? You could also record a video.

popyoung commented 1 year ago

Sorry for the late reply. I've been using and thinking about this add-on these days, and I will consolidate these ideas with the previous ones.

  1. One card can only have two realtime TTS templates: one for the front and one for the back. But I found a workaround to have more, which I will explain later.

  2. [screenshot] As I mentioned, the "source field" in Realtime TTS has no real effect, or is at least meaningless.

[screenshot] This is because the text to be pronounced depends on the card template code: Anki passes the text of the field named at this spot in the TTS tag to whichever TTS engine is used. Also, I can't recall where in this add-on's source code the "source field" of Realtime TTS is used. In this card template, I put two realtime TTS templates on the front by modifying the card template.

  3. [screenshot] We can clearly see that there are three normal TTS templates and edit them in the Anki browser, but we can't find out how many realtime TTS templates there are. For the same reason, editing a realtime TTS template is tricky: we have to select a card first and then click the "Add Audio (realtime)" button.

  4. Here are my thoughts. The most important advantage of realtime TTS is convenience: users save many steps because the audio is downloaded automatically. However, if something goes wrong, we can't change the audio. For example, I have a realtime TTS template that uses Cambridge Dictionary to pronounce the field "Word" with a UK accent. It worked well until the word "tune" came up. Cambridge Dictionary doesn't have a UK pronunciation for "tune", so every time this card shows up, Anki gives me an error message. [screenshot of the error]
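Point 3 is about not being able to see how many TTS tags a card template contains. As a rough illustration, the sketch below lists every {{tts ...}} reference in a template string. It is a simplified parser written for this example (the names `find_tts_tags` and `TtsTag` are hypothetical, and it only handles the basic `{{tts <lang> key=value ...:Field}}` form), not Anki's own template parser. The `hypertts_preset` option in the sample template comes from the proposal later in this thread and is not an existing option.

```python
import re
from dataclasses import dataclass, field

# Matches {{tts ...}} references; a simplification of Anki's template syntax.
TTS_TAG_RE = re.compile(r"\{\{(tts\s[^}]*)\}\}")


@dataclass
class TtsTag:
    lang: str
    field_name: str
    options: dict[str, str] = field(default_factory=dict)


def find_tts_tags(template: str) -> list[TtsTag]:
    """List every {{tts ...}} reference in a card template (simplified)."""
    tags = []
    for match in TTS_TAG_RE.finditer(template):
        segments = match.group(1).split(":")
        # First segment: "tts <lang> key=value ...". Last segment: field name.
        words = segments[0].split()
        if len(words) < 2:
            continue  # no language code; not a tag we can describe
        options = dict(w.split("=", 1) for w in words[2:] if "=" in w)
        tags.append(TtsTag(words[1], segments[-1].strip(), options))
    return tags


front_template = (
    "<div>{{Word}}</div>\n"
    "{{tts en_GB hypertts_preset=preset_1 voices=HyperTTS:Example}}\n"
    "{{tts en_US voices=HyperTTS:Word}}"
)
tags = find_tts_tags(front_template)
print(len(tags))  # 2
for tag in tags:
    print(tag.lang, tag.field_name, tag.options)
```

A management view like the one suggested in point 2 of the first comment could start from a listing like this, shown per card template.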


So maybe you could create a new type of TTS that replaces both normal TTS and realtime TTS.

[screenshot] First, users click an add button to create a new TTS template in the card template (like AwesomeTTS? I can't remember). [screenshot: the dialog after clicking the "add TTS" button] This is the same as normal TTS. After users finish this dialog, the add-on automatically adds code like {{tts en_GB hypertts_preset=preset_1 voices=HyperTTS:Example}} to the card template.

The first time the card appears (when there is no audio clip in the target field yet), Anki passes the text in the "Example" field to this add-on. The add-on downloads the audio automatically, as realtime TTS does, but with one extra step: it saves the audio to the target field. If there is already an audio clip, it just plays that clip. So if an error occurs, like with the word "tune" I mentioned, I can download another audio clip and put it into the target field to avoid the error. In addition, the add-on could use the raw text from the source field instead of the text filtered by Anki, since Anki filters out HTML tags like <i></i>. Raw text gives users more flexibility in how it is processed.
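The proposed behaviour (play stored audio if present, otherwise generate it from the raw source field and save it to the target field) can be sketched as below. This is a minimal model under stated assumptions: the note is represented as a plain dict of field name to field HTML, and `audio_for_card` and `fetch_audio` are hypothetical names, not part of HyperTTS or Anki. Only the `[sound:filename]` reference format is Anki's.

```python
import re
from typing import Callable

# Matches an Anki sound reference such as [sound:hypertts-abc123.mp3].
SOUND_RE = re.compile(r"\[sound:([^\]]+)\]")


def audio_for_card(
    note: dict[str, str],
    source_field: str,
    target_field: str,
    fetch_audio: Callable[[str], str],
) -> str:
    """Return the filename to play, generating and storing it on first use.

    `note` stands in for an Anki note (field name -> field HTML), and
    `fetch_audio` is a hypothetical callback that generates audio for the
    given text and returns the saved media filename.
    """
    existing = SOUND_RE.search(note.get(target_field, ""))
    if existing:
        # Audio is already stored (possibly replaced by the user): play it.
        return existing.group(1)
    # First review: use the raw source text (HTML kept) and store the result.
    filename = fetch_audio(note[source_field])
    note[target_field] = f"[sound:{filename}]"
    return filename


fetched: list[str] = []


def fake_fetch(text: str) -> str:
    """Stand-in for a real TTS request; records the text it was given."""
    fetched.append(text)
    return "hypertts-tune.mp3"


note = {"Word": "<i>tune</i>", "Sound": ""}
print(audio_for_card(note, "Word", "Sound", fake_fetch))  # hypertts-tune.mp3
print(audio_for_card(note, "Word", "Sound", fake_fetch))  # stored; no new fetch
# The user swaps in a different recording, which is then played instead.
note["Sound"] = "[sound:my-recording.mp3]"
print(audio_for_card(note, "Word", "Sound", fake_fetch))  # my-recording.mp3
```

Storing the result in a regular field is what makes the "tune" case recoverable: the user can overwrite the field with any clip, and the template keeps working.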

Thanks for reading, and apologies for my English, which may read a bit oddly in places.

luc-vocab commented 1 year ago

I think what you are looking for is a more advanced realtime TTS mode, which can aggregate several fields together in a smart way. Currently HyperTTS uses the Anki TTS tag format, which has the benefit of being able to fall back to another voice if you don't have HyperTTS installed, or if you're reviewing on AnkiMobile / iOS (AnkiDroid will soon support it).
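The fallback mentioned above comes from the voices= list in an Anki TTS tag: the first voice available on the current device is used. The sketch below is a simplified model of that selection, written for illustration only. The function name `pick_voice`, the voice names "SystemVoiceUK" / "SystemVoiceUS", and the final "any voice for the same language" step are assumptions about typical behaviour, not Anki's actual code.

```python
from typing import Optional


def pick_voice(
    lang: str,
    preferred: list[str],
    available: dict[str, str],
) -> Optional[str]:
    """Choose a voice for a TTS tag (simplified model).

    `preferred` is the tag's voices= list in order; `available` maps the
    voice names installed on this device to their language codes.
    """
    # Use the first preferred voice that is installed on this device.
    for name in preferred:
        if name in available:
            return name
    # Otherwise fall back to any installed voice for the tag's language.
    for name, voice_lang in available.items():
        if voice_lang == lang:
            return name
    return None


desktop = {"HyperTTS": "en_GB", "SystemVoiceUK": "en_GB"}
mobile = {"SystemVoiceUK": "en_GB", "SystemVoiceUS": "en_US"}
prefs = ["HyperTTS"]
print(pick_voice("en_GB", prefs, desktop))  # HyperTTS
print(pick_voice("en_GB", prefs, mobile))   # SystemVoiceUK
print(pick_voice("fr_FR", prefs, mobile))   # None
```

This is why a tag-based approach degrades gracefully on devices without HyperTTS, whereas a desktop-only mode would not.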

Do you only ever review on desktop? If so, a more advanced realtime TTS mode may work for you. But it has yet to be developed, and I will probably be working on it for a few months.

If you want to save yourself some headaches, I recommend using a real TTS service rather than a dictionary service: you'll get better output, and there won't be errors on missing words.

Feel free to let me know how this advanced realtime TTS mode should work. It seems like you have a lot of ideas, and you found a trick to have several realtime TTS tags (by the way, you can configure those manually by editing the add-on's config, which is in JSON format).