store audio generation metadata somewhere / being able to know which voice was used

luc-vocab commented 10 months ago

It would be useful to know which voice was used to generate a particular audio file. Requested by David over email.

Please add optional support for tracking which service/voice was used for a given audio generation.

Use case: I have a preset with a large number of voices configured for random use. When studying I often find that a few of the generated files aren't acceptable. Actions then might be to remove a particular voice from the preset (e.g. it is not the dialect pronunciation that I am studying) or to modify the configuration for that voice (e.g. adjust volume). But the only method I currently have for identifying the voice is to go through each of the voices in use, listening to samples from the https://languagetools.anki.study/languages page and trying to pick out the matching voice. This is painstaking and can also be unreliable.

Ideas for where to provide this information:

Embed it in the generated file name

Populate a dedicated field in the note

Populate a standard MP3 tag (e.g. append to comment field)

Create / populate a user defined MP3 tag

Thank you,

Danika-Dakika commented 7 months ago

I hope that you will consider adding this feature. It would be hugely helpful for me in narrowing down the list of possible voices to just the best ones.

I would advocate for it to be included in the Collection Audio filename, because that makes it all the more convenient to weed out audio already produced from a voice/service that turns out not to be adequate. The full metadata wouldn't be necessary, just the service and the descriptor of the specific voice -- Amazon-Lupe, Azure-Tomas, Google-Wavenet-A, etc.

[But if this is added in Tags instead, I hope that you would make them nested -- Amazon::Lupe, etc. -- for ease of use.]
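A minimal sketch of the two naming schemes suggested above, for illustration only. The function names, the `hypertts-<Service>-<Voice>-<hash>.mp3` layout, and the `content_hash` parameter are assumptions made for this example, not HyperTTS's actual filename format or API.

```python
import re


def _slug(text: str) -> str:
    """Replace every run of non-alphanumeric characters with a single hyphen."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def audio_filename(service: str, voice: str, content_hash: str) -> str:
    # Hypothetical layout: hypertts-<Service>-<Voice>-<hash>.mp3
    return f"hypertts-{_slug(service)}-{_slug(voice)}-{content_hash}.mp3"


def nested_tag(service: str, voice: str) -> str:
    # Anki treats "::" as the tag hierarchy separator, e.g. Amazon::Lupe
    return f"{_slug(service)}::{_slug(voice)}"
```

With this layout, `audio_filename("Google", "Wavenet A", "ab12cd")` gives `hypertts-Google-Wavenet-A-ab12cd.mp3`, and `nested_tag("Amazon", "Lupe")` gives `Amazon::Lupe`.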

luc-vocab commented 6 months ago

Also requested by Eric over email.

luc-vocab commented 1 month ago

@Danika-Dakika if I understand correctly, your workflow is: 1. generate audio with different voices. 2. after experimenting for a while, you want to replace or remove sound tags that use voices you don't like.

luc-vocab commented 1 month ago

Reported by taufanpr on Reddit: https://www.reddit.com/r/Anki/comments/1e4kw0y/comment/ldfncu1/?context=3

The reason:

I just want a display that looks simple, because I see those long mp3 filenames every day on my Android device when I edit notes, and that's a bit of a distraction from studying.

I use several TTS:

Sometimes I forget whether a voice comes from Naver, Clova, Azure or Google.

(I'm learning Korean, sometimes the pronunciation from Naver and Google is a little different).

Creating a nice output filename like:

[sound:hypertts-naver_female-front-watermelon.mp3]

will help us remember the words we are learning, because there is an emotional association with who says the word, whether it is a Google voice, a Naver voice, Azure, or another TTS service provider.

Clova itself has several Korean voices, with different dialects, such as:

Hyeri, Ara, Minsang, Dain, Inna Yoo, Sangjin Oh

Naming the file according to who is pronouncing the word, such as:

[sound:hypertts-clova-female-hyeri_front-watermelon.mp3]

will make it easier for us to remember the vocabulary.

Maybe, it's strange, but it works for me 🥰

Amosnomor commented 1 month ago

@Danika-Dakika if I understand correctly, your workflow is: 1. generate audio with different voices. 2. after experimenting for a while, you want to replace or remove sound tags that use voices you don't like.

Another use case that has come up recently is having a simple way to clean up obsolete Azure standard voices. As it is now, they show up one by one, now and then, depending on Anki study intervals. I know that anything done here will come too late to help with this problem, but it would provide a tool for dealing with future versions of it.

Danika-Dakika commented 1 month ago

@luc-vocab

if I understand correctly, your workflow is: 1. generate audio with different voices. 2. after experimenting for a while, you want to replace or remove sound tags that use voices you don't like.

Pretty much. I can use this to improve things going forward. If I frequently find a voice unreliable for pronunciation, I want to take it out of my random list entirely. I face the same issue as David in the OP -- I would have to listen to samples trying to figure out which voice it is, but that's a lot of effort. I'm also hesitant to strike a voice for one error if it's great overall (because there aren't that many good voices for my language!), so I end up just leaving them there. I'm probably struggling with the same voices repeatedly, but I have no way of knowing.

How I'd really like to use this is --

When I periodically clear out my unused media through Check Media, I'll be able to scan the list of filenames and see which voices are popping up most -- meaning those are voices that I often end up removing for one reason or another.
When I decide to eliminate a voice from my set, I'll be able to search my notes for which ones are using it and take care of all of them at once.

If I can effectively eliminate voices, that will open me up to start trying new ones!
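For illustration, here is a rough sketch of how both uses described above could work if the service and voice were embedded in the filename. It assumes a hypothetical `hypertts-<Service>-<Voice>-<hash>.mp3` layout with a lowercase hex hash, and it takes plain Python lists and dicts as input rather than calling any Anki or HyperTTS API.

```python
import re
from collections import Counter

# Assumed (hypothetical) layout: hypertts-<Service>-<Voice>-<hexhash>.mp3
FILENAME_RE = re.compile(
    r"^hypertts-(?P<service>[A-Za-z0-9]+)-(?P<voice>[A-Za-z0-9-]+)-(?P<hash>[0-9a-f]+)\.mp3$"
)
SOUND_TAG_RE = re.compile(r"\[sound:([^\]]+)\]")


def voice_of(filename):
    """Return 'Service-Voice' for a matching filename, else None."""
    m = FILENAME_RE.match(filename)
    return f"{m['service']}-{m['voice']}" if m else None


def count_voices(filenames):
    """Count how often each voice appears, e.g. in the Check Media unused list."""
    return Counter(v for v in map(voice_of, filenames) if v)


def notes_using_voice(notes, voice):
    """notes maps a note id to its field text; return ids whose sound tags use the voice."""
    return [
        note_id
        for note_id, text in notes.items()
        if any(voice_of(name) == voice for name in SOUND_TAG_RE.findall(text))
    ]
```

`count_voices(unused_files).most_common()` would list the voices whose audio gets removed most often, and `notes_using_voice(notes, "Azure-Tomas")` would return the notes to regenerate once that voice is dropped from the preset.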

Vocab-Apps / anki-hyper-tts

store audio generation metadata somewhere / being able to know which voice was used #138