Open roedoejet opened 1 month ago
The default speaker could systematically be set to "default"+lang_code, even with a single-speaker corpus? That would simplify the logic in the case of multiple single-speaker corpora.
True, but even that might not be enough. You could have two single-speaker corpora from the same language. Maybe `default_<dataset_index>`?
Although I find it very unlikely, there could be two or more data sets in different languages but with the same speaker. Dealing with this case, I guess, will require an additional question in the wizard.
Or we might say that such cases need to be manually configured by editing the config files afterwards.
The config files do not include lists of speakers and languages. So the manual fix would have to be applied to the rows of the text files, which could be large.
If all the data sets come from just one speaker, the multispeaker option can simply be switched to false. But if there are, for instance, three sets:
it could be problematic.
So maybe the wizard could ask for a speaker name when there is no speaker column in the data, and offer "default"+lang by default but let the user change it as they wish?
This would parallel what we do for lang, where we demand a language code when there isn't a language column.
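A minimal sketch of that prompt logic, in case it helps the discussion. The function names, the underscore separator, and the optional dataset index suffix are all assumptions for illustration, not existing wizard code:

```python
from typing import Optional


def default_speaker_name(lang_code: str, dataset_index: Optional[int] = None) -> str:
    """Build the suggested speaker ID for a dataset with no speaker column.

    Hypothetical helper: the "default_<lang>" shape and the optional
    "_<dataset_index>" suffix are assumptions based on this thread.
    """
    name = f"default_{lang_code}"
    if dataset_index is not None:
        # Disambiguates two single-speaker corpora in the same language.
        name = f"{name}_{dataset_index}"
    return name


def resolve_speaker(
    user_input: str, lang_code: str, dataset_index: Optional[int] = None
) -> str:
    """Return the user's answer, or the suggested default if left blank."""
    answer = user_input.strip()
    return answer or default_speaker_name(lang_code, dataset_index)
```

With this shape, a user who knows that two datasets share a speaker can type the same name for both, which would also cover the unlikely same-speaker, different-language case without a separate question.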
I think it would work. That's what I was thinking about, adding a question in the wizard about speakers.
We might use the wizard to combine multiple single-speaker, single-language datasets. In doing so, `multilingual` and `multispeaker` never get set to `True` by default, but they should be.

So, we need:
- `multilingual` should be set to true
- `default` in two different languages - the speaker name should actually be set to something different. i.e. the speaker `default` in a single-speaker English dataset is probably a different speaker than `default` in a single-speaker Sinhala dataset.
- `multispeaker` should be set to true
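One way the flags could be derived once the datasets are combined, sketched under assumptions: `DatasetSummary` and `derive_flags` are hypothetical names, and the real wizard state may be organized quite differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DatasetSummary:
    """Hypothetical per-dataset summary collected by the wizard."""

    languages: Set[str]
    speakers: Set[str]


def derive_flags(datasets: List[DatasetSummary]) -> Dict[str, bool]:
    """Set each flag from the union of values across all datasets."""
    all_languages: Set[str] = set()
    all_speakers: Set[str] = set()
    for dataset in datasets:
        all_languages |= dataset.languages
        all_speakers |= dataset.speakers
    return {
        "multilingual": len(all_languages) > 1,
        "multispeaker": len(all_speakers) > 1,
    }
```

Note that this only gives the right `multispeaker` value if the speaker names are already distinct: two datasets that both use the bare name `default` would collapse into one speaker, which is exactly the problem described in the second point.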