Open sberryman opened 5 years ago
If my above assumptions are correct then do these numbers seem accurate when including the test set?
902 unique speakers, 898 speakers with >= 12 utterances?
Based on no.16khz.0463-1.tar.gz and no.16khz.0463-2.tar.gz only, waiting on the remaining files to download. I'm only getting about 5-10 MB/sec download rate.
554 unique speakers; all of them so far have at least 12 utterances.
Seems good! I have 857 speakers in Swedish, but that might be because there are no matching spl files for all the wavs. Also, some paths end up without an author name, and those I delete. The 000001 wav is empty. It only contains background noise.
Take a look at https://github.com/resemble-ai/Resemblyzer/issues/9#issuecomment-531522928 where I did some tests using the default model which he trained to 1M steps on just over 9,000 mostly English speakers vs my model trained to 1.2M steps on 25,668 speakers of which most are still English (17,688). Honestly it doesn't look like either managed to do a good job clustering Swedish or Norwegian.
First, no such thing as shameful code! I've been writing software for over 20 years and nobody writes clean code on a regular basis, especially when experimenting!
Since I only need the voices for the encoder and not to train the TTS/vocoder I'm wondering if I can simplify this a bit. Since I don't know Swedish or Norwegian I may be asking some very simple questions.
r####### directories. Something like 0467 sv train 1/Stasjon4/060799/adb_0467/speech/scr0467/04/04670404/r4670304
Is my assumption correct that all wav files in that folder are the same speaker? So for the above path the speaker is Jan Malmros?
r4670304 can be found in the following locations?
If unique speakers are truly in unique folders on the tail it shouldn't be too hard to group all utterances (wav files) for each speaker just by using the r### folders as identifiers.
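To make the grouping idea concrete, here is a rough sketch of what I have in mind. It rests on the assumption I'm asking about (that every wav under a given r### folder belongs to one speaker, and that the folder name alone identifies that speaker), so treat it as a starting point rather than something verified against the dataset. The function names and the 12-utterance default are my own choices for illustration.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: a speaker folder is named "r" followed by digits only, e.g. "r4670304".
SPEAKER_DIR = re.compile(r"^r\d+$")


def group_by_speaker(wav_paths):
    """Map each r### folder name to the wav paths found beneath it."""
    groups = defaultdict(list)
    for path in wav_paths:
        parts = PurePosixPath(path).parts
        # Only inspect directory components (drop the file name), deepest first.
        speaker = next(
            (p for p in reversed(parts[:-1]) if SPEAKER_DIR.match(p)), None
        )
        if speaker is not None:  # paths without an r### folder are skipped
            groups[speaker].append(path)
    return dict(groups)


def speakers_with_min_utterances(groups, minimum=12):
    """Keep only speakers that have at least `minimum` wav files."""
    return {s: files for s, files in groups.items() if len(files) >= minimum}
```

To run it over an extracted archive, something like `group_by_speaker(p.as_posix() for p in pathlib.Path(root).rglob("*.wav"))` should work, where `root` is wherever the tar files were unpacked. If the same r### name can show up under several parent directories, the sketch merges them into one speaker, which is only correct if my assumption above holds.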