ViktorAlm / Nasjonalbank-converter

Converts the Nasjonalbank 16 kHz dataset into LibriSpeech format

Thanks for sharing! #1

Open sberryman opened 5 years ago

sberryman commented 5 years ago

First, no such thing as shameful code! I've been writing software for over 20 years and nobody writes clean code on a regular basis, especially when experimenting!

Since I only need the voices for the encoder, and not to train the TTS/vocoder, I'm wondering if I can simplify this a bit. I don't know Swedish or Norwegian, so I may be asking some very simple questions.

  1. The wav files are nested VERY deep. It looks to me like each unique speaker's files sit inside one of the r####### directories, for example 0467 sv train 1/Stasjon4/060799/adb_0467/speech/scr0467/04/04670404/r4670304. Is my assumption correct that all wav files in that folder come from the same speaker? So for the above path, the speaker is Jan Malmros?
  2. Looks like r4670304 can be found in the following locations?
    ./0467 sv train 1/Stasjon4/280799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/120899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/090899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/070799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/130899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/120799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/110899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/050799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/190799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/210799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/050899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/140799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/130799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/200799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/160799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/220799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/270799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/080799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/260799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/100899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/290799/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/060899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 1/Stasjon4/040899/adb_0467/data/scr0467/04/04670404/r4670304.spl
    ./0467 sv train 3/Stasjon19/121199/adb_0467/data/scr0467/19/04671904/r4670304.spl
    ./0467 sv train 3/Stasjon19/240100/adb_0467/data/scr0467/19/04671904/r4670304.spl

If unique speakers really do live in unique folders at the end of the path, it shouldn't be too hard to group all utterances (wav files) for each speaker just by using the r### folders as identifiers.
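The grouping idea above could be sketched roughly like this. It is only a minimal sketch, assuming every wav file sits directly inside an r####### folder and that the folder name alone identifies the speaker; the function name `group_wavs_by_speaker` and the wav file names in the usage note are made up for illustration and are not from the converter.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: speaker folders are named "r" followed by digits, e.g. r4670304.
SPEAKER_DIR = re.compile(r"^r\d+$")


def group_wavs_by_speaker(wav_paths):
    """Map each r####### folder name to the wav paths found directly inside it."""
    groups = defaultdict(list)
    for p in wav_paths:
        path = PurePosixPath(p)
        if path.suffix.lower() != ".wav":
            continue  # skip .spl files and anything else
        speaker = path.parent.name
        if SPEAKER_DIR.match(speaker):
            groups[speaker].append(p)
    return dict(groups)
```

Feeding it something like `["…/r4670304/a.wav", "…/r4670304/b.wav"]` would give one key, `r4670304`, holding both paths. Since only the folder name is used as the key, same-named r folders under different stations or recording dates end up merged into one speaker, which is what the assumption above implies.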

sberryman commented 5 years ago

If my above assumptions are correct then do these numbers seem accurate when including the test set?

Swedish

902 unique speakers, 898 of them with >= 12 utterances?

Norwegian

Based on no.16khz.0463-1.tar.gz and no.16khz.0463-2.tar.gz only; I'm still waiting on the remaining files to download. I'm only getting about 5-10 MB/sec download rate.

554 unique speakers, all of which so far have at least 12 utterances.
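For reference, counts like these could be produced with something along the following lines. This is a rough sketch, not the exact script used; it assumes the speaker is the name of the folder that directly contains each wav file, and `count_speakers` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_speakers(wav_paths, min_utterances=12):
    """Return (total speakers, speakers with at least min_utterances wavs)."""
    # Assumption: the parent folder name (r#######) is the speaker identifier.
    per_speaker = Counter(
        PurePosixPath(p).parent.name
        for p in wav_paths
        if p.lower().endswith(".wav")
    )
    total = len(per_speaker)
    enough = sum(1 for n in per_speaker.values() if n >= min_utterances)
    return total, enough
```

The threshold of 12 is just the cutoff mentioned above; it is a parameter so a different minimum can be tried.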

ViktorAlm commented 5 years ago
  1. Yes, one speaker per r### directory! There are maybe 5 or 10 speakers that have two r folders, from what I've seen.
  2. Yes, huge mess!

Seems good! I have 857 speakers in Swedish, but that might be because not all the wavs have matching spl files. Also, some paths end up without an author name, and those I delete. The 000001 wav is empty: it contains only background noise.
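One way to check how much of the gap comes from missing spl files is sketched below. It assumes, based on the paths listed earlier in the thread, that the spl file for a speaker folder shares the folder's name (r4670304 and r4670304.spl); `split_by_spl` is a hypothetical helper and not part of the converter.

```python
from pathlib import PurePosixPath


def split_by_spl(speaker_ids, spl_paths):
    """Split speaker folder names into (with matching .spl, without)."""
    # Assumption: r4670304.spl describes the recordings in folder r4670304.
    spl_stems = {
        PurePosixPath(p).stem
        for p in spl_paths
        if p.lower().endswith(".spl")
    }
    matched = sorted(s for s in speaker_ids if s in spl_stems)
    unmatched = sorted(s for s in speaker_ids if s not in spl_stems)
    return matched, unmatched
```

The length of the second list would show how many r folders have wavs but no spl file anywhere in the tree.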

sberryman commented 5 years ago

Take a look at https://github.com/resemble-ai/Resemblyzer/issues/9#issuecomment-531522928, where I ran some tests comparing the default model, which he trained to 1M steps on just over 9,000 mostly English speakers, against my model, trained to 1.2M steps on 25,668 speakers, most of which are still English (17,688). Honestly, it doesn't look like either managed to do a good job clustering Swedish or Norwegian.