collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.
https://collabora.github.io/WhisperSpeech/
MIT License
3.54k stars, 185 forks

Add (train) a new language #127

Open SunnyPage opened 3 months ago

SunnyPage commented 3 months ago

Hello, I want to add (train) a new language, but some stages cannot be completed. When I run:

parallel --eta -j16 python3 -m whisperspeech.vad_merge --eqvad {} ::: *.tar

I get an error:

  File "vad_merge.py", line 29, in split
    imax = len(s[ikey]) - 1
KeyError: 'vad.npy'

It is not clear which files should be in this directory, the vad ones or the emb ones. The files are: dev.tar, dev_vad.tar.gz, dev_emb.tar.gz, test.tar, test_vad.tar.gz, test_emb.tar.gz

When I swapped the files around and renamed them, I got this error:

  File "vad_merge.py", line 70, in merge_by_src_key
    ms["spk_emb.npy"].append(s["spk_emb.npy"])
KeyError: 'spk_emb.npy'

I'm confused, please help...

jpc commented 2 months ago

Hey, this is a bit confusing right now, sorry.

The current version of the scripts assumes that the shards have -audio- somewhere in their names, and this gets replaced by the various tags (-vad-, -spk_emb-, etc.) for the different derived data files. Unfortunately this is not tested, and the code will keep overwriting the same tar.gz files on each step, which leads to confusing error messages.
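
To make the naming convention concrete, here is a minimal sketch of the substitution described above. It is not the actual WhisperSpeech code: `derived_name` is a hypothetical helper, the shard name `dev-audio-000000.tar` is a made-up example, and the sketch leaves the file extension unchanged because the thread does not say how the scripts pick it. The explicit check on -audio- is an addition for illustration; it shows why a shard called dev.tar would map every derived file onto the same name.

```python
def derived_name(shard: str, tag: str) -> str:
    """Return the derived-data shard name for an audio shard.

    Hypothetical helper: swaps the "-audio-" marker for "-<tag>-".
    """
    marker = "-audio-"
    if marker not in shard:
        # Without the marker, every tag would yield the same output name,
        # so each processing step would overwrite the previous one.
        raise ValueError(f"shard name {shard!r} does not contain {marker!r}")
    return shard.replace(marker, f"-{tag}-", 1)


# Made-up shard name that follows the convention.
print(derived_name("dev-audio-000000.tar", "vad"))      # dev-vad-000000.tar
print(derived_name("dev-audio-000000.tar", "spk_emb"))  # dev-spk_emb-000000.tar

# A name like the ones in the question has no marker to replace.
try:
    derived_name("dev.tar", "vad")
except ValueError as err:
    print("rejected:", err)
```

Under this assumption, renaming the audio shards so that they contain -audio- (for example dev-audio-000000.tar instead of dev.tar) would give each derived file its own name.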