collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

https://collabora.github.io/WhisperSpeech/

MIT License

3.91k stars, 210 forks

fine-tuning. #55

Open HobisPL opened 9 months ago

HobisPL commented 9 months ago

Can you write more about training, what the dataset should look like, etc.? I see that you are from Poland. Do you plan to add more Polish voices? The current model struggles with accents and style.

jpc commented 9 months ago

I don't have more Polish data that is permissively licensed. One thing I am looking forward to is adding more languages – hopefully this would improve performance on all languages, like it did for Whisper.

HobisPL commented 9 months ago

Sure, I understand. Will you provide any instructions on how to do fine-tuning and what the TXT/CSV file should look like? Is it the standard format, audio_file_name|text|speaker_name? Alternatively, should I create a Google Colab notebook for this?
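
For illustration only, the pipe-delimited layout asked about here (audio_file_name|text|speaker_name) could be read with a sketch like the one below. The three-column layout is an assumption taken from the question; nothing in this thread confirms that WhisperSpeech expects it. The names `Utterance` and `parse_metadata` and the sample file names are hypothetical.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    """One metadata row (hypothetical layout, not a confirmed WhisperSpeech format)."""
    audio_file_name: str
    text: str
    speaker_name: str


def parse_metadata(lines: Iterable[str]) -> List[Utterance]:
    """Parse 'audio_file_name|text|speaker_name' rows, skipping blank lines."""
    # QUOTE_NONE keeps quotation marks in the transcript as literal characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for lineno, fields in enumerate(reader, start=1):
        if not fields:
            continue  # blank line
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        rows.append(Utterance(*(field.strip() for field in fields)))
    return rows


# Made-up sample rows standing in for a real metadata file.
sample = io.StringIO(
    "clip_0001.wav|Dzień dobry.|speaker_a\n"
    "clip_0002.wav|Hello there.|speaker_b\n"
)
utterances = parse_metadata(sample)
```

For a file on disk, the same function could be called as `parse_metadata(open(path, encoding="utf-8", newline=""))`. Rejecting rows with the wrong number of fields surfaces stray `|` characters in transcripts early.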

stellanhaglund commented 9 months ago

I'm interested in doing this for Swedish; I found some audiobooks I could use. But I would like to know what kind of hardware it requires, the expected training time, and so on. Are there any resources on this?

jpc commented 9 months ago

I am working on writing down the full process for data preprocessing. It's a bit involved because we need to scale it to thousands of hours, but for smaller fine-tuning datasets someone should be able to put all of it into a single notebook with a reasonable runtime.

Naozumi520 commented 8 months ago

If I want to add a new language to WhisperSpeech, will fine-tuning achieve it? Also, is the audio in the dataset limited to a single speaker? It's difficult to find a 1000-hour dataset with only one speaker... Will it work if different speakers all speak the same language?

twmht commented 7 months ago

@jpc

Any update on this? How can I fine-tune if I have a Chinese audio dataset?

Naozumi520 commented 7 months ago

@jpc Please also tell me the dataset requirements I asked about above. Thank you.

yukiarimo commented 2 months ago

Any updates?