Open HobisPL opened 9 months ago
I don't have more Polish data that is permissively licensed. One thing I am looking forward to is adding more languages – hopefully this would improve performance on all languages, like it did for Whisper.
Sure, I understand. Will you provide any instructions on how to do fine-tuning and what the TXT/CSV file should look like? Is this a standard format?
audio_file_name|text|speaker_name
Alternatively, should I create a Google Colab notebook for this?
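For illustration only, here is a minimal sketch of how a pipe-delimited file in the layout proposed above could be read. The three column names come from the line above; whether WhisperSpeech accepts this layout has not been confirmed in this thread, and `Utterance`, `read_metadata` and the sample rows are hypothetical names and data made up for the example.

```python
import csv
import io
from typing import Iterable, List, NamedTuple

# Column names taken from the proposed layout; not a confirmed WhisperSpeech format.
FIELDS = ("audio_file_name", "text", "speaker_name")


class Utterance(NamedTuple):
    audio_file_name: str
    text: str
    speaker_name: str


def read_metadata(lines: Iterable[str]) -> List[Utterance]:
    """Parse `audio_file_name|text|speaker_name` rows (hypothetical helper)."""
    # QUOTE_NONE keeps quotation marks inside transcripts as literal characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for lineno, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        fields = tuple(field.strip() for field in row)
        if lineno == 1 and fields == FIELDS:
            continue  # skip an optional header row
        if len(fields) != len(FIELDS):
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        rows.append(Utterance(*fields))
    return rows


# Made-up sample data standing in for a real metadata file.
sample = io.StringIO(
    "audio_file_name|text|speaker_name\n"
    "clip_0001.wav|Good morning.|speaker_a\n"
    "clip_0002.wav|How are you?|speaker_b\n"
)
utterances = read_metadata(sample)
```

Validating the field count up front makes malformed rows (for example a transcript containing a stray `|`) fail early instead of silently shifting columns.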
I'm interested in doing this for Swedish; I found some audiobooks I could use. But I would like to know what kind of hardware it requires, the expected time, and so on. Are there any resources on this?
I am working on writing down the full process for data preprocessing. It's a bit involved because we need to scale it to thousands of hours, but for smaller fine-tuning datasets someone should be able to put all of it into a single notebook with a reasonable runtime.
If I want to add a new language to WhisperSpeech, will fine-tuning achieve it? Also, is the audio in the dataset limited to a single speaker? It's difficult to find a 1000-hour dataset with only one speaker... If different speakers all speak the same language, will it work?
@jpc
Any update on this? How can I fine-tune if I have a Chinese audio dataset?
@jpc Please also tell me the dataset requirements, as I mentioned above. Thank you.
Any updates?
Can you write more about training, what the dataset should look like, etc.? I see that you are from Poland; do you plan to add more Polish voices? The current model struggles with accents and style.