Hi, I have 150 hours of clean 44 kHz audio in WAV format, featuring multiple male and female voices. I would like to train a pre-trained model from scratch.
Do I need to split the audio by speaker and add transcriptions, or can I train the pre-trained model directly on the unsplit, untranscribed audio?
Thanks!