Hi, I'm interested in using your code but have a few questions. I saw that you trained the model on 15 hours of audio from each speaker; how long did training take? Also, your README has some commands for preparing the audio part of the dataset, but how should we prepare the matching transcriptions? It would be really nice to have some sort of "Quick Start" guide that shows exactly which commands to run and in what order, kind of like what is done here. Thank you so much!