A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Hi,
Can I use a Persian multi-speaker audio dataset to fine-tune QuartzNet15x5? Will the WER be low if I fine-tune it with a single-speaker dataset?
Thanks
Environment overview
Environment location: Google Colab
Method of NeMo install: !pip install nemo_toolkit[asr]
The model might train fine, but evaluation on other speakers will likely have poor WER. It can be tried, but for proper generalization, more data from multiple speakers would be useful.
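One way to check this in practice is to compute WER separately for each speaker in your evaluation set, so a gap between the training speaker and unseen speakers becomes visible. Below is a minimal sketch in plain Python, not NeMo's own metric code. The function names `edit_distance` and `wer_by_speaker`, the speaker IDs, and the toy transcripts are all hypothetical placeholders; in a real run the hypotheses would come from your fine-tuned model.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_speaker(samples):
    """Return {speaker: WER} from (speaker, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for speaker, reference, hypothesis in samples:
        ref = reference.split()
        hyp = hypothesis.split()
        errors[speaker] += edit_distance(ref, hyp)
        words[speaker] += len(ref)
    # Skip speakers whose references contain no words to avoid dividing by zero.
    return {s: errors[s] / words[s] for s in words if words[s] > 0}


# Toy data (hypothetical): one speaker seen in training, one unseen.
samples = [
    ("spk_train", "a b c d", "a b c d"),
    ("spk_new", "a b c d", "a x c"),
]
per_speaker = wer_by_speaker(samples)
print(per_speaker)
```

If the WER for unseen speakers is much higher than for the training speaker, that points to the generalization problem described above, and adding data from more speakers is the likely fix.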