If you want to fine-tune using the original tokenizer, yes, you'll need to normalize all numbers to spoken words.
Changing the tokenizer means you'll need a large amount of data to retrain the model; that is not recommended unless you have several thousand hours of speech to reach the best results.
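For reference, a minimal normalization sketch along these lines, assuming the `num2words` package; the regex heuristic is illustrative only, not NeMo's official text normalizer:

```python
import re
from num2words import num2words

def normalize_numbers(text: str) -> str:
    """Replace digit runs with spoken words, e.g. "B12" -> "B twelve"."""
    # For codes read digit by digit ("c one two r five"), you could
    # instead expand each digit of the match separately.
    words = re.sub(r"\d+", lambda m: f" {num2words(int(m.group()))} ", text)
    return re.sub(r"\s+", " ", words).strip()

print(normalize_numbers("Vitamin B12"))  # -> "Vitamin B twelve"
print(normalize_numbers("Code: c12r5"))  # -> "Code: c twelve r five"
```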
How do I use the original tokenizer? I also created a discussion for this.
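In case it helps later readers, here is a hedged sketch of fine-tuning while keeping the original tokenizer in NeMo. The key point is to restore the pretrained checkpoint and *not* call `model.change_vocabulary(...)`, so the pretrained SentencePiece tokenizer and output head are reused; the manifest path and hyperparameters below are placeholders.

```python
import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

# Restoring the checkpoint brings its original tokenizer along with it.
model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_fastconformer_hybrid_large_streaming_multi"
)

# Skipping model.change_vocabulary(...) keeps the original tokenizer,
# so the manifests only need transcripts normalized to spoken words.
model.setup_training_data(OmegaConf.create({
    "manifest_filepath": "train_manifest.json",  # placeholder path
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
}))

trainer = pl.Trainer(max_epochs=10, accelerator="gpu", devices=1)
model.set_trainer(trainer)
trainer.fit(model)
```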
Hi, I want to fine-tune "stt_en_fastconformer_hybrid_large_streaming_multi" on custom data. My dataset contains items like "Vitamin B12", "Code: c12r5", "hb1ac", etc. For these alphanumeric words:
If there are any other suggestions, please let me know. Thank you.