facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Poor performance in Chinese #5300

Open zhhl9101 opened 1 year ago

zhhl9101 commented 1 year ago

❓ Questions and Help

Hello MMS,

I ran a test on cmn-script_simplified with the MMS-1B-all model and got a 44% WER, which is unacceptably high.

Audio: NCYzUhAtZNI_0066.zip

What I want: "而且显得皮肤好白哟就这支颜色会显得你夏天的时候特别的有气质而且会很亮眼就是人群当中第一眼就会看到你"

What I got: "而些显的皮复好摆奥这这纪人丝会显得你下天的收特别的有气质而且会很量演人取当中地影对看到你"

What changes do I need to make to improve the result?

Thanks!
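The error rate reported above can be sanity-checked offline against the two strings in this post. The following is a minimal sketch, assuming the score is computed per character: written Chinese has no whitespace word boundaries, so character-level scoring is a common choice, but the thread does not say how the 44% figure was obtained. The function names `edit_distance` and `char_error_rate` are illustrative and are not part of fairseq or MMS.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Reference and hypothesis copied from the post above.
    reference = "而且显得皮肤好白哟就这支颜色会显得你夏天的时候特别的有气质而且会很亮眼就是人群当中第一眼就会看到你"
    hypothesis = "而些显的皮复好摆奥这这纪人丝会显得你下天的收特别的有气质而且会很量演人取当中地影对看到你"
    print(f"CER: {char_error_rate(reference, hypothesis):.2%}")
```

Scoring the same utterance before and after adding a language model gives a like-for-like comparison on this one clip.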

vineelpratap commented 1 year ago

Hi, I would recommend running the decoding with a language model to get better accuracy.

zhhl9101 commented 1 year ago

> Hi, I would recommend running the decoding with a language model to get better accuracy.

Thanks, but I could not find a Chinese LM at https://huggingface.co/facebook/mms-cclms/tree/main/lms. Does this mean I need to train the LM myself?

vineelpratap commented 1 year ago

Looks like HuggingFace has a 50GB limit on the models. I'll upload the model on S3 and share the link here soon.

zhhl9101 commented 1 year ago

> Looks like HuggingFace has a 50GB limit on the models. I'll upload the model on S3 and share the link here soon.

Great to hear, thanks a lot!

If convenient, could you share an example of how to use the LM? Ideally the model would be loaded once and then used to transcribe multiple audio files in a `for` loop, rather than being reloaded for every file, which is time-consuming.

vineelpratap commented 1 year ago

Please see the instructions here on how to download the model and run them - https://huggingface.co/facebook/mms-cclms

zhhl9101 commented 1 year ago

> Please see the instructions here on how to download the model and run them - https://huggingface.co/facebook/mms-cclms

I get an error when downloading the cmn LM file: [image]

zhhl9101 commented 1 year ago

[image]

zhhl9101 commented 1 year ago

@vineelpratap Hello, do you have any ideas about the error above: "This model has order 20 but KenLM was compiled to support up to 6."? Thanks.

vineelpratap commented 1 year ago

Hi, you would have to rebuild KenLM on your machine to support order 20.

zhhl9101 commented 1 year ago

> Hi, you would have to rebuild KenLM on your machine to support order 20.

Thanks, could you share a guide on how to rebuild KenLM?

zhhl9101 commented 11 months ago

Hello @vineelpratap, could you kindly share the rebuilt LM directly, or explain how to rebuild KenLM? For model users, rebuilding is not convenient, and it is difficult for me to do. Thank you once again.
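For reference, the rebuild suggested earlier in the thread might look like the sketch below. It is not an answer from the maintainers. It assumes KenLM is built from its public repository with CMake and that the maximum n-gram order is set through the `KENLM_MAX_ORDER` cache variable, which defaults to 6 (matching the error message above). The helper name `kenlm_build_commands`, the directory names, and the job count are hypothetical. The script only prints the commands so they can be reviewed before running.

```python
import shlex


def kenlm_build_commands(max_order=20, source_dir="kenlm",
                         build_dir="kenlm/build", jobs=4):
    """Return the shell commands (as argument lists) for a KenLM source build.

    Assumption: KenLM's CMake build reads the maximum supported n-gram order
    from the KENLM_MAX_ORDER cache variable.
    """
    return [
        # Fetch the KenLM sources.
        ["git", "clone", "https://github.com/kpu/kenlm.git", source_dir],
        # Configure with a higher order limit than the default of 6.
        ["cmake", "-S", source_dir, "-B", build_dir,
         f"-DKENLM_MAX_ORDER={max_order}"],
        # Compile the libraries and command-line tools.
        ["cmake", "--build", build_dir, "--parallel", str(jobs)],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for command in kenlm_build_commands():
        print(shlex.join(command))
```

If decoding goes through KenLM's Python bindings, those would presumably need to be reinstalled with the same order limit; the KenLM README describes the supported way to do that.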