Thanks. Would you mind uploading the decoding results w/ and w/o hotwords somewhere? (Maybe a Hugging Face repo for the hotwords weight, ngram file, decoding results, and other essentials is a good choice.)
Also, https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary here, they use F1 score, recall, and precision to evaluate hotwords. Can we get these stats?
I am also interested in the general test set performance. Would you mind testing the WER on the normal AISHELL-1 test set? https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/client.py#L37-L43
> Thanks. Would you mind uploading the decoding results w/ and w/o hotwords somewhere? (Maybe a Hugging Face repo for the hotwords weight, ngram file, decoding results, and other essentials is a good choice.)

Hotwords weight and ngram file: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/models. Decoding results: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/tree/main/results.
> Also, https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary here, they use F1 score, recall, and precision to evaluate hotwords. Can we get these stats?
Hotword results on speech_asr_aishell1_hotwords_testsets:

| model (FP16) | Latency (s) | CER (%) | Recall | Precision | F1-score |
|---|---|---|---|---|---|
| offline model w/o hotwords | 5.5921 | 13.85 | 0.27 | 0.99 | 0.43 |
| offline model w/ hotwords | 5.6401 | 12.16 | 0.45 | 0.97 | 0.62 |
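For reference, here is a minimal sketch of how such hotword metrics can be computed, assuming recall/precision are counted over hotword occurrences in the reference and hypothesis transcripts (the exact counting in the ModelScope evaluation may differ; all names below are illustrative):

```python
# Illustrative hotword-level metrics: an occurrence of a hotword counts as a
# true positive when it appears in both the reference and the hypothesis.
def hotword_metrics(refs, hyps, hotwords):
    tp = fn = fp = 0
    for ref, hyp in zip(refs, hyps):
        for word in hotwords:
            n_ref = ref.count(word)  # expected occurrences
            n_hyp = hyp.count(word)  # recognized occurrences
            tp += min(n_ref, n_hyp)
            fn += max(n_ref - n_hyp, 0)
            fp += max(n_hyp - n_ref, 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, precision, f1


print(hotword_metrics(["今天和佟健一起看比赛"],
                      ["今天和佟健一起看比赛"],
                      ["佟健"]))  # (1.0, 1.0, 1.0)
```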
> I am also interested in the general test set performance. Would you mind testing the WER on the normal AISHELL-1 test set? https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/client.py#L37-L43
Results on the AISHELL-1 test set:

| model (FP16) | RTF | CER (%) |
|---|---|---|
| offline model w/o hotwords | 0.00437 | 4.6805 |
| offline model w/ hotwords | 0.00435 | 4.5831 |
| streaming model w/o hotwords | 0.01231 | 5.2777 |
| streaming model w/ hotwords | 0.01142 | 5.1926 |
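RTF here is the real-time factor: wall-clock decoding time divided by total audio duration. A minimal sketch, with hypothetical timing numbers:

```python
# Real-time factor: wall-clock decoding time over total audio duration.
# An RTF of 0.00437 means decoding runs about 229x faster than real time.
decode_seconds = 157.3   # hypothetical total decoding wall-clock time
audio_seconds = 36000.0  # hypothetical total test-set audio duration
rtf = decode_seconds / audio_seconds
print(f"RTF = {rtf:.5f} ({1 / rtf:.0f}x real time)")
```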
Note: the current n-gram order is 4, which only supports hotwords of length <= 4. If you want to configure longer hotwords, you can use a higher-order n-gram, but this will also increase decoding time (see the sketch below).
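A sketch of building a higher-order arpa with KenLM's lmplz, assuming KenLM is installed and `corpus.txt` is a hypothetical whitespace-tokenized training text:

```python
import subprocess

# Build a 5-gram arpa so hotwords of up to 5 tokens can be matched.
# -o/--text/--arpa are standard lmplz options; corpus.txt must contain
# one whitespace-tokenized sentence per line.
subprocess.run(
    ["lmplz", "-o", "5", "--text", "corpus.txt", "--arpa", "5gram.arpa"],
    check=True,
)
```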
Tested ENV:
- CPU: 40-core Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
- GPU: NVIDIA GeForce RTX 2080 Ti
Many thanks. The results look nice. I was wondering whether, for both the w/ and w/o hotwords cases, you use this as the default external LM: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/blob/main/models/init_kenlm.arpa.
Also, is the pretrained model from here: https://github.com/wenet-e2e/wenet/tree/main/examples/aishell/s0#u2-conformer-result ? It looks like the WER with WFST decoding + attention rescoring for offline and chunk16 is 4.4 & 4.75, while pure attention rescoring without any ngram gives 4.63 & 5.05. I am not sure what the results would look like if the arpa were built from the AISHELL train set; I thought they used this 3-gram arpa there: https://huggingface.co/yuekai/aishell1_tlg_essentials/blob/main/3-gram.unpruned.arpa
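For reference, such an arpa can be inspected and scored with the kenlm Python bindings — a minimal sketch, assuming the file has been downloaded locally as `init_kenlm.arpa`:

```python
import kenlm

# Load the arpa and score a whitespace-tokenized sentence.
# score() returns a log10 probability; bos/eos add sentence boundaries.
model = kenlm.Model("init_kenlm.arpa")
print(model.score("佟 健 在 滑 冰", bos=True, eos=True))
print(model.perplexity("佟 健 在 滑 冰"))
```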
> Many thanks. The results look nice. I was wondering whether, for both the w/ and w/o hotwords cases, you use this as the default external LM: https://huggingface.co/58AILab/wenet_u2pp_aishell1_with_hotwords/blob/main/models/init_kenlm.arpa.
>
> Also, is the pretrained model from here: https://github.com/wenet-e2e/wenet/tree/main/examples/aishell/s0#u2-conformer-result ? It looks like the WER with WFST decoding + attention rescoring for offline and chunk16 is 4.4 & 4.75, while pure attention rescoring without any ngram gives 4.63 & 5.05. I am not sure what the results would look like if the arpa were built from the AISHELL train set; I thought they used this 3-gram arpa there: https://huggingface.co/yuekai/aishell1_tlg_essentials/blob/main/3-gram.unpruned.arpa
In the latest commit, we renamed `batch_hotwords_scorer` to `hotwords_scorer`. If you have free time, please help review this PR.
@FieldsMedal Thanks!
One more question: the output of the "Test hotwords boosting with word-level language models during ctc prefix beam search" case in test_zh.py is

```
INFO:root:Test hotwords boosting with word-level language models during ctc prefix beam search
INFO:root:('', '一', '换', '一首', '极点晚', '几点啦', '极点', '几点', '', '几', '晚', '极')
```

Not sure if the above result is expected?
---
**Update:** Should be fine. It is the user's responsibility to ensure the vocabulary contains `space_id`.
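For anyone who hits this: the point is simply that the space symbol must be present in the decoder vocabulary. A minimal illustrative check — the `HotWordsScorer` construction below is hypothetical, see the PR for the real bindings:

```python
# Illustrative only: the real HotWordsScorer lives in the PR's swig_decoders
# bindings; the names and constructor below are hypothetical.
vocabulary = ["<blank>", " ", "一", "换", "首", "极", "点", "晚", "几", "啦"]

# The vocabulary must contain the space symbol so that word-level LM scoring
# can segment hypotheses correctly.
assert " " in vocabulary, "vocabulary must contain the space symbol"
space_id = vocabulary.index(" ")

hotwords = {"几点啦": 3.0}  # hotword -> boost score (hypothetical format)
# scorer = HotWordsScorer(hotwords, vocabulary, space_id)  # hypothetical call
```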
Thanks again! Really great feature! @FieldsMedal
We tested hotwords on speech_asr_aishell1_hotwords_testsets.
- Acoustic model: a small Conformer model for AISHELL-1
- Hotwords weight: hotwords.tar.gz
- Test method: please refer to the README of this repository (TODO)

Latency and CER
- offline model: https://github.com/wenet-e2e/wenet/tree/main/runtime/gpu/model_repo
- offline model with hotwords: (TODO)

Decoding result
- 佟健
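CER above is the edit distance over reference characters — a minimal sketch, assuming the `editdistance` package:

```python
import editdistance

# Character error rate: edit distance between hypothesis and reference
# characters, normalized by reference length (multiply by 100 for percent).
def cer(ref: str, hyp: str) -> float:
    ref_chars = list(ref.replace(" ", ""))
    hyp_chars = list(hyp.replace(" ", ""))
    return editdistance.eval(ref_chars, hyp_chars) / len(ref_chars)


print(cer("佟健在滑冰", "佟健在溜冰"))  # 0.2
```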