Emrys365 / espnet

End-to-End Speech Processing Toolkit
https://espnet.github.io/espnet/
Apache License 2.0

add support for huggingface wav2vec2 model. Support ASR training and … #14

Closed simpleoier closed 2 years ago

simpleoier commented 2 years ago
Add support for the Hugging Face wav2vec2 model. ASR training and inference are supported with the default CTC.

There is still some confusion about the vocab file. The decoding result on an4 shows "" in predictions where " " (space) tokens should be. The temporary solution is to swap the "" token and the space token in stage 5 (token_list), which gives the following performance:

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|inference_asr_model_valid.acc.ave/test|130|773|98.8|0.9|0.3|0.0|1.2|5.4|
|inference_asr_model_valid.acc.ave/train_dev|100|591|97.5|1.7|0.8|0.2|2.7|12.0|
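
The token swap described above can be sketched roughly as follows. This is only an illustration and not the code from this PR. The helper `swap_tokens`, the toy vocabulary, and the placeholder `<other>` (standing in for the token that is exchanged with space) are all hypothetical. It assumes the token list is an ordered list of strings in which a token's position is the index the CTC output decodes to, so exchanging two entries changes which symbol each index maps to.

```python
from typing import List


def swap_tokens(token_list: List[str], first: str, second: str) -> List[str]:
    """Return a copy of token_list with the positions of two tokens exchanged."""
    # list.index raises ValueError if either token is missing from the list.
    i, j = token_list.index(first), token_list.index(second)
    swapped = list(token_list)  # copy so the caller's list is left untouched
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


# Hypothetical vocabulary: "<other>" is a placeholder, not a real token name.
tokens = ["<other>", "A", "B", " "]
fixed = swap_tokens(tokens, "<other>", " ")
print(fixed)
```

After the swap, the index that used to decode to the placeholder decodes to a space, and the reverse. Returning a copy keeps the original list available for comparison.
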
Emrys365 commented 2 years ago

Thanks @simpleoier!