This repository attempts to examine all aspects of the wav2vec2 model.
Datasets: the Sharif-Wav2vec2-v1 model was fine-tuned on Mozilla Common Voice.
The main datasets used for fine-tuning the Sharif-Wav2vec2-v2 model are BigFarsdat, DeepMine, FarsSpon, and Mozilla Common Voice (AGP Dataset).
Order of use:
Several models were fine-tuned during this process, which explains the discrepancies between the results in the code. Insert the path to your own model. To make a fair comparison between the existing wav2vec2 models, we prepared a standard test set of varied, representative data, which will be released with our paper soon.

| Model | WER | Dataset | LM |
|---|---|---|---|
| m3hrdadfi/wav2vec2-large-xlsr-persian-v3 | | Mozilla_CommonVoice | no |
| m3hrdadfi/wav2vec2-large-xlsr-persian | | Mozilla_CommonVoice | no |
| m3hrdadfi/wav2vec2-large-xlsr-persian-v2 | | Mozilla_CommonVoice | no |
| m3hrdadfi/wav2vec2-large-xlsr-persian-shemo | | shEMO | no |
| wav2vec2-xlsr-multilingual-53-fa | | Mozilla_CommonVoice + Personal Data | no |
| Sharif-Wav2vec2-v1 | | Mozilla_CommonVoice | no |
| Sharif-Wav2vec2-v2 | | Mozilla_CommonVoice + AGP Dataset | no |
| Sharif-Wav2vec2-v1 | | Mozilla_CommonVoice | yes |
| Sharif-Wav2vec2-v2 | | Mozilla_CommonVoice + AGP Dataset | yes |
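
The WER column stands for word error rate: the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The snippet below is a minimal, dependency-free sketch of that metric. The function names `word_edits` and `corpus_wer` are illustrative and not part of this repository, and the evaluation code here may compute WER differently (for example with a library, or after text normalization).

```python
from typing import List, Tuple


def word_edits(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (edit_distance, reference_word_count) for one utterance.

    Hypothetical helper for illustration; words are split on whitespace
    and no text normalization is applied.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        edits, words = word_edits(ref, hyp)
        total_edits += edits
        total_words += words
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words
```

For example, `corpus_wer(["a b c d"], ["a x c"])` counts one substitution and one deletion against four reference words. Summing edits over the whole test set before dividing, as done here, weights long utterances more than averaging per-utterance WER would.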
Thanks to Sadra Sabouri for his collaboration :handshake::handshake:
We would also like to thank Mehrdad Farahani for his normalizer and dictionary :handshake:
:star: Give us a star if you found this repo useful.
🙋‍♀️ Open an issue if you have any comments.
:smiling_face_with_three_hearts: Feel free to open a pull request adding your feature. We'll be more than happy to accept it.