Sharif-Wav2vec2

This repo shows how to fine-tune the wav2vec 2.0 model, along with the prerequisites for doing so.

The repository attempts to cover all aspects of working with the wav2vec2 model.

Datasets: The Sharif-Wav2vec2-v1 model was fine-tuned on Mozilla Common Voice.

The main datasets used for fine-tuning the Sharif-Wav2vec2-v2 model are BigFarsdat, DeepMine, FarsSpon & Mozilla Common Voice (AGP Dataset).
Corpus: Most of our textual data was taken from the naab corpus, a huge corpus of textual data in Farsi.
System Config: The model was fine-tuned on an NVIDIA GeForce RTX 3060 (12 GB).

Order of use:

Preprocessing
Fine-tuning
MakingLM
Test Model
client

Fine-tuned Model
- :hugs: You can find fine-tuned models at these addresses:

Several models were fine-tuned in this process, which explains the discrepancies between the results in the code. Replace the model path in the code with the path to your own model. To make a fair comparison between the existing wav2vec2 models, we prepared a standard test set covering varied and appropriate data, which will soon be released with our paper.

Model	WER
m3hrdadfi/wav2vec2-large-xlsr-persian-v3	Mozilla_CommonVoice	no
m3hrdadfi/wav2vec2-large-xlsr-persian	Mozilla_CommonVoice	no
m3hrdadfi/wav2vec2-large-xlsr-persian-v2	Mozilla_CommonVoice	no
m3hrdadfi/wav2vec2-large-xlsr-persian-shemo	shEMO	no
wav2vec2-xlsr-multilingual-53-fa	Mozilla_CommonVoice+ Personal Data	no
Sharif-Wav2vec2-v1	Mozilla_CommonVoice	no
Sharif-Wav2vec2-v2	Mozilla_CommonVoice+ AGP Dataset	no
Sharif-Wav2vec2-v1	Mozilla_CommonVoice	yes
Sharif-Wav2vec2-v2	Mozilla_CommonVoice+ AGP Dataset	yes
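
The comparison above is reported as word error rate (WER). As a rough illustration of how that metric is usually computed, here is a minimal, dependency-free sketch. It is not the evaluation script used for this repo; the function names `edit_distance`, `word_error_rate`, and `corpus_wer` and the sample sentences are hypothetical, and it assumes transcripts are already normalized and split on whitespace.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # (before the current word) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one utterance: edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over a test set: total edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_edits += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_edits / total_words


if __name__ == "__main__":
    # Hypothetical (reference, model output) pairs, for illustration only.
    samples = [
        ("the cat sat on the mat", "the cat sat on mat"),
        ("hello world", "hello word"),
    ]
    print(f"corpus WER: {corpus_wer(samples):.3f}")
```

Aggregating edits over the whole test set, rather than averaging per-utterance WER, keeps short utterances from dominating the score; which convention applies to the numbers in this repo is not stated here.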

Thanks to Sadra Sabouri for his collaboration :handshake::handshake:

Also, I would like to thank Mehrdad Farahani for his normalizer and dictionary :handshake:

:star: Give us a star if you found this repo useful.

🙋‍♀️ Open an issue if you have any comments about the repo.

:smiling_face_with_three_hearts: Feel free to open a pull request adding your feature. We'll be more than happy to accept it.