spsither opened 5 months ago
Pushed spsither/mms_300_v1.630 to HF with a CER of 20.50%.
Pushed spsither/mms_300_v1.780 to HF. This beats the benchmark with a CER of 20.29%.
wav2vec2 and BERT variants have matching parameter counts: wav2vec2 base matches BERT base, and wav2vec2 large matches BERT large. Likewise, wav2vec2 large and MMS_300m have the same number of parameters:

mms_300: `model.num_parameters()` → 315548395
wav2vec2_run10: `model.num_parameters()` → 315548395
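For reference, a count like 315,548,395 is just the sum of the element counts of every weight and bias tensor in the model. A minimal sketch of that computation, using made-up layer shapes (NOT the real XLS-R 300M architecture):

```python
from math import prod

# Hypothetical layer shapes for illustration only -- the real model has
# far more tensors. num_parameters() sums the element count of each one.
layer_shapes = {
    "feature_projection.weight": (1024, 512),
    "feature_projection.bias": (1024,),
    "encoder.layer0.attention.q.weight": (1024, 1024),
    "encoder.layer0.attention.q.bias": (1024,),
}

def num_parameters(shapes):
    """Total parameter count: product of each tensor's dims, summed."""
    return sum(prod(shape) for shape in shapes.values())

print(num_parameters(layer_shapes))  # 1574912 for this toy subset
```

Since the two checkpoints report identical totals, they share the same architecture even though their pretraining data differs.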
Pushed openpecha/tibetan_asr_mms300_v1 to HF
Started a new run with 771.30 hours of data
Evaluating the model at step 1190000
Description
Train facebook/wav2vec2-xls-r-300m, since it was pretrained on an order of magnitude more audio data than the facebook/wav2vec2-large-xlsr-53 we have been using so far.
Completion Criteria
Post the model to the OpenPecha organization on HuggingFace and measure the CER on the benchmark dataset.
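In practice the CER would be computed with a library such as jiwer or the `evaluate` metric, but the definition itself is simple: character-level edit distance divided by the number of reference characters. A minimal self-contained sketch:

```python
def edit_distance(ref, hyp):
    # Standard Levenshtein distance via dynamic programming.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(references, hypotheses):
    # CER = total character edits / total reference characters.
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / chars

print(round(100 * cer(["abcd"], ["abxd"]), 2))  # 25.0
```

The benchmark CERs above (20.50%, 20.29%) are this ratio expressed as a percentage over the whole evaluation set.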
Implementation Plan
Run prepare_dataset in batches and combine the resulting datasets. Update the training script and launch the training job, resuming from the latest checkpoint if the machine fails for any reason. Evaluate the model afterward.
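The "prepare in batches, then combine" step can be sketched generically. The `prepare_batch` body here is a placeholder for the real prepare_dataset work (resampling audio, extracting features, tokenizing labels), and in an HF-based pipeline the combining would typically use `datasets.concatenate_datasets`:

```python
def chunks(items, size):
    # Yield successive fixed-size slices of the input list.
    for start in range(0, len(items), size):
        yield items[start:start + size]

def prepare_batch(batch):
    # Placeholder for the real prepare_dataset step
    # (resample audio, extract features, tokenize labels).
    return [f"prepared:{name}" for name in batch]

def prepare_in_batches(files, batch_size):
    # Process the corpus in manageable batches, combining as we go
    # so a single oversized pass never exhausts memory.
    prepared = []
    for batch in chunks(files, batch_size):
        prepared.extend(prepare_batch(batch))
    return prepared

audio_files = [f"clip_{i}.wav" for i in range(5)]
print(prepare_in_batches(audio_files, batch_size=2))
```

Batching this way keeps peak memory bounded by the batch size rather than the full 771 hours of audio.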
Subtasks