Hi @alexeib @patrickvonplaten,
I have fine-tuned wav2vec2 models, specifically large-lv60, base, base-960h, and large-960, on Indian English data from four speakers. However, I am getting empty or random transcripts after fine-tuning.
Below are the training arguments I have used.
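(My exact configuration is in the attached files; for readability, here is a minimal sketch of the kind of `TrainingArguments` setup I used, following the blog linked below. Every value is illustrative, not necessarily my exact configuration.)

```python
from transformers import TrainingArguments

# Illustrative sketch only -- mirrors the defaults from the fine-tuning blog
# linked below; output_dir is a hypothetical path.
training_args = TrainingArguments(
    output_dir="./wav2vec2-indian-english",
    group_by_length=True,
    per_device_train_batch_size=32,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=True,
    save_steps=500,
    eval_steps=500,
    logging_steps=500,
    learning_rate=1e-7,   # also tried 1e-9, as described below
    weight_decay=0.005,
    warmup_steps=1000,
    save_total_limit=2,
)
```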
I have experimented with various learning rates, such as 1e-7 and 1e-9. Additionally, I varied the dataset sizes, including 5k, 2k, 1k, and 15k data points; however, I consistently obtained the same results.
I have followed this fine-tuning guide: https://huggingface.co/blog/fine-tune-wav2vec2-english
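(A condensed sketch of the training step from that guide, as I ran it; names like `processor`, `data_collator`, `train_dataset`, `eval_dataset`, and `compute_metrics` are defined in the guide's preprocessing sections and in my attached scripts.)

```python
from transformers import Wav2Vec2ForCTC, Trainer

# Model setup as in the blog; I swapped in each checkpoint I tried
# (large-lv60, base, base-960h, large-960).
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
)
model.freeze_feature_extractor()  # the blog freezes the CNN feature extractor

trainer = Trainer(
    model=model,
    data_collator=data_collator,      # DataCollatorCTCWithPadding from the blog
    args=training_args,
    compute_metrics=compute_metrics,  # WER metric
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=processor.feature_extractor,
)
trainer.train()
```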
Training code for your reference: model_training.txt, prepare_dataset.txt, prepare_tokenizer.txt