OpenPecha / stt-wav2vec2

MIT License

STT0070: STT wav2vec2 finetuning on situ rinpoche training data. #9

Open gangagyatso4364 opened 3 hours ago

gangagyatso4364 commented 3 hours ago

Description

We need to finetune a wav2vec2 model on a specific speaker's accent and compare its performance against the base model on test data from that same speaker.

Completion Criteria

A model capable of accurately transcribing Situ Rinpoche's audio recordings.

Implementation

  1. Create train/val/test splits of the Situ Rinpoche data.
  2. Finetune the wav2vec2 model on the Situ Rinpoche data, starting from the previous checkpoint.
  3. Evaluate the model's performance on the test set.
  4. Compare its performance against the base model.
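Step 1 can be sketched with a plain shuffled split. This is a minimal stdlib sketch, not the repo's actual splitting code; the split fractions for val are an assumption (the issue only states a 10% test set), and the sample count is illustrative:

```python
import random

def make_splits(items, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle and split samples into train/val/test lists.

    val_frac is an assumption; the issue only mentions a 10% test split.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# Dummy sample IDs stand in for the real audio/transcript pairs
train, val, test = make_splits(range(1050))
print(len(train), len(val), len(test))  # 840 105 105
```

With ~1050 total samples this yields the 105-sample test set reported below.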

gangagyatso4364 commented 3 hours ago

On a test set of 105 samples (10% of the total data):

  - base model: CER = 9.78%
  - finetuned model (checkpoint 21000): CER = 7.93%
  - finetuned model (checkpoint 17500): CER = 7.97%
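For reference, CER here is the character-level Levenshtein edit distance divided by the reference length. A minimal stdlib sketch of the metric (the actual evaluation likely used a library such as `jiwer` or `evaluate`, which is an assumption):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / number of reference characters."""
    r, h = list(reference), list(hypothesis)
    # Dynamic-programming Levenshtein distance over characters
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i]
        for j, hc in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (rc != hc)))    # substitution
        prev = cur
    return prev[-1] / len(r)

print(round(cer("abcd", "abxd"), 2))  # 0.25 (one substitution out of four chars)
```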

gangagyatso4364 commented 3 hours ago

HF links to the training data and finetuned model: model, training data

gangagyatso4364 commented 1 hour ago

Training parameters:

```python
per_device_train_batch_size=8,   # small batch size to increase updates per epoch
gradient_accumulation_steps=1,   # no accumulation; effective batch size stays 8
evaluation_strategy="steps",
save_steps=500,                  # save checkpoints frequently due to limited data
eval_steps=50,                   # evaluate regularly to monitor overfitting
logging_steps=50,
learning_rate=1e-6,              # low learning rate for finer adjustment on small data
num_train_epochs=200,            # many epochs to fully learn from the limited data
save_total_limit=500,            # cap stored checkpoints to manage storage
fp16=True,                       # mixed precision for faster computation, if supported
warmup_steps=100,                # short warmup for this small dataset
report_to=["wandb"],             # optional: log to WandB for tracking
push_to_hub=False,
```
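These hyperparameters are consistent with the checkpoint numbers reported above. A quick sanity check, assuming ~840 training samples (1050 total minus the 105-sample test set and an equal val set; the issue only states the test count, so the rest is an assumption):

```python
import math

train_samples = 840   # assumed: 1050 total minus 10% test and 10% val
batch_size = 8        # per_device_train_batch_size
grad_accum = 1        # gradient_accumulation_steps
epochs = 200          # num_train_epochs

# Optimizer updates per epoch, then over the whole run
steps_per_epoch = math.ceil(train_samples / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 105 21000
```

Under these assumptions the full 200-epoch run lands at step 21000, matching the best-performing checkpoint reported earlier.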

Vast.ai instance: