On the test set of 105 samples (10% of the total data):
- base model: CER = 9.78%
- fine-tuned model (checkpoint 21000): CER = 7.93%
- fine-tuned model (checkpoint 17500): CER = 7.97%
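For reproducibility, here is a minimal sketch of how the CER comparison could be run with the `evaluate` library. The checkpoint ids, dataset repo id, and the "audio"/"text" column names are placeholders, not the actual artifacts linked in this issue.

```python
# Sketch: compare CER of the base and fine-tuned checkpoints on the held-out test set.
import torch
import evaluate
from datasets import load_dataset, Audio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

cer_metric = evaluate.load("cer")
device = "cuda" if torch.cuda.is_available() else "cpu"

def evaluate_cer(model_id, test_set):
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id).to(device).eval()
    predictions, references = [], []
    for sample in test_set:
        inputs = processor(sample["audio"]["array"], sampling_rate=16_000,
                           return_tensors="pt", padding=True).to(device)
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        pred_ids = torch.argmax(logits, dim=-1)
        predictions.append(processor.batch_decode(pred_ids)[0])
        references.append(sample["text"])                 # "text" column is a placeholder name
    return cer_metric.compute(predictions=predictions, references=references)

test_set = load_dataset("user/situ-rinpoche-asr", split="test")   # placeholder repo id
test_set = test_set.cast_column("audio", Audio(sampling_rate=16_000))

for model_id in ["base-model-id", "finetuned-checkpoint-id"]:      # placeholder ids
    print(model_id, f"CER = {evaluate_cer(model_id, test_set):.2%}")
```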
HF links to the fine-tuned model and training data: model, training data
Training parameters:
```python
per_device_train_batch_size=8,   # smaller batch size to increase updates per epoch
gradient_accumulation_steps=1,   # no accumulation, so the effective batch size stays at 8
evaluation_strategy="steps",
save_steps=500,                  # save checkpoints frequently due to limited data
eval_steps=50,                   # evaluate regularly to monitor overfitting
logging_steps=50,
learning_rate=1e-6,              # lower learning rate for finer adjustment on small data
num_train_epochs=200,            # more epochs to learn fully from the limited data
save_total_limit=500,            # limit checkpoints to manage storage
fp16=True,                       # mixed precision for faster computation, if supported
warmup_steps=100,                # short warmup before the main schedule
report_to=['wandb'],             # optional: log to WandB for tracking
push_to_hub=False,
```
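A hedged sketch of how these parameters might be wired into a `Trainer` for CTC fine-tuning, following the standard Hugging Face wav2vec2 recipe (recent `transformers`/`datasets` versions assumed). The model id, dataset repo id, `output_dir`, and the "text" column name are placeholders; the padding collator and feature-encoder freezing are the usual recipe choices, not confirmed details of this run.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

import torch
import evaluate
from datasets import load_dataset, Audio
from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("base-model-id")          # placeholder

raw = load_dataset("user/situ-rinpoche-asr")                            # placeholder repo id
raw = raw.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    # Turn raw audio into input_values and the transcript into CTC label ids.
    batch["input_values"] = processor(
        batch["audio"]["array"], sampling_rate=16_000).input_values[0]
    batch["labels"] = processor(text=batch["text"]).input_ids           # "text" is a placeholder
    return batch

train_set = raw["train"].map(prepare, remove_columns=raw["train"].column_names)
test_set = raw["test"].map(prepare, remove_columns=raw["test"].column_names)

@dataclass
class DataCollatorCTCWithPadding:
    # Pads audio inputs and label sequences separately and masks label padding
    # with -100 so it is ignored by the CTC loss.
    processor: Wav2Vec2Processor

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        input_features = [{"input_values": f["input_values"]} for f in features]
        label_features = [{"input_ids": f["labels"]} for f in features]
        batch = self.processor.pad(input_features, padding=True, return_tensors="pt")
        labels_batch = self.processor.pad(labels=label_features, padding=True, return_tensors="pt")
        batch["labels"] = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100)
        return batch

cer_metric = evaluate.load("cer")

def compute_metrics(pred):
    pred_ids = pred.predictions.argmax(axis=-1)
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    return {"cer": cer_metric.compute(predictions=pred_str, references=label_str)}

# TrainingArguments from the list above, plus the required output_dir.
training_args = TrainingArguments(
    output_dir="wav2vec2-situ-finetuned",                               # placeholder
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    evaluation_strategy="steps",
    save_steps=500,
    eval_steps=50,
    logging_steps=50,
    learning_rate=1e-6,
    num_train_epochs=200,
    save_total_limit=500,
    fp16=True,
    warmup_steps=100,
    report_to=["wandb"],
    push_to_hub=False,
)

model = Wav2Vec2ForCTC.from_pretrained(
    "base-model-id",                                                    # placeholder
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
)
model.freeze_feature_encoder()   # common choice when fine-tuning on small data

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=DataCollatorCTCWithPadding(processor=processor),
    compute_metrics=compute_metrics,
    train_dataset=train_set,
    eval_dataset=test_set,
    tokenizer=processor.feature_extractor,
)
trainer.train()
```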
Vast.ai instance:
Description
We need to fine-tune a wav2vec2 model for a specific speaker's accent and compare its performance against the base model on test data from that speaker.
Completion Criteria
A model that can accurately transcribe Situ Rinpoche's audio recordings.
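As a usage illustration, here is a short sketch of transcribing a single recording with a fine-tuned checkpoint; the checkpoint id and audio path are placeholders.

```python
# Sketch: transcribe one audio clip with the fine-tuned model.
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "finetuned-checkpoint-id"                            # placeholder checkpoint
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

speech, _ = librosa.load("situ_rinpoche_clip.wav", sr=16_000)   # placeholder path
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
transcription = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(transcription)
```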
Implementation
subtask