OpenPecha / stt-wav2vec2


STT00056: Finetune with Dalai Lama STT dataset (MM24) #7

Open gangagyatso4364 opened 2 months ago

gangagyatso4364 commented 2 months ago

Description

We aim to enhance our speech-to-text (STT) model by fine-tuning it using exclusive speaker-specific data combined with our existing base training data. We will use Low-Rank Adaptation (LoRA), a method designed for efficient fine-tuning of large models with minimal computational overhead. This approach will enable the existing model to adapt effectively to the nuances of the speaker's voice while preserving the general knowledge acquired from the base data. The goal is to evaluate and compare the model's performance on the speaker’s test data before and after fine-tuning with LoRA, demonstrating the potential gains in accuracy and robustness.
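
As a minimal sketch of how LoRA adapters might be attached to a wav2vec2 CTC model (assuming the Hugging Face `peft` library; the checkpoint name and LoRA hyperparameters below are illustrative placeholders, not the project's actual values):

```python
# Minimal sketch: attach LoRA adapters to a wav2vec2 CTC model with peft.
# The checkpoint name and hyperparameters are illustrative placeholders.
from transformers import Wav2Vec2ForCTC
from peft import LoraConfig, get_peft_model

BASE_CHECKPOINT = "your-org/your-wav2vec2-stt-checkpoint"  # placeholder

model = Wav2Vec2ForCTC.from_pretrained(BASE_CHECKPOINT)

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections in each encoder layer
    lora_dropout=0.05,
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

Because only the adapter matrices receive gradients, the base weights stay frozen, which is what lets the model adapt to the speaker without forgetting the base training data.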

Objective: Evaluate and compare the model's performance on the speaker's test data before and after LoRA fine-tuning.

Completion Criteria

  1. A complete pipeline that incorporates LoRA for fine-tuning the model using speaker-specific data.
  2. Performance evaluation of the LoRA-fine-tuned model on speaker-specific test data, compared against the baseline model (see the sketch after this list).
  3. Documentation of the potential improvements and scalability of the model with future acquisitions of speaker data.

Implementation

(Implementation diagram)

Subtasks

A subtask list for enhancing the speech-to-text model using LoRA:

gangagyatso4364 commented 2 months ago

The Dalai Lama training data has been extracted to: s3://monlam.ai.stt/TTS_speakers/dalai_lama.csv
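
For reference, a sketch of reading that manifest (assuming `s3fs` is installed and AWS credentials with read access to the bucket are configured):

```python
# Sketch: load the extracted training manifest directly from S3.
# Assumes s3fs is installed and AWS credentials grant read access to the bucket.
import pandas as pd

df = pd.read_csv("s3://monlam.ai.stt/TTS_speakers/dalai_lama.csv")
print(df.shape)
print(df.head())
```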

gangagyatso4364 commented 1 month ago

Yash will migrate the workspace to the US by end of day; then we can start running the instance for model training.

gangagyatso4364 commented 1 month ago

Situ Rinpoche data is being fed into stt.pecha.tools for transcription.

gangagyatso4364 commented 1 month ago

Comparison of the model before and after LoRA fine-tuning:

The Character Error Rate (CER) measures transcription accuracy as the ratio of character-level substitutions (S), deletions (D), and insertions (I) to the total number of characters (N) in the reference text: CER = (S + D + I) / N. For example, 5 substitutions, 3 deletions, and 2 insertions against a 100-character reference give CER = 10/100 = 10%.
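
As a concrete usage example, CER can be computed with the `jiwer` library (the transcript strings below are hypothetical):

```python
# Sketch: character error rate between a reference and a hypothesis with jiwer.
# The Tibetan strings are hypothetical examples, not project data.
import jiwer

reference = "བཀྲ་ཤིས་བདེ་ལེགས"   # hypothetical ground-truth transcript
hypothesis = "བཀྲ་ཤས་བདེ་ལེགས"   # hypothetical model output
print(jiwer.cer(reference, hypothesis))
```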

  1. Base model CER on Dalai Lama test data: 10.36%
  2. LoRA fine-tuned model CER on Dalai Lama test data: