The Dalai Lama training data has been extracted here: s3://monlam.ai.stt/TTS_speakers/dalai_lama.csv
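For reference, the CSV can be read straight from that S3 path with pandas. This is only a sketch: it assumes the s3fs extra is installed and valid AWS credentials are available; the file's actual schema is not shown here.

```python
# Sketch: load the speaker CSV directly from S3.
# Requires `pip install pandas s3fs` and AWS credentials with read access.
import pandas as pd

df = pd.read_csv("s3://monlam.ai.stt/TTS_speakers/dalai_lama.csv")
print(df.shape)   # number of rows/columns in the extracted data
print(df.head())  # inspect the first few records
```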
Yash will migrate the workspace to the US by end of day. Then we can start running the instance for model training.
The Situ Rinpoche data is being fed into stt.pecha.tools for transcription.
The comparison of the model before and after LoRA fine-tuning:
Character Error Rate (CER) is a metric that measures the accuracy of a transcription by calculating the ratio of character-level substitutions, deletions, and insertions to the total number of characters in the reference text.
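Concretely, if S, D, and I are the character-level substitutions, deletions, and insertions against a reference of N characters, then CER = (S + D + I) / N. A minimal pure-Python sketch, using the character-level Levenshtein edit distance to obtain the S + D + I count:

```python
# Minimal CER sketch: character-level Levenshtein distance divided by
# the reference length. Pure Python, no external dependencies.
def cer(reference: str, hypothesis: str) -> float:
    r, h = list(reference), list(hypothesis)
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub,                # substitution (or match)
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(r)][len(h)] / max(len(r), 1)

# Example: 1 substitution over 5 reference characters -> CER = 0.2
print(cer("hello", "hallo"))
```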
Description
We aim to enhance our speech-to-text (STT) model by fine-tuning it using exclusive speaker-specific data combined with our existing base training data. We will use Low-Rank Adaptation (LoRA), a method designed for efficient fine-tuning of large models with minimal computational overhead. This approach will enable the existing model to adapt effectively to the nuances of the speaker's voice while preserving the general knowledge acquired from the base data. The goal is to evaluate and compare the model's performance on the speaker’s test data before and after fine-tuning with LoRA, demonstrating the potential gains in accuracy and robustness.
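As a rough sketch of what the LoRA setup could look like, assuming a Whisper-style base checkpoint and the HuggingFace peft library; the checkpoint name and hyperparameters below are illustrative placeholders, not the project's actual values:

```python
# Sketch: wrap a Whisper-style STT model with LoRA adapters via peft.
# Checkpoint name and LoRA hyperparameters are placeholders.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(base, lora_config)
# Only the small adapter matrices are trainable; the base weights stay
# frozen, which is what preserves the general knowledge from the base data.
model.print_trainable_parameters()
```

Because the frozen base weights are untouched, the adapters can be trained on the speaker-specific data at a fraction of the compute cost of full fine-tuning, and the before/after comparison reduces to evaluating CER on the speaker's test set with and without the adapters loaded.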
Objective
Completion Criteria
Implementation
Subtasks
Here's a filled subtask list for the project on enhancing the speech-to-text model using LoRA: