Open gangagyatso4364 opened 1 month ago
The MMS model by Facebook is documented here: https://github.com/facebookresearch/fairseq/blob/main/examples/mms/README.md#tts-1
Script for fine-tuning MMS: https://github.com/ylacombe/finetune-hf-vits/blob/main/README.md
Currently facing an issue with the speaker ID in the pipeline with the Gujarati dataset; the Dalai Lama dataset shows the same problem.
Need to update the TTS data so the dataset contains the actual audio instead of audio URLs, and add a speaker ID for each distinct speaker.
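A minimal sketch of the speaker ID part of that data update, assuming each record names its speaker in a hypothetical `speaker` field. The field names and the helpers `build_speaker_map` and `add_speaker_ids` are illustrative and are not part of finetune-hf-vits.

```python
def build_speaker_map(records, speaker_key="speaker"):
    """Map each distinct speaker name to an integer ID (sorted for stability)."""
    names = sorted({rec[speaker_key] for rec in records})
    return {name: idx for idx, name in enumerate(names)}


def add_speaker_ids(records, speaker_key="speaker"):
    """Return copies of the records with an integer `speaker_id` field added."""
    speaker_map = build_speaker_map(records, speaker_key)
    return [{**rec, "speaker_id": speaker_map[rec[speaker_key]]} for rec in records]


# Hypothetical rows; a real dataset would also carry the audio itself.
rows = [
    {"text": "sample sentence 1", "speaker": "yangchen"},
    {"text": "sample sentence 2", "speaker": "dolkar"},
    {"text": "sample sentence 3", "speaker": "yangchen"},
]
rows_with_ids = add_speaker_ids(rows)
```

Sorting the names keeps the mapping reproducible across runs, which matters because the same integer has to identify the same speaker at training and at inference time.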
ERROR:
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/finetune-hf-vits/run_vits_finetuning.py", line 1495, in <module>
    main()
  File "/home/ec2-user/SageMaker/finetune-hf-vits/run_vits_finetuning.py", line 1100, in main
    speaker_id=batch["speaker_id"],
  File "/home/ec2-user/SageMaker/finetune-hf-vits/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 270, in __getitem__
    return self.data[item]
KeyError: 'speaker_id'
Currently the pipeline works on single-speaker data when I remove the speaker_id=batch["speaker_id"] line from the model call, but it does not work for multiple speakers. Now setting up the model config for multiple speakers.
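One way to avoid deleting that line by hand is to forward `speaker_id` only when the config is multi-speaker. The sketch below assumes the batch behaves like a mapping and that the other keys are `input_ids` and `attention_mask`. The helper `model_inputs_from_batch` is hypothetical, and the real call in run_vits_finetuning.py may take more arguments.

```python
def model_inputs_from_batch(batch, num_speakers):
    """Build keyword arguments for the model call.

    `speaker_id` is forwarded only for multi-speaker configs; if the data
    lacks it, a descriptive error is raised instead of a bare KeyError.
    """
    inputs = {
        "input_ids": batch["input_ids"],
        "attention_mask": batch["attention_mask"],
    }
    if num_speakers > 1:
        if "speaker_id" not in batch:
            raise KeyError(
                "config has num_speakers > 1 but the batch has no 'speaker_id'; "
                "check that the dataset has a speaker ID column"
            )
        inputs["speaker_id"] = batch["speaker_id"]
    return inputs
```

With this guard, the same code path serves single-speaker runs (no `speaker_id` needed) and multi-speaker runs (a missing column is reported at the point where it matters).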
Need to tune the speed of the audio generated from text.
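For the speed, the VITS implementation in transformers exposes a `speaking_rate` attribute on the model, where larger values give faster speech. A minimal sketch, assuming the checkpoint is published as `facebook/mms-tts-bod` (inferred from the mms-tts-bod name used in this thread) or replaced by the fine-tuned checkpoint's ID. The `synthesize` helper is illustrative.

```python
def synthesize(text, model_id="facebook/mms-tts-bod", speaking_rate=1.0):
    """Synthesize `text` and return (waveform, sampling_rate)."""
    # Imported lazily so the heavy dependencies load only when synthesis runs.
    import torch
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)
    # Larger values give faster speech; 1.0 is the default rate.
    model.speaking_rate = speaking_rate

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform[0]
    return waveform, model.config.sampling_rate
```

Calling `synthesize(text, speaking_rate=0.8)` should produce slower output than the default; suitable values would need to be found by listening.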
For 912,122 training samples, here is the updated detailed estimation:
One issue I found while experimenting in the Space: my model is not able to generate audio for long text. I need to solve that.
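A possible workaround, not confirmed as the fix for this issue, is to split long input into shorter chunks, synthesize each chunk separately, and concatenate the waveforms. The sketch below covers only the splitting step and assumes the Tibetan shad (།) is a suitable boundary. `split_into_chunks` and the `max_chars` limit are illustrative choices.

```python
import re


def split_into_chunks(text, max_chars=200, delimiter="།"):
    """Split `text` after a single-character delimiter into chunks.

    Chunks stay within `max_chars` where possible; a single segment longer
    than `max_chars` is kept whole rather than cut mid-sentence.
    """
    d = re.escape(delimiter)
    # Each piece is a run of non-delimiter characters plus trailing delimiters.
    pieces = re.findall(f"[^{d}]+{d}*", text)
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the model on its own and the resulting waveforms joined in order. Splitting at sentence boundaries keeps the cuts in places where a short pause is natural.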
Train the mms-tts-bod model on the Dolkar la and Yangchen data under the same speaker ID.
The multispeaker experiment with different speaker IDs failed: the model learns from all the data, but it cannot differentiate between the speakers because of the speaker ID issue at model inference.
Description
We are going to fine-tune Meta's MMS (Massively Multilingual Speech) model for a Tibetan speaker named Sherab using Sherab's dataset. The process includes preparing Sherab’s data, uploading it to Hugging Face, fine-tuning the MMS model, and creating a Hugging Face Space to check the performance of the fine-tuned model. The selected speakers:
You can test the model in the Hugging Face Space given here:
Dolkar la and Yangchen:
Completion Criteria
Implementation
Data Preparation for Sherab:
Upload Sherab’s Data to Hugging Face:
Fine-Tune the MMS TTS Model:
Create a Hugging Face Space for Model Performance Testing:
Subtasks
1. Data Preparation for Sherab:
2. Upload Sherab’s Data to Hugging Face:
3. Fine-Tune the MMS TTS Model:
4. Create Hugging Face Space for Performance Testing: