Closed — @736958408 closed this issue 1 month ago
Hi,
Thanks for your interest in the project. We don't have the backtranslation script available anymore, but I can give you a high-level outline of the backtranslation pipeline:
- Model used: NLLB 600M (distilled)
- Source language: English
- Target language: French
There was a pipeline to translate the source texts into the target language. The Hugging Face implementation has `num_return_sequences`; keep that at a value greater than 1 (we used 5) and use sampling parameters that promote diversity.
After this, we translate these back into the source language, again with sampling parameters that promote diversity. The reason for the 5 sequences is that backtranslation often returns the same source text unchanged (it looks like translation models keep getting better), so out of the 5 candidates we take the one that is most diverse from the original.
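The round trip described above can be sketched roughly as below. To be clear, this is my reconstruction rather than the original script: the sampling values (`top_p`, etc.) and the `difflib`-based diversity criterion are assumptions, not the authors' exact choices.

```python
import difflib


def most_diverse(source: str, candidates: list[str]) -> str:
    """Return the candidate least similar to `source`.
    (Assumed selection rule; the difflib ratio is my choice, not the authors'.)"""
    return min(candidates,
               key=lambda c: difflib.SequenceMatcher(None, source, c).ratio())


def backtranslate(text: str, n: int = 5) -> str:
    """English -> French -> English round trip with NLLB-200 600M distilled.
    Calling this downloads the checkpoint; sampling values are assumptions."""
    # pip install transformers torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "facebook/nllb-200-distilled-600M"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    def translate(texts, src, tgt, k):
        tok.src_lang = src
        batch = tok(texts, return_tensors="pt", padding=True)
        out = model.generate(
            **batch,
            forced_bos_token_id=tok.convert_tokens_to_ids(tgt),
            do_sample=True, top_p=0.9,  # sampling promotes diversity (assumed values)
            num_return_sequences=k,
            max_new_tokens=256,
        )
        return tok.batch_decode(out, skip_special_tokens=True)

    french = translate([text], "eng_Latn", "fra_Latn", n)   # n French candidates
    english = translate(french, "fra_Latn", "eng_Latn", 1)  # one English per candidate
    return most_diverse(text, english)
```

The selection step is the important part: identical round trips score a similarity of 1.0 and are never picked as long as at least one candidate differs from the source.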
Regarding TTS, yes we used Coqui-TTS.
Let us know if you have any further questions.
Additionally, once you have successfully back-translated, you can use this file for TTS synthesis. I have tested this and it works as is!
Thank you for your attention and response. I have generated augmented text using NLLB as suggested. However, I encountered an issue while running Coqui-TTS: the files `config.json`, `language.json`, `speakers.json`, and `best_model.pth.tar` are no longer available for download. Could you please provide download links for these files? Thank you very much!
Hi @736958408 ,
Unfortunately, it looks like I no longer have the checkpoints. Additionally, you are right that the original YourTTS repo no longer hosts them either.
However, have you tried searching in Coqui? You might be able to get it there:
https://github.com/coqui-ai/tts?tab=readme-ov-file#command-line-tts
`tts --list_models`
Please let me know if you see the model here! If not, I can look for an alternative!
Additionally, it is worth mentioning that MMER performs almost as well with any other TTS+VC model like YourTTS, so you can also use stronger alternatives to YourTTS if its checkpoints are not available. The code I have provided remains the same for most models.
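If the model does show up in that list, you can also drive it from Python instead of the CLI. The sketch below is mine, not code from this repo; it assumes the model name Coqui registers for YourTTS and the standard `TTS.api` keyword arguments, so double-check both against `tts --list_models` and your installed TTS version.

```python
# Hypothetical sketch: loading YourTTS through the Coqui TTS Python API
# instead of the now-missing standalone checkpoint files.
YOURTTS_MODEL = "tts_models/multilingual/multi-dataset/your_tts"


def synthesize(text: str, speaker_wav: str, out_path: str) -> None:
    """Synthesize `text` in the voice of `speaker_wav` (zero-shot voice cloning)."""
    from TTS.api import TTS  # requires `pip install TTS`

    tts = TTS(model_name=YOURTTS_MODEL)  # downloads the model on first use
    tts.tts_to_file(text=text,
                    speaker_wav=speaker_wav,
                    language="en",
                    file_path=out_path)
```

The `speaker_wav` argument is a short reference recording of the target voice; this is what makes YourTTS usable as a TTS+VC augmentation step.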
I am very pleased to discuss these issues with you, and I appreciate the help you have provided. Regarding TTS, the latest version does indeed have significant changes, which makes issues relatively harder to resolve. I installed the latest version directly with `pip install TTS`; it's important to note that the Python version must be >= 3.9.0 to avoid dependency errors. For the model, I am using `vocoder_models--en--ljspeech--multiband-melgan` to generate audio from the augmented text. The data is currently being processed, and I will use MMER to evaluate the processed dataset later. Thank you for your guidance and attention. If I have any questions later, I will leave you a message. Thank you very much!
Dear Author,
This project is indeed excellent, but I have encountered some issues with data preprocessing and would like to ask for your advice. I want to train this model on other datasets, but I am unsure which models you used for back-translation and text-to-speech (TTS). I saw an example of Coqui-TTS in the code; did you use this model for text-to-speech? For back-translation, which model did you use to augment the text? If possible, could you please provide the data-preprocessing code? Thank you very much; I look forward to your reply.