Closed — Linghuxc closed this issue 9 months ago
I'm sorry that I made mistakes in recent updates. Thank you for pointing out this problem. I have pushed a new README to document the preprocessing procedure and updated the corresponding code.
Hi, I am trying to train FluentSpeech with LibriTTS. Could you tell me which subsets of LibriTTS you use?
I used train-clean-100 + train-clean-300.
Maybe train-clean-100 and train-clean-360?
Do you train one subset first and then train the next? Or do you mix two subsets together for training?
Yes, train-clean-100 and train-clean-360. I just mixed the two subsets together for training in the experiments of our paper. But in my later experiments, mixing three subsets together showed better zero-shot capability.
I have now preprocessed train_clean_100. Should I mix the two subsets and preprocess them again, or can I just process train_clean_360 on top of the current result?
The code does not support dataset concatenation. Sorry, you will need to mix the two subsets and preprocess them again.
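If it helps, here is a minimal sketch of one way to merge the two subsets into a single directory before preprocessing. The paths and the helper name `merge_subsets` are hypothetical, and the standard LibriTTS layout (one sub-directory per speaker inside each subset) is assumed:

```python
import shutil
from pathlib import Path

def merge_subsets(subset_dirs, out_dir):
    """Copy every speaker directory from each subset into one combined tree."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for subset in subset_dirs:
        for speaker in Path(subset).iterdir():
            if speaker.is_dir():
                # LibriTTS speaker IDs are unique across subsets, so
                # copies from different subsets cannot collide.
                shutil.copytree(speaker, out / speaker.name, dirs_exist_ok=True)

# Example (hypothetical paths):
# merge_subsets(["data/LibriTTS/train-clean-100",
#                "data/LibriTTS/train-clean-360"],
#               "data/LibriTTS/train-clean-460")
```

Since LibriTTS speaker IDs are globally unique, copying both subsets into one tree is safe; symlinking the speaker directories instead would also work if disk space is a concern.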
Ok, thanks for your reply!
You're welcome! If you have any more questions or need further assistance, feel free to ask.
Hi,
I'm training the model on LibriTTS and VCTK with this command: `CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/spec_denoiser.yaml --exp_name spec_denoiser --reset`
I have a question. I trained VCTK using `egs/spec_denoiser.yaml`, but should I train LibriTTS using `egs/spec_denoiser.yaml` or `egs/spec_denoiser_libritts.yaml`?
In addition, a friendly reminder: you may want to update the README. In `python data_gen/tts/run_mfa_train_aligh.sh`, the `h` in `aligh` may need to be changed to `n` (i.e. `run_mfa_train_align.sh`), and the script should be executed with bash rather than python.
I'm sorry, `egs/spec_denoiser_libritts.yaml` is outdated. You can directly use `egs/spec_denoiser.yaml` to train the model on LibriTTS.
Thanks for your advice! I will update the README.
Hi, I used the pre-trained model for inference and found that mfa_model.zip and mfa_dict.txt were missing. I downloaded the relevant models from the official MFA site and created the folders myself to put them in.
However, the output shows: ![image](https://github.com/Zain-Jiang/Speech-Editing-Toolkit/assets/128498268/8d47b9e9-5406-4e6c-a17f-03cd041ee3af)
Do I need to perform the data-processing step first? After entering the following command, it shows:
![image](https://github.com/Zain-Jiang/Speech-Editing-Toolkit/assets/128498268/ecda8a25-023f-4808-9519-77af2f54595b)
How should I solve this problem? I need help with it!