Closed: Adibian closed this issue 2 years ago.
I figured out what I should do to fix this error:
x = x.unsqueeze(0)
But the synthesized speech is very noisy! You can listen to the result here.
Thanks in advance to anyone with experience using HiFi-GAN who can explain how to run it with the output of this project.
Your synthesizer is predicting spectrograms with a different scaling than the HiFi-GAN model expects. To fix this, you will need to retrain your model with properly scaled data. Replace your synthesizer/audio.py with this file and preprocess your data again.
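To make the mismatch concrete, here is a rough sketch of going from an RTVC-style normalized mel back to the natural-log amplitude mels that HiFi-GAN models are typically trained on. It assumes the default RTVC hparams (symmetric_mels=True, max_abs_value=4, min_level_db=-100, ref_level_db=20); note that the sample rate, hop size and mel basis must also match the vocoder, so rescaling alone is not a full fix:

```python
import numpy as np

def rtvc_mel_to_log_amp(mel, max_abs_value=4.0, min_level_db=-100.0, ref_level_db=20.0):
    """Sketch: undo RTVC's mel normalization and return natural-log amplitude mels."""
    # Undo the symmetric [-max_abs_value, max_abs_value] normalization back to dB
    clipped = np.clip(mel, -max_abs_value, max_abs_value)
    mel_db = (clipped + max_abs_value) * (-min_level_db) / (2 * max_abs_value) + min_level_db
    mel_db = mel_db + ref_level_db           # undo the reference-level shift
    amp = np.power(10.0, mel_db / 20.0)      # dB -> linear amplitude
    return np.log(np.maximum(amp, 1e-5))     # HiFi-GAN convention: natural log of amplitudes
```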
Thanks for your quick answer. Which parameters should change? Is it possible to train HiFi-GAN with these parameters instead of retraining the synthesizer?
Another question: are there any standard parameters for this task? I mean, if I retrain the synthesizer with the new parameters, which vocoders can I use (besides the HiFi-GAN you mentioned)? Do I have to train the synthesizer specifically for each vocoder?
Is it possible to train HiFi-GAN with these parameters instead of retraining the synthesizer?
Yes. Here's a pretrained model for testing: https://github.com/raccoonML/hifigan-demo/releases/tag/MLRTVC-v1 No training code provided, unfortunately.
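For anyone testing that release, here is a rough sketch of loading and running the generator with the upstream jik876/hifi-gan code; the config and checkpoint filenames below are illustrative, use whatever the release actually ships:

```python
import json
import torch
from env import AttrDict       # from the jik876/hifi-gan repo
from models import Generator   # from the jik876/hifi-gan repo

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

with open("config.json") as f:                       # config shipped with the checkpoint
    h = AttrDict(json.load(f))

generator = Generator(h).to(device)
state = torch.load("generator.pt", map_location=device)  # filename illustrative
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()

# Placeholder mel; replace with your synthesizer's output, scaled the way this
# model was trained (shape: (batch, n_mels, frames)).
mel = torch.zeros(1, h.num_mels, 100, device=device)
with torch.no_grad():
    audio = generator(mel).squeeze() * 32768.0       # scale to int16 range
```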
I mean, if I retrain the synthesizer with the new parameters, which vocoders can I use (besides the HiFi-GAN you mentioned)?
WaveGlow should work.
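If you go that route, a rough sketch of WaveGlow inference via PyTorch Hub (NVIDIA's DeepLearningExamples model); the mel below is a placeholder and is assumed to use the Nvidia-style natural-log scaling at 22.05 kHz that the pretrained model expects:

```python
import torch

# Sketch only: NVIDIA's pretrained WaveGlow from PyTorch Hub.
waveglow = torch.hub.load("NVIDIA/DeepLearningExamples:torchhub", "nvidia_waveglow")
waveglow = waveglow.remove_weightnorm(waveglow).to("cuda").eval()

# Placeholder; use the mel from your (retrained) synthesizer here.
mel = torch.zeros(1, 80, 100, device="cuda")
with torch.no_grad():
    audio = waveglow.infer(mel)   # waveform at 22.05 kHz
```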
Your synthesizer is predicting spectrograms with a different scaling than the HiFi-GAN model expects. To fix this, you will need to retrain your model with properly scaled data. Replace your synthesizer/audio.py with this file and preprocess your data again.
To retrain the synthesizer, is it enough to just replace audio.py with the new file? Are any changes needed in hparams.py or in any other configuration in this project?
No updates to hparams.py are required. With the new audio.py, some settings no longer have an effect, like min_level_db and ref_level_db.
@raccoonML I retrained the synthesizer after swapping in the new audio.py. After ~100k steps (~35 epochs) the result is better but still noisy (this file is a sample result), and the loss is about 0.64 and does not seem to change much after that. What do you think the problem is?
Hello @raccoonML, sorry to bother you. I wanted to know whether you found a solution for this. I hope you can answer me, thank you.
@manuel3265 Could you be more specific? If you're referring to this, it's not something I can help with, since such problems are often particular to the dataset used. For those types of problems, switch to LibriSpeech or LibriTTS train-clean-100/360 and see if the issue goes away. Since those datasets are known to work, this will help you determine whether the problem is in your code or in your training data.
@raccoonML Sorry for not being specific, I was referring to the use of HiFi-GAN with this repository.
Were you able to implement it? Or did you manage to implement some other vocoder?
I have an open issue to integrate HiFi-GAN with the MLRTVC fork. It's on hold for a few reasons: 1) lackluster results with the HiFi-GAN pretrained models (here); 2) more cleanup is required before I'd feel comfortable releasing the code; and 3) my supporters have me working on voice conversion instead of TTS.
If this is something you want to try, I would suggest integrating Nvidia tacotron2 with HiFi-GAN, since those repos use the same mel scaling (see the sketch below). Another possibility is to train a HiFi-GAN model on RTVC mels, like this example.
Edit: Adding to the list of reasons: 4) perceived lack of interest in continued development of RTVC or MLRTVC.
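For reference, a rough sketch of the Tacotron2 + HiFi-GAN route, following the inference example in the NVIDIA/tacotron2 repo; the checkpoint path is illustrative, and hifigan_generator is assumed to be a jik876/hifi-gan Generator loaded as in the earlier sketch:

```python
import numpy as np
import torch
from hparams import create_hparams      # NVIDIA/tacotron2
from train import load_model            # NVIDIA/tacotron2
from text import text_to_sequence       # NVIDIA/tacotron2

hparams = create_hparams()
tacotron2 = load_model(hparams)
tacotron2.load_state_dict(torch.load("tacotron2_statedict.pt")["state_dict"])
tacotron2.cuda().eval()

sequence = np.array(text_to_sequence("Hello world.", ["english_cleaners"]))[None, :]
sequence = torch.from_numpy(sequence).cuda().long()

with torch.no_grad():
    # Both repos use natural-log mels at 22.05 kHz, so no rescaling is needed.
    _, mel_postnet, _, _ = tacotron2.inference(sequence)   # (1, 80, frames)
    audio = hifigan_generator(mel_postnet).squeeze()
```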
Hi, thanks for this great project. I trained the synthesizer on my data, used the pretrained vocoder, and the result was good. Now I want to use other vocoders. I cloned the HiFi-GAN project and tried to run it with the mel-spectrogram produced by the synthesizer:
But I got this error:
Then I tried to use WaveGlow and got the same error:
Why is the synthesizer's output two-dimensional? I think the synthesizer should return a mel-spectrogram, and these vocoders consume mel-spectrograms too, so using them should be possible, but how? Thanks for any suggestions.
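For anyone hitting the same error, a minimal sketch of the reshaping discussed near the top of this thread: the synthesizer returns a 2-D numpy mel of shape (n_mels, frames), while HiFi-GAN and WaveGlow expect a batched (batch, n_mels, frames) tensor. The model path is illustrative, and hifigan_generator is an assumed, already-loaded HiFi-GAN Generator that (as discussed above) must have been trained on mels with the same scaling as this synthesizer produces:

```python
from pathlib import Path
import torch
from synthesizer.inference import Synthesizer

# "text" is the input string and "embed" the speaker embedding from the
# encoder, obtained as in demo_cli.py.
synthesizer = Synthesizer(Path("synthesizer/saved_models/my_run/my_run.pt"))  # path illustrative
specs = synthesizer.synthesize_spectrograms([text], [embed])   # list of (80, frames) arrays
mel = torch.from_numpy(specs[0]).unsqueeze(0)                  # (1, 80, frames): add batch dim
with torch.no_grad():
    audio = hifigan_generator(mel.cuda()).squeeze()
```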