Open nonmetal opened 2 years ago
I have met the same problem. Have you solved it?
Hi, I found that the WaveGlow model generates NaN tensors, which leads to the silent output. I fixed this issue by using fp32: try removing .half() in the load_waveglow and load_tacotron functions. Hope this helps.
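For anyone wondering how half precision can end up as NaN at all, here is a minimal standard-library sketch. It is not the fairseq or WaveGlow code, and whether overflow is the actual cause in this model is an assumption; `to_fp16` is a hypothetical helper that only imitates an fp16 cast.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(value: float) -> float:
    """Round a Python float to half precision, saturating to +/-inf on overflow.

    Hypothetical helper for illustration; it imitates a cast to fp16 using only
    the standard library's struct "e" (binary16) format.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses values beyond the fp16 range; a tensor cast would give inf.
        return math.copysign(math.inf, value)


in_range = to_fp16(1000.0)          # representable, stays finite
overflowed = to_fp16(70000.0)       # above FP16_MAX, becomes inf
poisoned = overflowed - overflowed  # inf - inf is NaN

print(in_range, overflowed, poisoned)
```

Once a single intermediate value overflows like this, the NaN can spread through later layers, which would match an all-NaN waveform that plays back as silence. Keeping the models in fp32 avoids the narrow fp16 range.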
That method completely works! Thanks a lot for solving my problem 👍👍
❓ Questions and Help
What is your question?
Hello, I'm currently following the tutorial and struggling with a problem in which examples/textless_nlp/gslm/tools/resynthesize_speech.py produces a file that is completely silent. I don't think the problem occurs during the WaveGlow (vocoder) step, because the mel-spectrogram from Tacotron2 (the variable mel in /examples/textless_nlp/gslm/unit2speech/utils.py) shows no output. Also, there seems to be no problem with km.bin, as the length of the output sound file changes with the length of the input file.
I was not sure whether I had a dependency or package issue (such as CUDA), so I reproduced these steps in several environments. However, both new environments, Anaconda (torch1.12.1+cuda11.3) and Google Colab (torch1.12.1+cuda10.1), showed the same result.
I'm attaching the input file, the output file, and the resulting mel-spectrogram below. Do you have any idea why this is happening?
Thanks a lot!
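A blank mel-spectrogram plot can mean either all-zero values or all-NaN values, and the two point to different causes. The following is a small standard-library sketch for telling them apart; `summarize_values` is a hypothetical helper, not part of fairseq, and it assumes the values have already been flattened into a plain sequence of floats.

```python
import math
from typing import Iterable


def summarize_values(values: Iterable[float]) -> dict:
    """Count NaN, infinite, and finite entries and report the finite range."""
    total = nan_count = inf_count = 0
    finite_min, finite_max = math.inf, -math.inf
    for v in values:
        total += 1
        if math.isnan(v):
            nan_count += 1
        elif math.isinf(v):
            inf_count += 1
        else:
            finite_min = min(finite_min, v)
            finite_max = max(finite_max, v)
    has_finite = total - nan_count - inf_count > 0
    return {
        "total": total,
        "nan": nan_count,
        "inf": inf_count,
        # With no finite entries there is no meaningful range, so report NaN.
        "finite_min": finite_min if has_finite else math.nan,
        "finite_max": finite_max if has_finite else math.nan,
    }


summary = summarize_values([0.0, -1.5, float("nan"), float("inf")])
print(summary)
```

For a PyTorch tensor such as mel, something like `summarize_values(mel.flatten().tolist())` should work. A high NaN count would point to a numerical problem rather than a genuinely empty spectrogram.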
Code
Downloaded the pre-trained models from the repo (HuBERT-km200 in this example)
Got a sample voice file (LJspeech for this example): 84-121123-0005.flac
In resynthesize_speech.py, I added code to plot the mel-spectrogram and ran:
PYTHONPATH=${FAIRSEQ_ROOT}:${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech python ${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/tools/resynthesize_speech.py \
    --feature_type 'hubert' \
    --layer 6 \
    --acoustic_model_path $DATA/hubert_base_ls960.pt \
    --kmeans_model_path $DATA/km.bin \
    --tts_model_path $DATA/tts_checkpoint_best.pt \
    --code_dict_path $DATA/code_dict.txt \
    --waveglow_path $DATA/waveglow_256channels_new.pt \
    --max_decoder_steps 2000