Rongjiehuang / Multi-Singer

PyTorch Implementation of Multi-Singer (ACM-MM'21)
MIT License

NaN errors when training #8

Open Robinatp opened 2 years ago

Robinatp commented 2 years ago

Hello, I ran into the following problem during training:

2022-05-12 14:56:20,955 (train:487) INFO: (Steps: 1000) train/embed_loss = nan.
2022-05-12 14:56:20,955 (train:487) INFO: (Steps: 1000) train/spk_similariy = nan.
2022-05-12 14:56:20,955 (train:487) INFO: (Steps: 1000) train/spectral_convergence_loss = nan.
2022-05-12 14:56:20,955 (train:487) INFO: (Steps: 1000) train/log_stft_magnitude_loss = nan.
2022-05-12 14:56:20,956 (train:487) INFO: (Steps: 1000) train/generator_loss = nan.

Do you have any suggestions?

Thx!

Rongjiehuang commented 2 years ago

Hi, that's weird; I haven't come across this issue. Could retraining the model on another machine solve it? :)

Robinatp commented 2 years ago

I found a problem; maybe you should update the code as below:

encoder/audio.py:171: dBFS_change = target_dBFS - 10 * np.log10(np.mean(wav ** 2) + 1e-8)
encoder/audio.py:180: dBFS_change = target_dBFS - 10 * torch.log10(torch.mean(wav ** 2) + 1e-8)
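To illustrate why the added `1e-8` matters, here is a minimal NumPy sketch. It assumes the surrounding code scales the waveform by the computed dB change, which this thread does not show. The function name `normalize_volume_sketch` and the `eps` parameter are hypothetical, not the repository's actual code. For an all-zero (silent) clip the mean power is 0, so `log10(0)` is `-inf` and the resulting gain turns the samples into NaN; the epsilon keeps the logarithm finite.

```python
import numpy as np

def normalize_volume_sketch(wav, target_dBFS, eps=1e-8):
    """Hypothetical helper: scale wav so its mean power is close to target_dBFS."""
    # eps keeps log10 finite when wav is all zeros (mean power == 0).
    dBFS_change = target_dBFS - 10 * np.log10(np.mean(wav ** 2) + eps)
    # Convert the dB change to a linear amplitude gain.
    return wav * (10 ** (dBFS_change / 20))

# A silent clip stays finite instead of becoming NaN.
silent = np.zeros(16000, dtype=np.float32)
print(np.isfinite(normalize_volume_sketch(silent, target_dBFS=-30)).all())
```

For non-silent audio the epsilon is negligible next to the mean power, so the normalized level is essentially unchanged.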

Robinatp commented 2 years ago

However, there is another new problem!

2022-05-20 20:32:54,063 (train:512) INFO: (Steps: 3000) train/embed_loss = 0.0153.
2022-05-20 20:32:54,063 (train:512) INFO: (Steps: 3000) train/spk_similariy = nan.
2022-05-20 20:32:54,063 (train:512) INFO: (Steps: 3000) train/spectral_convergence_loss = 0.2280.
2022-05-20 20:32:54,063 (train:512) INFO: (Steps: 3000) train/log_stft_magnitude_loss = 0.7044.
2022-05-20 20:32:54,063 (train:512) INFO: (Steps: 3000) train/generator_loss = 0.9629.
2022-05-20 20:32:54,063 (train:512) INFO: (Steps: 3000) train/embed_loss = 0.0155.
2022-05-20 20:32:54,064 (train:512) INFO: (Steps: 3000) train/spk_similariy = nan.
2022-05-20 20:32:54,064 (train:512) INFO: (Steps: 3000) train/spectral_convergence_loss = 0.2294.
2022-05-20 20:32:54,064 (x2num:14) WARNING: NaN or Inf found in input tensor.
2022-05-20 20:32:54,064 (train:512) INFO: (Steps: 3000) train/log_stft_magnitude_loss = 0.7057.
2022-05-20 20:32:54,064 (train:512) INFO: (Steps: 3000) train/generator_loss = 0.9660.
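The cause of the remaining `nan` in `spk_similariy` is not established in this thread. One generic way to narrow it down is to check which intermediate values are non-finite before they are logged. The sketch below uses a hypothetical helper, `report_non_finite`, and made-up example values; it is not part of the Multi-Singer code.

```python
import numpy as np

def report_non_finite(named_arrays):
    """Return the names of arrays that contain NaN or Inf, in insertion order."""
    return [
        name
        for name, arr in named_arrays.items()
        if not np.isfinite(np.asarray(arr, dtype=float)).all()
    ]

# Example values are illustrative only, not taken from a real training run.
values = {
    "embed_loss": [0.0153],
    "spk_similarity": [float("nan")],
    "generator_loss": [0.9629],
}
print(report_non_finite(values))
```

Calling such a check on the inputs to the similarity computation, such as embeddings converted to arrays, may show whether the NaN originates there or in the similarity computation itself.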

qiao131 commented 1 year ago

I'm facing the same problem. Have you solved it? Thanks.