NaN or Inf found in input tensor

camjac251 commented 4 years ago

I've been getting this error around the ~1000 epoch mark, where I start seeing WARNING:root:NaN or Inf found in input tensor. on every iteration when running on Colab.

Is this safe to ignore? I tried looking it up and it seems to be related to TensorBoard, but I was worried it might mean the model is collapsing or that something else is going wrong with training.

Here's a shortened log of the error. The full log is attached below.

FP16 Run: False
cuDNN Enabled: True
cuDNN Benchmark: True
Loss function defined
Model defined
Optimizer defined
Loaded checkpoint '/content/drive/My Drive/colab/waveglow/outdir/waveglow_current_model' (iteration 108000)
Checkpoint loaded
Dataloader defined
output directory /content/drive/My Drive/colab/waveglow/outdir
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:175: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0
Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
Epoch:: 10%
971/10000 [2:43:23<617:21:12, 246.15s/it]
Starting Epoch: 931 Iteration: 108001
/content/waveglow/mel2samp.py:58: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:178: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0
Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
07:51:35 108116: -4.826 -5.014 0.00010000LR 2.14s per iter: 100%
116/116 [24:30<00:00, 12.68s/it]
-------------------------------------------------------
Starting Epoch: 946 Iteration: 109741
/content/waveglow/mel2samp.py:58: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
08:52:35 109856: nan nan 0.00010000LR 2.10s per iter: 100%
116/116 [09:08<00:00, 4.73s/it]
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
-------------------------------------------------------------------------
WARNING:root:NaN or Inf found in input tensor.

Starting Epoch: 947 Iteration: 109857
/content/waveglow/mel2samp.py:58: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
08:56:39 109972: nan nan 0.00010000LR 2.10s per iter: 100%
116/116 [05:04<00:00, 2.62s/it]
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.

colab-full-error-log.txt
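
The `nan nan` values in the progress lines above suggest that the loss itself has become non-finite, not only the value handed to the logger. A minimal sketch of a guard that finds the first non-finite loss is shown below. `first_non_finite` is a hypothetical helper, not part of the WaveGlow training script, and the loss values are illustrative numbers in the style of the log.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN/Inf loss value, or None if all are finite."""
    for i, value in enumerate(losses):
        if not math.isfinite(value):
            return i
    return None


# Illustrative values only: two finite losses followed by NaNs.
losses = [-4.826, -5.014, float("nan"), float("nan")]
print(first_non_finite(losses))  # prints 2
```

In a training loop, the same `math.isfinite` check could be applied to the scalar loss of each iteration (for a PyTorch tensor, presumably via `loss.item()`) to stop early and keep the last good checkpoint.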

rafaelvalle commented 4 years ago

Make sure your audio samples are larger than sample_length

egaebel commented 4 years ago

Just to be 100% clear, do you mean segment_length?

rafaelvalle commented 4 years ago

Yes

mataym commented 4 years ago

Make sure your audio samples are larger than sample_length

hi @rafaelvalle, can you tell me what exactly sample_length means? I wrote a function that gets the parameters of a wav file as follows:

import wave

def getInfoWavFile(wfile):
    f = wave.open(wfile)
    params = f.getparams()
    Channels = f.getnchannels()
    SampleRate = f.getframerate()
    bit_type = f.getsampwidth() * 8
    frames = f.getnframes()
    Duration = wav_time = frames / float(SampleRate)
    return params, Channels, SampleRate, bit_type, frames, Duration

In this function, which parameter should be >= sample_length (in my config sample_length=16000)? Thanks

jameszampa commented 1 year ago

hey @mataym this is how I got the segment length information for my dataset

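import wave
import contextlib
import os

# Smallest frame count seen so far; start from infinity so any file is smaller
min_length = float('inf')

for file in os.listdir('data'):
    # contextlib.closing makes sure each wav file is closed after reading
    with contextlib.closing(wave.open(os.path.join('data', file), 'r')) as f:
        frames = f.getnframes()  # number of audio frames in this file
        #rate = f.getframerate()
        #length = frames / float(rate)
        print(frames)
        if frames < min_length:
            min_length = frames
print()
print(min_length)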
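
Building on the scan above, here is a hedged sketch that lists the individual files that are not longer than segment_length, so they can be removed or padded. `find_short_wavs` is a hypothetical helper, the `data` directory is the one from the snippet above, and 16000 is taken from the config value mentioned earlier in the thread. It compares frame counts, on the assumption that segment_length is measured in audio samples.

```python
import os
import wave


def find_short_wavs(directory, segment_length):
    """Return (filename, frames) for each .wav file not strictly longer than segment_length."""
    short = []
    for name in sorted(os.listdir(directory)):
        # Skip anything that is not a wav file so wave.open does not fail on it
        if not name.lower().endswith(".wav"):
            continue
        with wave.open(os.path.join(directory, name), "rb") as f:
            frames = f.getnframes()
        # The advice in this thread is that files should be larger than segment_length
        if frames <= segment_length:
            short.append((name, frames))
    return short


# Example usage; runs only if a 'data' directory exists next to the script
if __name__ == "__main__" and os.path.isdir("data"):
    for name, frames in find_short_wavs("data", 16000):
        print(name, frames)
```

An empty result would mean every wav file in the directory is longer than the chosen segment_length.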

NVIDIA / waveglow

NaN or Inf found in input tensor #223