cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

shuzeZHAO commented 5 years ago

Hi, has anyone ever run into this error, cuDNN error: CUDNN_STATUS_NOT_INITIALIZED? I get it when I train WaveRNN in "RAW" mode.

Initialising Model...
Trainable Parameters: 4.481M
Loading Weights: "checkpoints/ljspeech_mol.wavernn/latest_weights.pyt" 

+-------------+------------+--------+--------------+-----------+
|  Remaining  | Batch Size |   LR   | Sequence Len | GTA Train |
+-------------+------------+--------+--------------+-----------+
| 1000k Steps |     32     | 0.0001 |     1375     |   False   |
+-------------+------------+--------+--------------+-----------+

/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [17,0,0], thread: [765,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [17,0,0], thread: [766,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [17,0,0], thread: [767,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train_wavernn.py", line 123, in <module>
    voc_train_loop(voc_model, loss_func, optimiser, train_set, test_set, lr, total_steps)
  File "train_wavernn.py", line 45, in voc_train_loop
    loss.backward()
  File "/home/szhao/tts-1/lib/python3.6/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/szhao/tts-1/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

I don't have this issue in "MOL" mode or with the Tacotron model.

shuzeZHAO commented 5 years ago

I also tried the LJSpeech data, and WaveRNN works fine. I guess it's somehow related to my data. Some sentences might be too long (I limited the .wav files to 20 s max).

shuzeZHAO commented 5 years ago

I just realized that preprocess.py processes the data differently for RAW mode and MOL mode. I was using the data generated for MOL mode to train a RAW model. Closing this issue now.
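
For anyone hitting the same thing: the `t >= 0 && t < n_classes` assertions in the log mean that some target label fell outside the range the loss expects, and the cuDNN error in backward() is most likely just a later symptom of that failed assertion. Below is a minimal sketch of a check that could be run on the targets before computing the loss. The function name `check_targets` and the value `bits = 9` are hypothetical and not taken from the repo; it also assumes a RAW model with `bits` output bits has 2**bits classes.

```python
def check_targets(targets, n_classes):
    """Raise ValueError if any integer label lies outside [0, n_classes)."""
    labels = list(targets)
    if not labels:
        return
    lo, hi = min(labels), max(labels)
    if lo < 0 or hi >= n_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but the loss expects [0, {n_classes - 1}]"
        )


# Hypothetical values: a 9-bit RAW model would have 2**9 = 512 classes,
# while 16-bit labels can be as large as 65535.
bits = 9
n_classes = 2 ** bits

check_targets([0, 17, 511], n_classes)  # in range, passes silently

try:
    check_targets([0, 17, 40000], n_classes)  # label too large for 512 classes
except ValueError as err:
    print(err)
```

With a PyTorch target tensor `y`, something like `check_targets(y.flatten().tolist(), n_classes)` should work. Running the check on the CPU side gives a readable message instead of the device-side assertion above.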

fatchord / WaveRNN

cuDNN error: CUDNN_STATUS_NOT_INITIALIZED #108