Yujia-Yan / Transkun

A simple yet effective Audio-to-MIDI Automatic Piano Transcription system
MIT License

Some params have grad=None during training #17

Closed: xavriley closed this issue 9 months ago

xavriley commented 9 months ago

Hi,

Thank you very much for this repo - I'm trying to train this model from scratch on some saxophone recordings.

Firstly, I was getting weird errors for

It might be worth mentioning these in the README for people who want to train on something other than Maestro.

The error I'm now encountering occurs during the first epoch:

epoch:0 progress:0.000 step:0  loss:5907.2900 gradNorm:12.11 clipValue:28.85 time:0.39
epoch:0 progress:0.000 step:0  loss:5911.5234 gradNorm:12.17 clipValue:23.27 time:0.38
Warning: detected parameter with no gradient that requires gradient:
torch.Size([90, 256])
pitchEmbedding.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 1792])
velocityPredictor.0.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
velocityPredictor.0.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 512])
velocityPredictor.3.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
velocityPredictor.3.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128, 512])
velocityPredictor.6.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128])
velocityPredictor.6.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 1792])
refinedOFPredictor.0.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
refinedOFPredictor.0.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128, 512])
refinedOFPredictor.3.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128])
refinedOFPredictor.3.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([2, 128])
refinedOFPredictor.6.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([2])
refinedOFPredictor.6.bias
Traceback (most recent call last):
  File "/import/linux/python/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/import/linux/python/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/train.py", line 364, in <module>
    train(0, 1, saved_filename, int(time.time()), args)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/train.py", line 199, in train
    average_gradients(model, totalLen, parallel)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/TrainUtil.py", line 45, in average_gradients
    param.grad.data /= c
AttributeError: 'NoneType' object has no attribute 'data'

It looks like many of the parameters never have their gradients set. This is strange because, at this point in the run, a backward pass has already completed, so I expected all the gradients to be populated. I'm using the following settings to train:

python3 -m transkun.train --nProcess 1 --batchSize 1 --hopSize 5 --chunkSize 10 --datasetPath "/import/research_c4dm/jxr01/bytedance_piano_transcription/filosax_train/" --datasetMetaFile_train "filosax_data/train.pickle" --datasetMetaFile_val "filosax_data/val.pickle" --augment checkpoint/filosax_model

Can you give me any tips on what to try next?
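
In the meantime, one workaround that avoids the crash (a sketch only, assuming average_gradients in TrainUtil.py loops over model.parameters() as the traceback suggests) is to skip parameters whose grad is None before dividing:

def average_gradients(model, totalLen, parallel):
    # Sketch only: the distributed (parallel) branch is omitted here.
    # Divide each accumulated gradient by totalLen, but skip parameters
    # that took no part in the loss this step (grad is None) instead of
    # crashing on param.grad.data.
    for param in model.parameters():
        if param.grad is None:
            continue
        param.grad.data /= totalLen

This only masks the symptom, though: the parameters with no gradient still wouldn't be trained, so the underlying cause needs finding.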

xavriley commented 9 months ago

I've solved this after a good night's sleep 😅

In my case this was a data issue. I was using a chunk size of 10 seconds, and most of my training data has long notes held towards the end of the piece. With the notesStrictlyContained setting, such a note was removed in some cases, leaving lots of frame activity with no associated note, which caused the gradients to blow up.
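
To illustrate the edge case with a toy sketch (hypothetical names, not transkun's actual chunking code): under strict containment a note must both start and end inside the chunk to be kept, so a note held across the chunk boundary is dropped even though its audio frames remain.

def notes_in_chunk(notes, chunk_start, chunk_end, strictly_contained=True):
    # notes: list of (onset_sec, offset_sec, pitch) tuples
    kept = []
    for onset, offset, pitch in notes:
        if strictly_contained:
            # keep only notes that lie entirely inside the chunk
            if onset >= chunk_start and offset <= chunk_end:
                kept.append((onset, offset, pitch))
        elif onset < chunk_end and offset > chunk_start:
            # otherwise keep any note that overlaps the chunk at all
            kept.append((onset, offset, pitch))
    return kept

# A 10-second chunk near the end of a piece, with a note held from
# 18 s to 35 s: the note is dropped, but its frames are still audible.
print(notes_in_chunk([(18.0, 35.0, 60)], 20.0, 30.0))  # -> []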

The fix in my case was to take 15 seconds off the duration value when building the dataset, which avoids these edge cases in my data. Leaving this here in case it helps others.
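
For reference, the trim looks roughly like this (purely illustrative: the real layout of the metadata pickle may differ, so the duration field name here is an assumption):

import pickle

TRIM_SECONDS = 15.0

with open("filosax_data/train.pickle", "rb") as f:
    meta = pickle.load(f)

# Clip each piece's usable duration so the final chunk never straddles
# a long note held to the end of the recording.
for entry in meta:
    entry["duration"] = max(0.0, entry["duration"] - TRIM_SECONDS)

with open("filosax_data/train_trimmed.pickle", "wb") as f:
    pickle.dump(meta, f)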