Closed tahafkh closed 8 months ago
Can you check whether `torch.nn.functional.cross_entropy(output, label_onehot)` returns any NaN values?
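A minimal way to run this check before calling `backward()` might look like the sketch below. The tensors here are hypothetical stand-ins for the model output and one-hot labels; passing float probability targets to `cross_entropy` requires PyTorch >= 1.10.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the network output and one-hot labels.
output = torch.randn(4, 3, requires_grad=True)
label_onehot = F.one_hot(torch.tensor([0, 1, 2, 0]), num_classes=3).float()

loss = F.cross_entropy(output, label_onehot)

# Check both the loss inputs and the loss value for NaN before backward().
assert not torch.isnan(output).any(), "NaN in network output"
assert not torch.isnan(loss), "NaN in loss"
```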
Hi, I checked the input of the loss function, and that does seem to be the case. However, I made a new run without this technique, and that run also crashed. This made me think the error could be memory-related, and by reducing the number of parameters in the network, I was able to run my model with this technique without any problems. Thanks!
Issue type
SpikingJelly version
0.0.0.0.15
Description
Hi! I was trying to train my SNN model with the gradient accumulation technique, since I have a memory limitation. However, during training I got an error when the backward method was called, which looks like this:
I'm using this link (the second method) and this link to implement the technique. Here is my training code:
I'm using a batch size of 4 and 8 accumulation steps. Any ideas about the source of this error, or any ready-to-use recipe that you've used in your own projects?
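Since the issue's training code is omitted above, here is a generic sketch of gradient accumulation with those numbers (batch size 4, 8 accumulation steps), not the poster's actual code. The toy `nn.Linear` model and random data are placeholders; with a SpikingJelly SNN you would also reset the network state (e.g. `functional.reset_net(model)`) after each micro-batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy model standing in for the SNN.
model = nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 8  # as described in the issue

optimizer.zero_grad()
for step in range(accumulation_steps):
    x = torch.randn(4, 10)             # micro-batch of size 4
    label = torch.randint(0, 3, (4,))
    output = model(x)
    # Scale the loss so the accumulated gradient matches one
    # full batch of size 4 * 8 = 32.
    loss = F.cross_entropy(output, label) / accumulation_steps
    loss.backward()                    # gradients accumulate in .grad

optimizer.step()                       # one update per 8 micro-batches
# call optimizer.zero_grad() before the next accumulation cycle
```

The key points are dividing the loss by `accumulation_steps` and calling `optimizer.step()` only once per cycle; for SNNs, forgetting to reset membrane state between micro-batches is a common source of growing memory use and backward errors.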
Thanks for your help!