Open Loydian opened 2 years ago
Do you use the latest version of SJ? And do you move the whole network to cuda?
Yes, I use version 0.0.0.10, and I am fairly sure the training script is correct. When I use other models, or replace the act_layer with ReLU, the code works well.
Replace `nn.ReLU` with `torch.nn.PReLU` and see if it raises an error.
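The point of this check, as I understand it: `PReLU`, like the PLIF neuron, carries a learnable parameter, so it exercises the same `.to(device)` path as PLIF while staying a plain PyTorch layer. A minimal sketch of the swap (the `ConvBlock` and `act_layer` names here are hypothetical, mirroring the pattern described in the thread):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Hypothetical block with a swappable activation, as in the thread."""
    def __init__(self, channels, act_layer=nn.ReLU):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # PReLU has a learnable weight, so swapping it in tests whether a
        # parametric activation alone is enough to trigger the device error
        self.act = act_layer()

    def forward(self, x):
        return self.act(self.conv(x))

block = ConvBlock(4, act_layer=nn.PReLU)  # swap ReLU -> PReLU
out = block(torch.randn(1, 4, 8, 8))
```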
No error. I just tried it, and everything works fine.
Before `loss.backward`, print the PLIF's `w.device` and see if it is `cuda:1`.
I have done this before. The parameter `w`'s device is `cuda:1`:
```
residual_function.1.1.w : torch.Size([]) cuda:1
shortcut.1.1.w : torch.Size([]) cuda:1
```
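A printout like the one above can be produced by iterating `named_parameters()` just before the backward pass. A minimal sketch (the two-layer model here is a placeholder, not the poster's architecture; it falls back to CPU when a second GPU is unavailable):

```python
import torch
import torch.nn as nn

# Placeholder model with a parametric activation (PReLU's weight plays
# the role of the PLIF neuron's w in the thread)
model = nn.Sequential(nn.Linear(2, 2), nn.PReLU(init=0.25))
device = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"
model.to(device)

# Print every parameter's shape and device before loss.backward,
# mirroring the "w : torch.Size([]) cuda:1" lines above
for name, p in model.named_parameters():
    print(f"{name} : {p.shape} {p.device}")

x = torch.randn(3, 2, device=device)
loss = model(x).sum()
loss.backward()  # a device mismatch would surface here
```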
OK. Can you provide a minimal code example to reproduce the error? You can strip your proposed model out of the code to avoid intellectual property concerns.
When opening this issue, I tried to provide a minimal example, but the rest of the model is a new architecture, and after removing the proposed block the problem somehow disappears. The original SEW ResNet code works well with the same training script. I compared my convolution part against the original SEW ResNet implementation and couldn't find where the bug is.
While implementing the SEW block in my own model, I encountered the problem below.
While debugging it, I found the error is raised here.
The error occurs when I use the spiking neuron, but when I replace it with a ReLU, everything works well, and the training script also works with other ANN models. Because of intellectual property constraints, I can't show the complete code. After reading the SpikingJelly source code, I still can't fix the bug.
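The thread does not show the root cause, but one common reason a stateful spiking neuron fails where a stateless ReLU works is hidden state created on a different device: `model.to("cuda:1")` moves parameters and registered buffers, not plain tensor attributes. A hypothetical toy neuron (not SpikingJelly's implementation) illustrating the pitfall and a defensive fix:

```python
import torch
import torch.nn as nn

class ToyNeuron(nn.Module):
    """Hypothetical stateful neuron, NOT spikingjelly's PLIF."""
    def __init__(self):
        super().__init__()
        # Plain attribute: nn.Module.to() will NOT move this tensor,
        # unlike parameters or tensors registered via register_buffer
        self.v = torch.zeros(1)

    def forward(self, x):
        # Defensive fix: re-sync the membrane state to the input's device
        # before using it, so backward sees tensors on one device
        self.v = self.v.to(x.device) + x
        return (self.v > 0.5).float()  # fire where potential crosses threshold

neuron = ToyNeuron()
out = neuron(torch.ones(2))
```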