fangwei123456 / Spike-Element-Wise-ResNet

Deep Residual Learning in Spiking Neural Networks
Mozilla Public License 2.0

Do you use Cupy during the training? #12

KoiLiu opened this issue 2 years ago

fangwei123456 commented 2 years ago

Yes, because the default torch backend is too slow.
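For reference, here is a minimal sketch of enabling the cupy backend, assuming the spikingjelly.clock_driven API that this repository is built on (module paths and default arguments may differ between spikingjelly versions):

```python
import torch
from spikingjelly.clock_driven.neuron import MultiStepParametricLIFNode

# backend='torch' is the default; 'cupy' runs the fused CUDA kernels
# and requires both the neuron and the input to live on a CUDA device.
plif = MultiStepParametricLIFNode(init_tau=2.0, detach_reset=True, backend='cupy').to('cuda')

x_seq = torch.rand(16, 4, 8, device='cuda')  # shape [T, N, ...]
out_seq = plif(x_seq)                        # spikes for all T steps at once
print(out_seq.shape)                         # torch.Size([16, 4, 8])
```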

KoiLiu commented 2 years ago

OK, thank you a lot!

KoiLiu commented 2 years ago

Hello. I added backend='cupy' in the MultiStepParametricLIFNode and trained the models on the DVS128Gesture dataset. The max test accuracy was only 57.99%, which is much lower than what I get with the default torch backend. Could you tell me whether I am using cupy incorrectly, and how to fix it? The training args are:

Namespace(T=16, T_train=12, adam=False, amp=False, attention=None, batch_size=16, connect_f='ADD', data_path='./dataset', device='cuda:0', dist_url='env://', distributed=False, epochs=192, lr=0.001, lr_gamma=0.1, lr_step_size=64, model='SEWResNet', momentum=0.9, output_dir='./logs', print_freq=64, resume='', start_epoch=0, sync_bn=False, tb=True, test_only=False, weight_decay=0, workers=4, world_size=1)

The max test accuracy is:

max_test_acc1 57.986111111111114
test_acc5_at_max_test_acc1 94.09722222222223

Thank you!

KoiLiu commented 2 years ago

By the way, when I use the default torch backend, the max test accuracy reaches 96.83%, i.e. around 97%.

fangwei123456 commented 2 years ago

That is unexpected, because the cupy and torch backends should produce identical outputs and gradients.

KoiLiu commented 2 years ago

OK, I will test it again and try to find out whether I set a parameter incorrectly or made some other mistake.

fangwei123456 commented 2 years ago

You can check whether w.grad of the PLIF neuron is identical when you use different backends.
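A rough sketch of that check, assuming the spikingjelly.clock_driven API (the learnable time constant of the PLIF neuron is stored in the parameter w):

```python
import torch
from spikingjelly.clock_driven import neuron, functional

torch.manual_seed(0)
x_seq = torch.rand(16, 4, 8, device='cuda')  # identical [T, N, ...] input for both runs

grads = {}
for backend in ('torch', 'cupy'):
    plif = neuron.MultiStepParametricLIFNode(backend=backend).to('cuda')
    plif(x_seq).sum().backward()
    grads[backend] = plif.w.grad.clone()
    functional.reset_net(plif)  # clear membrane state before discarding the neuron

# If the two backends agree, this difference is floating-point noise.
print((grads['torch'] - grads['cupy']).abs().max())
```

If the gradients match but the accuracies still diverge, the cause is more likely somewhere in the training setup than in the backend itself.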