Xingrun-Xing / SpikeLM

This is the implementation of our paper "SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms" (ICML 2024).

Is grad_scale necessary ? #3

Open ghost opened 1 month ago

ghost commented 1 month ago

Appreciate your work!

I have some questions about the code in spiking.py

(1) Is grad_scale necessary? I found that if I remove grad_scale, my own model converges faster (without using AlphaInit).

(2) Is the ElasticBiSpiking method suitable for CV tasks such as classification and object detection?

(3) Should I always use ElasticBiSpiking with AlphaInit? Judging from your paper, I think the answer is no. If I want to use the ElasticBiSpiking method without AlphaInit, should I remove grad_scale? I suspect grad_scale makes the gradient too small.
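For context on why grad_scale affects convergence: in quantization-style codebases this helper is commonly the LSQ straight-through trick, which leaves the forward value unchanged but rescales the backward gradient. A minimal sketch of that pattern (this is the generic formulation, not necessarily the exact code in spiking.py):

```python
import torch

def grad_scale(x: torch.Tensor, scale: float) -> torch.Tensor:
    """Identity in the forward pass; multiplies the gradient by `scale`
    in the backward pass (LSQ-style straight-through trick)."""
    y = x * scale
    # (x - y).detach() carries the value difference with no gradient,
    # so the output equals x numerically but d(out)/dx = scale.
    return (x - y).detach() + y

# The forward value is unchanged, but the gradient is scaled by 0.1.
x = torch.tensor([2.0], requires_grad=True)
out = grad_scale(x, 0.1)
out.sum().backward()
```

With `scale < 1` this shrinks the gradient flowing into the scaled parameter, which is consistent with the observation that removing it can speed up convergence at the risk of less stable training.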

Thanks again for your work! This is my first post on GitHub.

Xingrun-Xing2 commented 4 weeks ago

Hi, sorry for the late reply.

(1) alpha_init is there to ensure the sparsity of SNNs. Dropping it may result in a high spike firing rate.

(2) Yes, it is a general spike encoding method.

(3) You can refer to Section "4.2. Spike Frequency Encoding" in the paper, which describes how alpha is calculated.
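As a rough illustration of frequency-aware amplitude scaling (a sketch only; the hypothetical `spike_amplitude` below assumes alpha is taken from the mean absolute activation, as in common binarization schemes, and is not the paper's exact formula, for which see Section 4.2):

```python
import torch

def spike_amplitude(x: torch.Tensor) -> torch.Tensor:
    # One common choice: scale the {-1, 0, +1} spike levels by the mean
    # absolute value of the pre-spike activations, so the spike train
    # roughly matches the energy of the full-precision signal.
    return x.abs().mean()

acts = torch.tensor([-2.0, 0.0, 1.0, 3.0])
alpha = spike_amplitude(acts)  # mean of |x| = (2 + 0 + 1 + 3) / 4 = 1.5
```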