Xingrun-Xing / SpikeLM

This is the implementation of our paper "SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms" (ICML 2024).

Great Work! #1

Lvchangze opened this issue 3 months ago

Lvchangze commented 3 months ago

Dear authors,

Congratulations on your acceptance to ICML 2024! I am the first author of SpikeBERT, and I am very grateful that you cited our work. I am also delighted to see SNNs integrated with NLP and achieving results comparable to ANNs on both NLU and NLG tasks, which I believe is of great significance.

I have read your work carefully and admire how you design the SNN language model from the intrinsic mechanisms of spiking neurons. Previously, I was not able to reason about the soundness of a spiking language model from the perspective of spiking neuron mechanisms, so the architectures of SpikeBERT (https://github.com/Lvchangze/SpikeBERT) and my self-implemented SpikingGPT (https://github.com/Lvchangze/SpikingGPT) were based only on Spikformer (ICLR 2023). During autoregressive pre-training of SpikingGPT, I found that a fully AC-operation (addition-only) SpikingGPT based on Spikformer, or on the Spike-driven Transformer (NeurIPS 2023), which I have also tried, actually has great difficulty learning effective semantic information in the large-scale autoregressive pre-training phase.
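To make what I mean by "fully AC-operation" concrete, here is a rough sketch (not code from either repository) of how binary spiking activations reduce the attention score computation to additions only:

```python
import torch

def heaviside_spike(x, threshold=1.0):
    # Binarize activations to {0, 1}; during training a surrogate
    # gradient would replace the non-differentiable step (omitted here).
    return (x >= threshold).float()

def spike_driven_scores(q, k):
    # With binary Q and K, Q @ K^T only counts coincident spikes,
    # so it needs accumulations (AC) rather than multiply-accumulates (MAC).
    q_s = heaviside_spike(q)
    k_s = heaviside_spike(k)
    return q_s @ k_s.transpose(-2, -1)

q = torch.randn(2, 8, 16)   # (batch, tokens, dim)
k = torch.randn(2, 8, 16)
print(spike_driven_scores(q, k).shape)  # torch.Size([2, 8, 8])
```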

Although your paper does not mention it, I believe you have tried applying SpikeLM to large-scale autoregressive pre-training in the style of GPT-3 models, so I am curious: can SpikeLM still demonstrate the scaling laws or emergent abilities of generative language models under large-scale autoregressive pre-training?

Thank you! I look forward to communicating with you!

Xingrun-Xing commented 3 months ago

Hi Changze, sorry for the late reply. We haven't directly trained a large-scale SpikeLM with a decoder-only architecture because of limited GPU resources. In this work, our largest model is the large-sized mBART (680M) in the encoder-decoder architecture. In general, both training from scratch and the decoder-only architecture increase the training difficulty for spiking models. Therefore, I think it would be better to train the ANN counterpart first and then continue training with spiking neuronal dynamics (a rough sketch below). We also appreciate your previous work on SpikeBERT and its impact on advancing spike-driven natural language processing. My WeChat is 15137162936 for further communication.
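To illustrate the "train the ANN first, then continue with spiking dynamics" recipe, here is a minimal sketch assuming a PyTorch model and a simplified ternary spiking activation; `BiSpike` and `spikify` are illustrative names, not the actual SpikeLM implementation:

```python
import torch
import torch.nn as nn

class BiSpike(nn.Module):
    # Hypothetical stand-in for an elastic bi-spiking activation: the
    # forward pass emits ternary spikes {-1, 0, +1}, and a straight-through
    # estimator passes gradients through. Not the exact SpikeLM neuron.
    def forward(self, x):
        spikes = torch.clamp(torch.round(x), -1, 1)
        return x + (spikes - x).detach()  # straight-through estimator

def spikify(model: nn.Module) -> nn.Module:
    # Swap the dense activations of a pretrained ANN for spiking ones,
    # then continue training with the usual objective from the ANN weights.
    for name, child in list(model.named_children()):
        if isinstance(child, (nn.GELU, nn.ReLU)):
            setattr(model, name, BiSpike())
        else:
            spikify(child)
    return model

# Stage 1: train (or load) the ANN counterpart as usual.
ann = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
# Stage 2: insert spiking dynamics and keep training from the ANN weights.
snn = spikify(ann)
print(snn)
```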

Lvchangze commented 3 months ago

Thanks! My WeChat is 13967492189. I have added you on WeChat~