fmi-basel / latent-predictive-learning

Code to accompany our paper "The combination of Hebbian and predictive plasticity learns invariant object representations in deep sensory networks" (bioRxiv 2022)
MIT License
22 stars, 5 forks

Surrogate derivative with adaptive firing threshold #5

Closed yilun-wu closed 9 months ago

yilun-wu commented 12 months ago

Dear authors, regarding the surrogate derivative used in the LPL spiking rule, $f'(U_i)=\beta (1+\beta |U_i-v^{rest}|)^{-2}$: since the neuron has an adaptive firing threshold $v_i$ (eq. 16), I am wondering why $v^{rest}$ was used instead of $v_i$ in the surrogate gradient calculation.
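To make the question concrete, here is a minimal sketch of the surrogate derivative with the reference threshold passed in explicitly, so the fixed and the adaptive variants can be compared side by side. The function name, the value of `beta`, and all potentials are placeholders chosen for illustration; none of them are taken from the paper or the repository.

```python
def surrogate_derivative(u, ref, beta=1.0):
    """f'(U) = beta * (1 + beta * |U - ref|)**-2.

    `ref` is the threshold the membrane potential is compared against:
    v_rest for the fixed variant, v_i for the adaptive variant.
    """
    return beta / (1.0 + beta * abs(u - ref)) ** 2


# Hypothetical values in mV, for illustration only.
v_rest = -50.0        # resting value of the firing threshold
v_i = v_rest + 100.0  # adaptive threshold shortly after a spike
u = -55.0             # membrane potential

fixed = surrogate_derivative(u, v_rest)  # reference as in the quoted formula
adaptive = surrogate_derivative(u, v_i)  # alternative using the moving threshold

print(f"fixed reference:    {fixed:.6f}")
print(f"adaptive reference: {adaptive:.6f}")
```

The two variants coincide whenever $v_i$ has relaxed back to $v^{rest}$, and differ only while the threshold is still elevated after a spike.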

Also, in the implementation of the adaptive threshold (https://github.com/fzenke/auryn/blob/6174a67a56074e8b72a94a763277131f62778713/src/auryn/IFGroup.cpp#L139), it seems that $v_i$ is hard reset to 100 mV before decaying exponentially down to $v^{rest}$, instead of jumping by 100 mV as stated in the paper and in eq. 16.

fzenke commented 11 months ago

Dear Yilun, thanks for your questions.

Concerning the first part of your question: the fixed threshold was inherited from the SuperSpike code, in which I used a different neuron model. Frankly, I did not think about updating it when I changed the neuron model for this study. You are right to ask, because it would be more logical to use the moving threshold. However, the moving threshold in IFGroup mainly implements an absolute plus a relative refractory period on a timescale of about 2 ms (the time it takes to decay below the excitatory reversal potential). So there should only be a noticeable difference for high firing rates (> 50 Hz).

That said, the whole theory pertaining to surrogate gradients and the reset term (and by extension refractoriness) is murky and unsatisfactory. In the SuperSpike paper, the neuron model I used has a 1 or 2 ms absolute refractory period, which we also ignored in the learning rule. The topic is rarely talked about, but surrogate gradients usually work better when the reset term (and by extension refractoriness) is ignored for gradient computation (see also Zenke & Vogels 2021, Fig. 5). In that sense, it still isn't clear what the "right way" to deal with it is. I was just discussing the topic again yesterday with one of my PhD students.

As for the second part of your question: I believe you found another mistake, and I will add it to the erratum list. Thanks for reporting it. Very helpful!

yilun-wu commented 11 months ago

Thanks for addressing my questions. Indeed, the difference would only kick in at high firing rates. I think this also applies to the second issue: it won't matter whether the threshold jumps by 100 mV or is set to 100 mV, since either way it would have decayed back to the rest level long before the next spike.
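A rough way to check this is to compare the two rules for a neuron firing regularly. The sketch below assumes the threshold offset above rest decays exponentially with a single time constant. The value of `TAU_S` is a placeholder, not taken from the paper or from IFGroup, and the function name is hypothetical. Under the "set" rule the offset right after each spike is always the increment. Under the "jump" rule the leftover offsets from earlier spikes add up as a geometric series.

```python
import math


def peak_offset_after_spike(rate_hz, tau_s, delta_mv, rule):
    """Steady-state threshold offset (mV above rest) right after a spike,
    for a neuron firing regularly at `rate_hz`."""
    # Fraction of the offset that survives one inter-spike interval.
    decay = math.exp(-1.0 / (rate_hz * tau_s))
    if rule == "set":
        # Offset is overwritten with delta_mv at every spike.
        return delta_mv
    if rule == "jump":
        # Offset is incremented: delta * (1 + decay + decay**2 + ...).
        return delta_mv / (1.0 - decay)
    raise ValueError(f"unknown rule: {rule!r}")


TAU_S = 5e-3      # assumed decay time constant (placeholder)
DELTA_MV = 100.0  # threshold increment discussed in this thread

for rate in (5.0, 50.0, 200.0):
    set_peak = peak_offset_after_spike(rate, TAU_S, DELTA_MV, "set")
    jump_peak = peak_offset_after_spike(rate, TAU_S, DELTA_MV, "jump")
    print(f"{rate:6.1f} Hz  set: {set_peak:8.3f} mV  jump: {jump_peak:8.3f} mV")
```

Under these assumptions, the two rules are indistinguishable when the inter-spike interval is much longer than the decay time constant, and they diverge only when the two become comparable.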