Closed · ShuJz closed this issue 4 years ago
Hi @ShuJz! Thanks for your interest.
We perform a trick with autodifferentiation that is equivalent to hardcoding e-prop. Numerically, the trick is verified in numerical_verification.py, and it appears in the code you pointed out in https://github.com/IGITUGraz/eligibility_propagation/blob/7dd1558b5e2ff85b3b2b58e933cb2103a12f2a3b/Figure_4_and_5_ATARI/alif_eligibility_propagation.py#L125-L126 for the recurrent network, and for example in https://github.com/IGITUGraz/eligibility_propagation/blob/7dd1558b5e2ff85b3b2b58e933cb2103a12f2a3b/Figure_4_and_5_ATARI/spiking_agent.py#L124 for the spiking CNN.
Thanks for your reply. I reviewed the code you mentioned, but I still have a question about this line: https://github.com/IGITUGraz/eligibility_propagation/blob/7dd1558b5e2ff85b3b2b58e933cb2103a12f2a3b/numerical_verification_eprop_factorization_vs_BPTT.py#L83
When you calculate the gradient with hardcoded e-prop, you use `tf.gradients()` to get `dE_dz`. But `tf.gradients()` is based on backpropagation, and you set `stop_z_gradients=False`, which means that when you calculate the derivative of the spikes, the error still propagates through time as in BPTT.
I have also written a script to investigate that: https://github.com/ShuJz/eligibility_propagation/blob/master/test_eprop.py
here is the output of the script:
```
S1: EligALIF model
BPTT vs. eprop-autodiff
Maximum element wise errors(inputs): 0.0
Maximum element wise errors(target): 0.0
Maximum element wise errors(w_in): 0.0
Maximum element wise errors(w_rec): 0.0
Maximum element wise errors(w_out): 0.0
Maximum element wise errors(spikes): 0.0
Maximum element wise errors(out): 0.0
Maximum element wise errors(eligibility_traces): 0.0
Maximum element wise errors(learning_signals): 0.11414007097482681
Maximum element wise errors(gradients_hardcode): 1.9996122121810913
Maximum element wise errors(gradients_autodiff): 1.99961256980896
######################################
S2: EligALIF model autodiff vs. hardcode
BPTT vs. eprop-hardcode
Maximum element wise errors(gradients BPTT vs. eprop-hardcode): 6.219636893185776e-14
eprop-autodiff vs. eprop-hardcode
Maximum element wise errors(gradients eprop-autodiff vs. eprop-hardcode): 1.3024242871261665e-13
######################################
S3: CustomALIF model
BPTT vs. eprop-autodiff
Maximum element wise errors(inputs): 0.0
Maximum element wise errors(target): 0.0
Maximum element wise errors(w_in): 0.0
Maximum element wise errors(w_rec): 0.0
Maximum element wise errors(w_out): 0.0
Maximum element wise errors(spikes): 0.0
Maximum element wise errors(out): 0.0
There is no eligibility_traces.
There is no learning_signals.
There is no gradients_hardcode.
Maximum element wise errors(gradients_autodiff): 1.9996129274368286
######################################
```
From this output:

- `eligibility_traces` are equal while `learning_signals` differ, which means `stop_z_gradients` affects the calculation of the learning signals when `tf.gradients()` is used.
- `gradients_autodiff` differ between the BPTT and eprop-autodiff models, which is caused by the error propagating along different routes. So BPTT and eprop-autodiff do not seem to be equivalent.

Since e-prop still uses BPTT (when calculating `dE_dz`), does e-prop have lower complexity compared to BPTT?
It also seems that the model CustomALIF behaves essentially differently when `stop_z_gradients` is set to `False` or `True`. Will these two models (`CustomALIF(stop_z_gradients=False)` and `CustomALIF(stop_z_gradients=True)`) really have similar performance?
Yes, the script `numerical_verification_eprop_factorization_vs_BPTT.py` computes exactly the same loss gradient as BPTT.
This script is meant as a numerical proof of equation (1), and this is not how we implement e-prop. Equation (1) uses the exact learning signals dE/dz. This total derivative dE/dz can be computed with `tf.gradients(...)`, and indeed it requires backpropagating through time. This is why, in e-prop, we approximate the learning signals with the partial derivatives, as written in equation (4).
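To make that distinction concrete, here is a minimal NumPy sketch. This is not the repository's TensorFlow code: the scalar leaky-integrator model and all numbers are hypothetical, chosen only to show how the exact (total-derivative) learning signal differs from the partial-derivative approximation.

```python
import numpy as np

# Toy linear "network": h_t = alpha*h_{t-1} + w*x_t,
# loss E = 0.5 * sum_t (h_t - y_t)^2. All values are hypothetical.
alpha, w = 0.8, 0.5
x = np.array([1.0, 0.5, -0.3])
y = np.array([0.2, 0.1, 0.4])
T = len(x)

# Forward pass
h = np.zeros(T)
h_prev = 0.0
for t in range(T):
    h[t] = alpha * h_prev + w * x[t]
    h_prev = h[t]

# Partial derivative (the e-prop approximation, equation (4) style):
# only the instantaneous effect of h_t on the loss.
L_partial = h - y

# Total derivative (the exact learning signal, equation (1) style):
# also counts the influence of h_t on all future states through the
# recurrence, so it must be accumulated backwards through time.
L_total = np.zeros(T)
acc = 0.0
for t in reversed(range(T)):
    acc = (h[t] - y[t]) + alpha * acc
    L_total[t] = acc

print(L_partial)
print(L_total)
# The two agree only at the last step, where no future terms remain.
```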
Two implementations of the e-prop algorithm are provided in the other script `numerical_verification_eprop_hardcoded_vs_autodiff.py`.
There, we replace the learning signals dE/dz with equation (4); this leads to the hardcoded implementation of e-prop, with no `tf.gradients(...)` anymore. An alternative implementation of e-prop is to use auto-diff with the option `stop_z_gradients=True`. This second script shows that the two implementations are equivalent.
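As a toy illustration of the factorization into learning signals and forward-propagated eligibility traces, here is a NumPy sketch. Again, this is a hypothetical scalar leaky integrator, not the repository's code; since this toy has no recurrent spike pathway, nothing gets truncated and the factorized gradient matches a finite-difference reference exactly, which is the spirit of equation (1).

```python
import numpy as np

# Toy leaky integrator h_t = alpha*h_{t-1} + w*x_t (hypothetical values),
# loss E = 0.5 * sum_t (h_t - y_t)^2. Illustrates the factorization
# dE/dw = sum_t L_t * e_t with a forward-propagated eligibility trace.
alpha, w = 0.8, 0.5
x = np.array([1.0, 0.5, -0.3])
y = np.array([0.2, 0.1, 0.4])
T = len(x)

# Forward pass: states h_t and eligibility traces e_t = dh_t/dw.
h = np.zeros(T)
e = np.zeros(T)
h_prev = e_prev = 0.0
for t in range(T):
    h[t] = alpha * h_prev + w * x[t]
    e[t] = alpha * e_prev + x[t]  # trace filtered with the same leak
    h_prev, e_prev = h[t], e[t]

# Factorized gradient: learning signal (h_t - y_t) times trace e_t,
# accumulated forward in time -- no backward pass needed.
grad_factorized = np.sum((h - y) * e)

# Reference gradient via central finite differences.
def loss(w_):
    h_prev, E = 0.0, 0.0
    for t in range(T):
        h_prev = alpha * h_prev + w_ * x[t]
        E += 0.5 * (h_prev - y[t]) ** 2
    return E

eps = 1e-6
grad_fd = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(grad_factorized, grad_fd)  # agree: nothing to truncate in this toy
```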
I think this is coherent with the outputs of your scripts, but let me know if I am wrong.
> Since e-prop still uses BPTT (when calculating `dE_dz`), does e-prop have lower complexity compared to BPTT?
With a fully online implementation, e-prop and BPTT require the same number of operations (up to a constant factor), but there is a bigger difference in terms of memory consumption. Roughly, an online implementation of e-prop consumes less memory than BPTT when the number of neurons × the number of time steps is larger than the number of network connections.
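A back-of-envelope sketch of that memory comparison, with purely hypothetical sizes:

```python
# Rough memory comparison (all sizes are hypothetical).
# BPTT stores the state of every neuron at every time step;
# an online e-prop implementation stores one eligibility trace
# per network connection instead.
n_neurons = 1000
n_steps = 5000                           # time steps BPTT must keep
n_connections = n_neurons * n_neurons    # fully recurrent network

bptt_floats = n_neurons * n_steps        # 5_000_000 stored state values
eprop_floats = n_connections             # 1_000_000 stored traces

# e-prop uses less memory exactly when neurons * steps > connections,
# i.e. (for a fully recurrent net) when n_steps > n_neurons.
print(bptt_floats, eprop_floats, eprop_floats < bptt_floats)
```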
Since the simulation fit in our GPU memory, the blocked-BPTT implementation of e-prop was fine for our use cases. I think e-prop would become even more interesting with sparse networks and an efficient online implementation.
> It seems the model CustomALIF behaves essentially differently when `stop_z_gradients` is set to `False` or `True`. Will these two models (`CustomALIF(stop_z_gradients=False)`, `CustomALIF(stop_z_gradients=True)`) really have similar performance?
No, they will have slightly different performance; they are not equivalent. Using auto-diff with `CustomALIF(stop_z_gradients=False)` implements exact BPTT, while using auto-diff with `CustomALIF(stop_z_gradients=True)` implements e-prop. There is a small gap in performance between BPTT and e-prop, as we report in Figure 2C of the paper.
Thank you very much for your patient answers.
https://github.com/IGITUGraz/eligibility_propagation/blob/7dd1558b5e2ff85b3b2b58e933cb2103a12f2a3b/Figure_4_and_5_ATARI/main.py#L235-L239
It seems BPTT is still being used here. How is e-prop used in this task? There is no hardcoded implementation of e-prop like the one you wrote in https://github.com/IGITUGraz/eligibility_propagation/blob/7dd1558b5e2ff85b3b2b58e933cb2103a12f2a3b/Figure_3_and_S7_e_prop_tutorials/models.py#L382-L398.