Closed by akorgor 2 months ago
@akorgor This is a strong optimization. It would be even better not to pass the optimizer to `compute_gradient` and instead perform the weight update from the `send` function of the synapse.
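The reviewer's suggestion could look roughly like the following sketch. The class and method names are illustrative only and do not reflect NEST's actual C++ implementation: the gradient is merely accumulated during `compute_gradient`, and the deferred optimizer step is applied from `send`.

```python
# Illustrative sketch (hypothetical names, not NEST's API): defer the
# weight update from compute_gradient to the synapse's send function.

class Synapse:
    def __init__(self, weight, lr=0.01):
        self.weight = weight
        self.lr = lr
        self.pending_grad = 0.0

    def compute_gradient(self, grad):
        # Only accumulate the gradient; no optimizer is involved here.
        self.pending_grad += grad

    def send(self):
        # Apply the deferred weight update on the next spike transmission.
        self.weight -= self.lr * self.pending_grad
        self.pending_grad = 0.0
        return self.weight
```

With this split, the optimizer no longer needs to be passed into the gradient computation at all, which is the point of the suggestion above.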
There is a small point of contention regarding the Adam optimizer. Because Adam maintains individual state variables for each parameter, updated according to the cumulative number of update steps, the update frequencies of these variables may vary among synapses: synapses with higher firing rates update their state variables more often, while synapses that fire rarely update them less often. I am uncertain whether this aligns with the intended use of the Adam optimizer. If this discrepancy in update frequencies among synapses does not lead to suboptimal convergence behavior or slow training, I think we can consider this optimization for the Adam case as well.
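To make the concern concrete, here is a minimal textbook Adam implementation (illustrative only, not NEST's optimizer code): each synapse carries its own moment estimates and step counter `t`, so two synapses that have received the same total gradient but fired a different number of times end up with different bias corrections and hence different weights.

```python
# Minimal per-parameter Adam state (textbook form; hypothetical names,
# not NEST's implementation). The step counter t advances only when the
# synapse is actually updated, i.e. more often for high-rate synapses.
import math

class AdamState:
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # number of updates performed so far

    def step(self, weight, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)  # depends on t
        return weight - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
```

For example, one synapse taking two steps with gradient 0.5 and another taking a single step with gradient 1.0 see the same cumulative gradient but end at different `t` and different weights, which is exactly the discrepancy described above.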
Running simulations of `eprop_supervised_classification_evidence-accumulation.py` with the changed `compute_gradient` function shows a significant speed-up but a worse prediction error. However, even the baseline training prediction error is not good yet, so we need to tweak the parameters.
- [ ] The problem with this implementation is that the `optimize_each_step` flag is irrelevant for the `_bsshslm_2020` models, but is still around in the weight optimizer dictionary. Is that a problem? Maybe remove it from the tutorial dictionaries?
- [ ] Add somewhere a more detailed explanation of the two options with up- and downsides.
@akorgor Can the `bsshslm_2020` models be modified to throw an error with the message "This model does not support optimization at each step" or similar? Perhaps this could be done from the `check_connection` function in `eprop_synapse`? (There is access to the optimizer there.)
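The proposed check could be sketched as follows. This is a Python mock-up of the idea only, with hypothetical class and attribute names; NEST's actual `check_connection` lives in the C++ synapse model.

```python
# Hypothetical sketch of rejecting optimize_each_step at connection
# time, where the optimizer is accessible (names are illustrative, not
# NEST's actual classes).

class WeightOptimizer:
    def __init__(self, optimize_each_step=False):
        self.optimize_each_step = optimize_each_step

class EpropSynapseBSSHSLM2020:
    """Stand-in for the *_bsshslm_2020 synapse models."""

    supports_optimize_each_step = False

    def check_connection(self, optimizer):
        # Fail fast when the unsupported flag is set, instead of
        # silently ignoring it in the weight optimizer dictionary.
        if optimizer.optimize_each_step and not self.supports_optimize_each_step:
            raise ValueError(
                "This model does not support optimization at each step"
            )
```

Raising at connection time would surface the misconfiguration immediately rather than leaving the stale flag in the tutorial dictionaries.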
This PR aims to speed up the code by calling the `optimized_weight` function outside the loop in `compute_gradient`. The following plots were obtained from running the `eprop_supervised_classification_evidence-accumulation.py` task (according to the new file naming, i.e., with the non-`_bsshslm_2020` models).

For a cutoff of 10, there is a small speed-up while the losses are accurate:

For a cutoff of 1000, there is a significant speed-up whilst still maintaining accurate losses:
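The hoisting this PR describes can be sketched as follows. The function names are illustrative, not NEST's API: instead of applying an optimizer step for every accumulated contribution inside the loop, the contributions are summed and a single step is applied after the loop.

```python
# Illustrative sketch of the optimization (hypothetical names): move the
# weight update out of the per-contribution loop in compute_gradient.

def optimizer_step(weight, grad, lr=0.01):
    """Plain gradient-descent update (illustrative stand-in)."""
    return weight - lr * grad

def compute_gradient_per_step(weight, grads):
    # Baseline: one optimizer call per gradient contribution.
    for g in grads:
        weight = optimizer_step(weight, g)
    return weight

def compute_gradient_hoisted(weight, grads):
    # Optimized: accumulate first, then a single call outside the loop.
    return optimizer_step(weight, sum(grads))
```

For a plain gradient-descent update the two variants are mathematically equivalent, which is why the losses stay accurate; for a stateful optimizer such as Adam they are not, which is the concern raised about Adam above.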