Closed by akorgor 2 months ago
@akorgor This is a strong optimization. It would be even better not to pass the optimizer to `compute_gradient` and instead perform the weight update from the `send` function of the synapse.
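The reviewer's suggestion could look roughly like the following sketch. The class and method names are illustrative only and do not reflect NEST's actual C++ implementation: the gradient is merely accumulated during `compute_gradient`, and the deferred optimizer step is applied from `send`.

```python
# Illustrative sketch (hypothetical names, not NEST's API): defer the
# weight update from compute_gradient to the synapse's send function.

class Synapse:
    def __init__(self, weight, lr=0.01):
        self.weight = weight
        self.lr = lr
        self.pending_grad = 0.0

    def compute_gradient(self, grad):
        # Only accumulate the gradient; no optimizer is involved here.
        self.pending_grad += grad

    def send(self):
        # Apply the deferred weight update on the next spike transmission.
        self.weight -= self.lr * self.pending_grad
        self.pending_grad = 0.0
        return self.weight
```

With this split, the optimizer no longer needs to be passed into the gradient computation at all, which is the point of the suggestion above.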
There is a small point of contention regarding the Adam optimizer. Because Adam maintains individual state variables for each parameter, updated according to the cumulative number of update steps, the update frequencies of these variables may vary among synapses: synapses with higher firing rates update their state variables more often, while synapses that fire rarely update them less often. I am uncertain whether this aligns with the intended use of the Adam optimizer. If this discrepancy in update frequencies among synapses does not lead to suboptimal convergence behavior or slow training, I think we can consider this optimization for the Adam case as well.
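To make the concern concrete, here is a minimal textbook Adam implementation (illustrative only, not NEST's optimizer code): each synapse carries its own moment estimates and step counter `t`, so two synapses that have received the same total gradient but fired a different number of times end up with different bias corrections and hence different weights.

```python
# Minimal per-parameter Adam state (textbook form; hypothetical names,
# not NEST's implementation). The step counter t advances only when the
# synapse is actually updated, i.e. more often for high-rate synapses.
import math

class AdamState:
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # number of updates performed so far

    def step(self, weight, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)  # depends on t
        return weight - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
```

For example, one synapse taking two steps with gradient 0.5 and another taking a single step with gradient 1.0 see the same cumulative gradient but end at different `t` and different weights, which is exactly the discrepancy described above.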
Running simulations of `eprop_supervised_classification_evidence-accumulation.py` with the changed `compute_gradient` function shows a significant speed-up but a worse prediction error. However, even the baseline training prediction error is not good yet, so we need to tweak the parameters.
- [ ] The problem with this implementation is that the `optimize_each_step` flag is irrelevant for the `_bsshslm_2020` models, but is still around in the weight optimizer dictionary. Is that a problem? Maybe remove it from the tutorial dictionaries?
- [ ] Add somewhere a more detailed explanation of the two options with up- and downsides.
@akorgor Can the `bsshslm_2020` models be modified to throw an error with the message "This model does not support optimization at each step" or similar? Perhaps this could be done from the `check_connection` function in `eprop_synapse`? (There is access to the optimizer there.)
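The proposed check could be sketched as follows. This is a Python mock-up of the idea only, with hypothetical class and attribute names; NEST's actual `check_connection` lives in the C++ synapse model.

```python
# Hypothetical sketch of rejecting optimize_each_step at connection
# time, where the optimizer is accessible (names are illustrative, not
# NEST's actual classes).

class WeightOptimizer:
    def __init__(self, optimize_each_step=False):
        self.optimize_each_step = optimize_each_step

class EpropSynapseBSSHSLM2020:
    """Stand-in for the *_bsshslm_2020 synapse models."""

    supports_optimize_each_step = False

    def check_connection(self, optimizer):
        # Fail fast when the unsupported flag is set, instead of
        # silently ignoring it in the weight optimizer dictionary.
        if optimizer.optimize_each_step and not self.supports_optimize_each_step:
            raise ValueError(
                "This model does not support optimization at each step"
            )
```

Raising at connection time would surface the misconfiguration immediately rather than leaving the stale flag in the tutorial dictionaries.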
This PR aims to speed up the code by calling the `optimized_weight` function outside the loop in `compute_gradient`. The following plots were obtained from running the `eprop_supervised_classification_evidence-accumulation.py` task (according to the new file naming, i.e., with the non-`_bsshslm_2020` models).

For a cutoff of 10, there is a small speed-up while the losses are accurate:

For a cutoff of 1000, there is a significant speed-up whilst still maintaining accurate losses:
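The hoisting this PR describes can be sketched as follows. The function names are illustrative, not NEST's API: instead of applying an optimizer step for every accumulated contribution inside the loop, the contributions are summed and a single step is applied after the loop.

```python
# Illustrative sketch of the optimization (hypothetical names): move the
# weight update out of the per-contribution loop in compute_gradient.

def optimizer_step(weight, grad, lr=0.01):
    """Plain gradient-descent update (illustrative stand-in)."""
    return weight - lr * grad

def compute_gradient_per_step(weight, grads):
    # Baseline: one optimizer call per gradient contribution.
    for g in grads:
        weight = optimizer_step(weight, g)
    return weight

def compute_gradient_hoisted(weight, grads):
    # Optimized: accumulate first, then a single call outside the loop.
    return optimizer_step(weight, sum(grads))
```

For a plain gradient-descent update the two variants are mathematically equivalent, which is why the losses stay accurate; for a stateful optimizer such as Adam they are not, which is the concern raised about Adam above.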