Refactor/autodiff/track pnl

jdcpni commented 1 month ago

• composition.py: run(): add _update_results() as a helper method that can be overridden (e.g., by autodiffcomposition) for less frequent updating
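
A minimal sketch of the hook pattern described above, not the actual PsyNeuLink code: the toy classes, the `_execute()` stand-in, the `retained` buffer and `flush_retained()` are all hypothetical names used only for illustration.

```python
class ToyComposition:
    """Toy stand-in for a composition; only the hook pattern is illustrated."""

    def __init__(self):
        self.results = []

    def run(self, inputs):
        for trial_input in inputs:
            trial_output = self._execute(trial_input)
            # run() delegates result bookkeeping to an overridable helper
            self._update_results(trial_output)
        return self.results

    def _execute(self, trial_input):
        # Placeholder computation standing in for a real trial
        return trial_input * 2

    def _update_results(self, trial_output):
        # Default behavior: update results on every trial
        self.results.append(trial_output)


class ToyAutodiffComposition(ToyComposition):
    """Overrides the helper so that results are updated less frequently."""

    def __init__(self):
        super().__init__()
        self.retained = []  # hypothetical buffer for deferred outputs

    def _update_results(self, trial_output):
        # Retain the output instead of touching results on every trial
        self.retained.append(trial_output)

    def flush_retained(self):
        # Hypothetical: copy everything retained so far into results at once
        self.results.extend(self.retained)
        self.retained.clear()
```

With this arrangement the base class's run() never needs to know how often results are actually written; the subclass decides.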

• autodiffcomposition.py:

autodiff_training() -> autodiff_forward(), and reorder calls for clarity of sequence
autodiff._update_learning_parameters -> do_optimization():
- calculates loss for current trial
- calls autodiff_backward() to calculate gradients and update parameters
- updates tracked_loss over trials
autodiff_backward() -> new method that is called from do_optimization that calculates and updates the gradients
self.loss -> self.loss_function
_update_results() - overridden to call pytorch_rep.retain_for_psyneulink(RUN:trial_output)
learn():
- move tracked_loss for each minibatch from parameter on autodiff to attribute on its pytorch_rep (since that is already context dependent, and avoids calls to pnl.parameters._set on every call to forward())
- synch_with_pnl_options: implement as dict to consolidate synch_projection_matrices_with_torch, synch_node_variables_with_torch and synch_node_values_with_torch options passed to learning methods
- retain_in_pnl_options: implement as dict to consolidate retain_torch_outputs_in_results, retain_torch_targets and retain_torch_losses passed to learning methods
_parse_synch_and_retain_args(): add to manage user assignments in run() and/or learn()
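
As a rough illustration of consolidating the synch and retain options into two dicts, here is a sketch under stated assumptions: the function name and signature are hypothetical (the real _parse_synch_and_retain_args() may differ), and the option names and defaults are taken from the constructor signature shown in the docs diff below.

```python
# Defaults as they appear in the AutodiffComposition constructor signature
SYNCH_DEFAULTS = {
    "synch_projection_matrices_with_torch": "run",
    "synch_node_variables_with_torch": None,
    "synch_node_values_with_torch": "run",
    "synch_results_with_torch": "run",
}
RETAIN_DEFAULTS = {
    "retain_torch_trained_outputs": "minibatch",
    "retain_torch_targets": "minibatch",
    "retain_torch_losses": "minibatch",
}


def parse_synch_and_retain_args(**overrides):
    """Hypothetical helper: merge per-call overrides with the defaults.

    Returns (synch_options, retain_options) as two separate dicts.
    """
    unknown = set(overrides) - set(SYNCH_DEFAULTS) - set(RETAIN_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected option(s): {sorted(unknown)}")
    # An explicitly passed value (even None) takes precedence over the default
    synch = {key: overrides.get(key, default)
             for key, default in SYNCH_DEFAULTS.items()}
    retain = {key: overrides.get(key, default)
              for key, default in RETAIN_DEFAULTS.items()}
    return synch, retain
```

Passing the two dicts around, rather than seven separate keyword arguments, keeps the signatures of the learning methods short and makes it easy to add an option later.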

• pytorchwrappers.py

add/rename attributes:
- PytorchCompositionWrapper:
  - retained_outputs
  - retained_targets
  - retained_losses
  - _nodes_to_execute_after_gradient_calc
- PytorchMechanismWrapper:
  - value -> output
  - input
add methods:
- synch_with_psyneulink(): centralize copying of params and values to pnl using methods below
- copy_node_variables_to_psyneulink(): centralize updating of node (mech & comp) variables in PNL
- copy_node_values_to_psyneulink(): centralize updating of node (mech & comp) values in PNL
- copy_results_to_psyneulink(): centralize updating of autodiffcomposition.results
- retain_in_psyneulink(): centralize tracking of pytorch results in PNL using methods below
- retain_torch_outputs: keeps record of outputs and copies them to the AutodiffComposition at end of call to learn()
- retain_torch_targets: keeps record of targets and copies to AutodiffComposition.pytorch_targets at end of call to learn()
- retain_torch_losses: keeps record of losses and copies to AutodiffComposition.pytorch_losses at end of call to learn()
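
The retain methods listed above could be sketched roughly as follows. This is an illustration only, assuming plain Python lists instead of torch tensors: `ToyCompositionWrapper`, `copy_retained_to()` and `toy_learn()` are hypothetical names, and `pytorch_outputs` is an assumed attribute name chosen by analogy with pytorch_targets and pytorch_losses.

```python
from types import SimpleNamespace


class ToyCompositionWrapper:
    """Toy stand-in for a wrapper that retains per-minibatch records."""

    def __init__(self):
        self.retained_outputs = []
        self.retained_targets = []
        self.retained_losses = []

    def retain_torch_outputs(self, outputs):
        self.retained_outputs.append(outputs)

    def retain_torch_targets(self, targets):
        self.retained_targets.append(targets)

    def retain_torch_losses(self, loss):
        self.retained_losses.append(loss)

    def copy_retained_to(self, composition):
        # Hypothetical: one copy at the end of learn() instead of one per trial
        composition.pytorch_outputs = list(self.retained_outputs)  # assumed name
        composition.pytorch_targets = list(self.retained_targets)
        composition.pytorch_losses = list(self.retained_losses)
        self.retained_outputs.clear()
        self.retained_targets.clear()
        self.retained_losses.clear()


def toy_learn(wrapper, composition, minibatches):
    """Record each minibatch as it is processed, then copy once at the end."""
    for outputs, targets in minibatches:
        # Mean squared error, computed by hand for the toy example
        loss = sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)
        wrapper.retain_torch_outputs(outputs)
        wrapper.retain_torch_targets(targets)
        wrapper.retain_torch_losses(loss)
    wrapper.copy_retained_to(composition)


composition = SimpleNamespace()  # stands in for an AutodiffComposition
toy_learn(ToyCompositionWrapper(), composition,
          [([1.0, 2.0], [1.0, 0.0]), ([0.0], [0.0])])
```

Keeping the records on the wrapper during training and copying them once at the end is what avoids a write to the composition on every forward pass.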

• compositionrunner.py:

batch_inputs(): add calls to synch_with_psyneulink() and retain_in_psyneulink()
batch_function_inputs():
- needs calls to synch_with_psyneulink() and retain_in_psyneulink()

• pytorchemcompositionwrapper.py

store_memory(): implement single call to linalg over memory

coveralls commented 1 month ago

coverage: 83.917% (+0.1%) from 83.786% when pulling 93044590cd68bc957035ff10386ae5edb5afefac on refactor/autodiff/track_pnl into 310afb11542b21e1f056d6847d09000719a83fa2 on devel.

github-actions[bot] commented 1 month ago

This PR causes the following changes to the html docs (ubuntu-latest-3.11):

diff -r docs-base/AutodiffComposition.html docs-head/AutodiffComposition.html
327a328,339
> <p># 7/10/24 - FIX:
> .. _AutodiffComposition_PyTorch_LearningScale:</p>
> <blockquote>
> <div><dl>
> <dt>ADD DESCRIPTION OF HOW LearningScale SPECIFICATIONS MAP TO EXECUTOIN OF pytorch_rep:</dt><dd><p>OPTIMIZATION STEP:
> for AutodiffCompositions, this corresponds to a single call to <code class="xref any docutils literal notranslate"><span class="pre">foward()</span></code> and <code class="xref any docutils literal notranslate"><span class="pre">backward()</span></code></p>
> <blockquote>
> <div><p>methods of the Pytorch model</p>
> </div></blockquote>
> </dd>
> </dl>
> </div></blockquote>
436,438c448,467
< <em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">psyneulink.library.compositions.autodiffcomposition.</span></span><span class="sig-name descname"><span class="pre">AutodiffComposition</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">pathways</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">optimizer_type</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'sgd'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">loss_spec</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">Loss.MSE</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">learning_rate</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decay</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disable_learning</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">force_no_retain_graph</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">refresh_losses</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">device</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disable_cuda</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">cuda_index</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'autodiff_composition'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#psyneulink.library.compositions.autodiffcomposition.AutodiffComposition" title="Permalink to this definition">¶</a></dt>
< <dd><p>Subclass of <a class="reference internal" href="Composition.html"><span class="doc">Composition</span></a> that trains models using either LLVM compilation or <a class="reference external" href="https://pytorch.org">PyTorch</a>;
< see and <a class="reference internal" href="Composition.html#composition-class-reference"><span class="std std-ref">Composition</span></a> for additional arguments and attributes.</p>
---
> <em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">psyneulink.library.compositions.autodiffcomposition.</span></span><span class="sig-name descname"><span class="pre">AutodiffComposition</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">pathways</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">optimizer_type</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'sgd'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">loss_spec</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">Loss.MSE</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">learning_rate</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decay</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disable_learning</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">force_no_retain_graph</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">refresh_losses</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">synch_projection_matrices_with_torch</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'run'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">synch_node_variables_with_torch</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">synch_node_values_with_torch</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'run'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">synch_results_with_torch</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'run'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">retain_torch_trained_outputs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'minibatch'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">retain_torch_targets</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'minibatch'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">retain_torch_losses</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'minibatch'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">device</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disable_cuda</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">cuda_index</span></span><span class="o"><span 
class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'autodiff_composition'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#psyneulink.library.compositions.autodiffcomposition.AutodiffComposition" title="Permalink to this definition">¶</a></dt>
> <dd><dl class="simple">
> <dt>AutodiffComposition(                                optimizer_type=’sgd’,</dt><dd><p>loss_spec=Loss.MSE,
> weight_decay=0,
> learning_rate=0.001,
> disable_learning=False,
> synch_projection_matrices_with_torch=RUN,
> synch_node_variables_with_torch=None,
> synch_node_values_with_torch=RUN,
> synch_results_with_torch=RUN,
> retain_torch_trained_outputs=MINIBATCH,
> retain_torch_targets=MINIBATCH,
> retain_torch_losses=MINIBATCH,
> device=CPU
> )</p>
> </dd>
> </dl>
> <p>Subclass of <a class="reference internal" href="Composition.html"><span class="doc">Composition</span></a> that trains models using either LLVM compilation or <a class="reference external" href="https://pytorch.org">PyTorch</a>;
> see and <a class="reference internal" href="Composition.html#composition-class-reference"><span class="std std-ref">Composition</span></a> for additional arguments and attributes.  See <a class="reference internal" href="Composition.html"><span class="doc">Composition</span></a>
> for additional arguments to constructor.</p>
445,446c474,475
< <li><p><strong>learning_rate</strong> (<em>float : default 0.001</em>) – specifies the learning rate passed to the optimizer if none is specified in the <code class="xref any docutils literal notranslate"><span class="pre">learn</span></code> method of the AutodiffComposition
< (see <a class="reference internal" href="#psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.learning_rate" title="psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.learning_rate"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">learning_rate</span></code></a> for additional details).</p></li>
---
> <li><p><strong>learning_rate</strong> (<em>float : default 0.001</em>) – specifies the learning rate passed to the optimizer if none is specified in the <code class="xref any docutils literal notranslate"><span class="pre">learn</span></code> method of the AutodiffComposition;
> see <a class="reference internal" href="#psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.learning_rate" title="psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.learning_rate"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">learning_rate</span></code></a> for additional details.</p></li>
448c477,509
< <li><p><strong>device</strong> (<em>torch.device : default device-dependnet</em>) – specifies the device on which the model is run. If None, the device is set to ‘cuda’ if available,
---
> <li><p><strong>synch_projection_matrices_with_torch</strong> (<code class="xref any docutils literal notranslate"><span class="pre">LearningScale</span></code> : default RUN) – specifies the default for the AutodiffComposition for when to copy Pytorch parameters to PsyNeuLink
> <a class="reference internal" href="MappingProjection.html#psyneulink.core.components.projections.pathway.mappingprojection.MappingProjection.matrix" title="psyneulink.core.components.projections.pathway.mappingprojection.MappingProjection.matrix"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">Projection</span> <span class="pre">matrices</span></code></a> (connection weights), which can be overridden by specifying
> the <strong>synch_projection_matrices_with_torch</strong> argument in the <a class="reference internal" href="Composition.html#psyneulink.core.compositions.composition.Composition.learn" title="psyneulink.core.compositions.composition.Composition.learn"><code class="xref any py py-meth docutils literal notranslate"><span class="pre">learn</span></code></a> method;
> see <a class="reference internal" href="#psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.synch_projection_matrices_with_torch" title="psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.synch_projection_matrices_with_torch"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">synch_projection_matrices_with_torch</span></code></a>
> for additional details.</p></li>
> <li><p><strong>synch_node_variables_with_torch</strong> (<code class="xref any docutils literal notranslate"><span class="pre">LearningScale</span></code> : default None) – specifies the default for the AutodiffComposition for when to copy the current input to Pytorch nodes
> to the PsyNeuLink <a class="reference internal" href="Mechanism.html#psyneulink.core.components.mechanisms.mechanism.Mechanism_Base.value" title="psyneulink.core.components.mechanisms.mechanism.Mechanism_Base.value"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">variable</span></code></a> attribute of the corresponding PsyNeuLink <code class="xref any docutils literal notranslate"><span class="pre">nodes</span></code>, which can be overridden by specifying the <strong>synch_node_variables_with_torch</strong> argument
> in the <a class="reference internal" href="Composition.html#psyneulink.core.compositions.composition.Composition.learn" title="psyneulink.core.compositions.composition.Composition.learn"><code class="xref any py py-meth docutils literal notranslate"><span class="pre">learn</span></code></a> method; see <a class="reference internal" href="#psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.synch_node_variables_with_torch" title="psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.synch_node_variables_with_torch"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">synch_node_variables_with_torch</span></code></a> for additional details.</p></li>
> <li><p><strong>synch_node_values_with_torch</strong> (<code class="xref any docutils literal notranslate"><span class="pre">LearningScale</span></code> : default RUN) – specifies the default for the AutodiffComposition for when to copy the current output of Pytorch nodes to the
> PsyNeuLink <a class="reference internal" href="Mechanism.html#psyneulink.core.components.mechanisms.mechanism.Mechanism_Base.value" title="psyneulink.core.components.mechanisms.mechanism.Mechanism_Base.value"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">value</span></code></a> attribute of the corresponding PsyNeuLink <code class="xref any docutils literal notranslate"><span class="pre">nodes</span></code>,
> which can be overridden by specifying the <strong>synch_node_values_with_torch</strong> argument in the <a class="reference internal" href="Composition.html#psyneulink.core.compositions.composition.Composition.learn" title="psyneulink.core.compositions.composition.Composition.learn"><code class="xref any py py-meth docutils literal notranslate"><span class="pre">learn</span></code></a> method; see <a class="reference internal" href="#psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.synch_node_values_with_torch" title="psyneulink.library.compositions.autodiffcomposition.AutodiffComposition.synch_node_values_with_torch"><code class="xref any py py-attr docutils literal notranslate"><span class="pre">synch_node_values_with_torch</span></code></a> for additional details.</p></li>
> <li><p><strong>synch_results_with_torch</strong> (<code class="xref any docutils literal notranslate"><span class="pre">LearningScale</span></code> : default RUN) – specifies the default for the AutodiffComposition for when to copy the outputs of the Pytorch model
> to the AutodiffComposition’s <a class="reference internal" href="Compositio
...

See CI logs for the full diff.

PrincetonUniversity / PsyNeuLink

Refactor/autodiff/track pnl #3030