flatironinstitute / nomad

Non-linear Matrix Decomposition library
Apache License 2.0

18-Add model-free kernel Aggressive Momentum NMD #22

Closed sfohr closed 3 months ago

sfohr commented 4 months ago

Add Aggressive Momentum NMD from Seraghiti et al. (2023)

Closes #18.

Type of change

Motivation and Context

This change adds the Aggressive Momentum model-free kernel described in Seraghiti et al. (2023), following their MATLAB implementation. It extends the base model-free algorithm from Saul (2022) in two ways:

  1. Extrapolation of the utility matrix Z and the low-rank candidate L using a momentum term with parameter momentum_beta, to accelerate convergence.
  2. Heuristic tuning of the momentum parameter, conditional on whether the loss increased or decreased in the previous iteration.
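For reference, the two extensions above can be sketched in plain NumPy. This is an illustrative sketch, not the kernel implemented in this PR: the function name `anmd_sketch`, its signature, and the parameter defaults (`beta`, `gamma`, `eta`) are all assumptions, chosen loosely in the spirit of the paper.

```python
import numpy as np


def anmd_sketch(X, rank, n_iter=200, beta=0.7, gamma=1.05, eta=1.5, seed=0):
    """Illustrative sketch of Aggressive Momentum NMD (Seraghiti et al., 2023).

    Seeks a rank-`rank` matrix L with X ~ max(0, L) for a nonnegative X.
    All names and defaults are hypothetical, not the library's API.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    L = rng.standard_normal((m, n))                  # low-rank candidate
    L_prev = L.copy()
    Z_prev = np.where(X > 0, X, np.minimum(0.0, L))  # utility matrix
    loss_prev = np.inf
    for _ in range(n_iter):
        # Base model-free step: Z matches X on its support and is
        # clipped to min(0, L) on the zero entries of X.
        Z_new = np.where(X > 0, X, np.minimum(0.0, L))
        # Extension 1: momentum extrapolation of Z and (below) of L.
        Z_ex = Z_new + beta * (Z_new - Z_prev)
        U, s, Vt = np.linalg.svd(Z_ex, full_matrices=False)
        L_new = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        L_ex = L_new + beta * (L_new - L_prev)
        loss = np.linalg.norm(X - np.maximum(0.0, L_ex)) ** 2
        if loss < loss_prev:
            # Extension 2: loss decreased, so grow the momentum parameter.
            beta = min(gamma * beta, 0.9999)
            Z_prev, L_prev = Z_new, L_new
            L = L_ex
            loss_prev = loss
        else:
            # Loss increased: shrink beta and fall back to the last
            # accepted iterates.
            beta /= eta
            L = L_prev
    return L
```

Note that the sketch evaluates the loss as $\| X - \max(0, L) \|_F^2$ on the extrapolated iterate, which is the point raised in the closing thoughts below.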

Description

This PR introduces the following changes:

Testing

Utility functions

Test the computations performed in the kernel.

Kernel class utility methods

Integration Test

Added A-NMD to all_kernels_with_params with the default parameters.

Checklist

Closing thoughts

We should further discuss how we handle the loss computation mentioned earlier: we could continue using $\| Z - L \|_F^2$ as the loss for algorithms that do not extrapolate Z and L (and document that choice appropriately), or we could change the loss for all algorithms to $\| X - \max(0, L) \|_F^2$.
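To make the two candidate losses concrete, here is a minimal comparison; the matrices, shapes, and variable names are illustrative, not taken from the library:

```python
import numpy as np

rng = np.random.default_rng(1)
# X: nonnegative target. L: a low-rank candidate that, after momentum
# extrapolation, may be positive in entries where X is zero.
X = np.maximum(0.0, rng.standard_normal((50, 40)))
L = rng.standard_normal((50, 40))
# Utility matrix: matches X on its support, min(0, L) elsewhere.
Z = np.where(X > 0, X, np.minimum(0.0, L))

loss_utility = np.linalg.norm(Z - L, "fro") ** 2                  # ||Z - L||_F^2
loss_direct = np.linalg.norm(X - np.maximum(0.0, L), "fro") ** 2  # ||X - max(0, L)||_F^2
```

With this construction of Z, the two losses agree on the zero entries of X and differ exactly where X > 0 but L < 0, so in this sketch the utility-matrix loss is always at least the direct one.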

sfohr commented 4 months ago

I guess we need to specify "numpy ^1.24" in pyproject.toml due to the NumPy 2.0 major-version release.

jsoules commented 4 months ago

I guess we need to specify "numpy ^1.24" in pyproject.toml due to the NumPy 2.0 major-version release.

Ah, yes! Glad you noticed this.

I pushed a small PR that should re-enable support for numpy versions up to 2.0. Feel free to merge that in and/or rebase (whatever your preference is).

I've refreshed my memory on the issues here and will be reviewing this PR today and tomorrow. Thanks for the extensive documentation, both in the PR and the original issue--it really helps to bring me back up to speed.

jsoules commented 4 months ago

We should further discuss how we handle the loss computation mentioned earlier: we could continue using $\| Z - L \|_F^2$ as the loss for algorithms that do not extrapolate Z and L (and document that choice appropriately), or we could change the loss for all algorithms to $\| X - \max(0, L) \|_F^2$.

This is a great point which we should discuss further. Let me make sure I understand the issue. Because of the momentum terms, the low-rank candidate L at this point may have positive values in spots that are supposed to be 0--so the norm of the difference may be misleading. Is that correct?

I suspect that applying the max(0.0, L) is pretty cheap for any matrix of a size we can realistically handle, so I would be open to just converting the loss function. However, we'll want to profile it with the different algorithms to see how much of an impact it has. I am willing to accept a small performance hit for better rigor and consistency in the code, but let's try to quantify it before making a final decision.
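One way to quantify that impact before touching the kernels is a standalone timing sketch; the sizes and stand-in matrices below are illustrative, not profiles of the actual algorithms:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Stand-in matrices at a plausible working size.
X = np.maximum(0.0, rng.standard_normal((2000, 2000)))
L = rng.standard_normal((2000, 2000))
Z = np.where(X > 0, X, np.minimum(0.0, L))

# Current loss vs. the proposed one, which adds a single elementwise clip.
t_old = timeit.timeit(lambda: np.linalg.norm(Z - L, "fro") ** 2, number=20)
t_new = timeit.timeit(
    lambda: np.linalg.norm(X - np.maximum(0.0, L), "fro") ** 2, number=20
)
print(f"||Z - L||_F^2: {t_old:.3f}s   ||X - max(0, L)||_F^2: {t_new:.3f}s")
```

The clip is a single elementwise pass, so the expectation is that it is small next to the per-iteration SVD, but measuring it per algorithm, as suggested above, is the safer call.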

jsoules commented 4 months ago

Thanks again @sfohr for your work on this issue (& package).

I think this is ready to merge on my end, so if you are satisfied with it, let me know and I'll approve the PR.

jsoules commented 3 months ago

Apologies for the delay in merging--I missed your response!

Nice work!