I guess we need to specify "numpy ^1.24" in pyproject.toml due to the numpy 2.0 major version release.
Ah, yes! Glad you noticed this.
I pushed a small PR that should re-enable support for numpy versions up to 2.0. Feel free to merge that in and/or rebase (whatever your preference is).
I've refreshed my memory on the issues here and will be reviewing this PR today and tomorrow. Thanks for the extensive documentation, both in the PR and the original issue--it really helps to bring me back up to speed.
> We should further discuss how we handle the loss computation mentioned earlier: we could continue using $|| Z - L ||_F^2$ as a loss function for algorithms that do not extrapolate Z and L (and document it appropriately) or we change the loss for all algorithms to $|| X - \max(0.0, L) ||_F^2$.
This is a great point which we should discuss further. Let me make sure I understand the issue. Because of the momentum terms, the low-rank candidate L at this point may have positive values in spots that are supposed to be 0--so the norm of the difference may be misleading. Is that correct?
I suspect that applying the max(0.0, L) is pretty cheap for any matrix of a size we can realistically handle, so I would be open to just converting the loss function. However, we'll want to profile it with the different algorithms to see how much of an impact it has. I am willing to accept a small performance hit for better rigor and consistency in the code, but let's try to quantify it before making a final decision.
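To make the trade-off concrete, here is a minimal numpy sketch of the two candidate loss functions (hypothetical helper names, not the package's actual API). `np.maximum` is a single elementwise pass over the matrix, so the profiling question is essentially how that one extra O(mn) operation compares to the cost of the rest of a step:

```python
import numpy as np


def proxy_loss(Z: np.ndarray, L: np.ndarray) -> float:
    """Loss currently used by the model-free kernels: || Z - L ||_F^2."""
    return float(np.linalg.norm(Z - L, "fro") ** 2)


def relu_reconstruction_loss(X: np.ndarray, L: np.ndarray) -> float:
    """Proposed alternative: || X - max(0, L) ||_F^2, comparing X against the ReLU of L."""
    return float(np.linalg.norm(X - np.maximum(0.0, L), "fro") ** 2)
```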
Thanks again @sfohr for your work on this issue (& package).
I think this is ready to merge on my end, so if you are satisfied with it, let me know and I'll approve the PR.
Apologies for the delay in merging--I missed your response!
Nice work!
Add Aggressive Momentum NMD from Seraghiti et al. (2023)
Closes #18.
Type of change
Motivation and Context
The change adds the Aggressive Momentum model-free kernel described in Seraghiti et al. (2023) (their MATLAB implementation). It extends the base model-free algorithm from Saul (2022) in two ways:

- it extrapolates both Z and the low-rank candidate L using a momentum term with parameter `momentum_beta` to accelerate convergence, and
- it adaptively tunes `momentum_beta` during the iterations, depending on whether the loss decreases or increases.
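For orientation, the extrapolation is the standard momentum update sketched below (plain numpy, hypothetical function name; the actual kernel code lives in `aggressive_momentum_model_free.py`):

```python
import numpy as np


def extrapolate(current: np.ndarray, previous: np.ndarray, momentum_beta: float) -> np.ndarray:
    """Momentum extrapolation: push the iterate further along the last update direction.

    The same form is applied to both Z and the low-rank candidate L:
    M_new = M + momentum_beta * (M - M_previous).
    """
    return current + momentum_beta * (current - previous)
```

After each step, `momentum_beta` itself is raised or lowered depending on whether the loss improved; the hyperparameters controlling that adjustment are listed in the Description below.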
Description
This PR introduces the following changes:
- Added kernel-specific arguments to `AggressiveMomentumAdditionalParameters` (see the sketch at the end of this list):
  - `momentum_beta`: initial magnitude of the momentum terms on Z and L, defaults to `0.7`.
  - `momentum_upper_bound_increase_factor_gamma_bar`: factor increasing the upper bound `beta_bar` when the loss decreases. Defaults to `1.05`.
  - `momentum_increase_factor_gamma`: factor increasing `momentum_beta` when the loss decreases. Defaults to `1.1`.
  - `momentum_decrease_divisor_eta`: divisor decreasing `momentum_beta` when the loss increases. Defaults to `2.5`.
  - affected file: `kernelInputTypes.py`
- Added a kernel-specific return type `AggressiveMomentumModelFreeKernelReturnType` based on `KernelReturnBase`; it adds nothing to the base. affected file: `kernelReturnTypes.py`
- Added the `AGGRESSIVE_MOMENTUM_MODEL_FREE` kernel strategy to `KernelStrategy`. affected file: `enums.py`
- Implemented the kernel strategy in `instantiate_kernel`. affected files: `factory_util.py`, `test_factory_util.py`
- Added kernel-specific utility functions `increase_momentum_beta`, `increase_momentum_upper_bound_beta_bar`, `decrease_momentum_beta` and `validate_hyperparameters` (also sketched at the end of this list):
  - `increase_momentum_beta`, `increase_momentum_upper_bound_beta_bar` and `decrease_momentum_beta` are trivial functions used to tune the momentum parameter; I moved them to `aggressive_momentum_model_free_util.py` to keep the kernel class short.
  - `validate_hyperparameters` ensures the hyperparameters satisfy $\beta \in (0, 1)$ and $1.0 < \bar{\gamma} < \gamma < \eta$.
  - affected file: `aggressive_momentum_model_free_util.py`
- Added function `reconstruct_X_from_L` to `model_free_util.py`; it is used in `compute_parameter_update_loss()` because, due to the momentum terms on Z and L, the proxy loss $|| Z - L ||_F^2$ is no longer proportional to $|| X - \max(0.0, L) ||_F^2$. affected file: `model_free_util.py` (I just realized that the latent variable model implementations use the same loss, so it might be a good idea to move it to `loss_util.py`.)
- Added the kernel `AggressiveMomentumModelFreeKernel`: besides `step()`, `running_report()` and `report()`, it has additional methods for parameter tuning (`increase_momentum_parameters()` and `decrease_momentum_parameters()`), for accepting or rejecting the extrapolated matrices (`accept_matrix_updates()` and `reject_matrix_updates()`), and `compute_parameter_update_loss()`. affected file: `aggressive_momentum_model_free.py`
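The sketch below pulls together the pieces referenced in the list above: a parameters container with the documented defaults, the momentum-tuning utilities, the hyperparameter validation, and the ReLU-based reconstruction and loss. Names and signatures are illustrative only; the actual implementations live in `kernelInputTypes.py`, `aggressive_momentum_model_free_util.py` and `model_free_util.py` and may differ in interface.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AggressiveMomentumParams:
    """Illustrative container mirroring the kernel-specific arguments listed above."""

    momentum_beta: float = 0.7
    momentum_upper_bound_increase_factor_gamma_bar: float = 1.05
    momentum_increase_factor_gamma: float = 1.1
    momentum_decrease_divisor_eta: float = 2.5


def validate_hyperparameters(params: AggressiveMomentumParams) -> None:
    """Require beta in (0, 1) and 1 < gamma_bar < gamma < eta."""
    if not 0.0 < params.momentum_beta < 1.0:
        raise ValueError("momentum_beta must lie in (0, 1)")
    if not (
        1.0
        < params.momentum_upper_bound_increase_factor_gamma_bar
        < params.momentum_increase_factor_gamma
        < params.momentum_decrease_divisor_eta
    ):
        raise ValueError("expected 1 < gamma_bar < gamma < eta")


def increase_momentum_beta(beta: float, gamma: float, beta_bar: float) -> float:
    """Grow beta by the factor gamma, capped at the current upper bound beta_bar."""
    return min(gamma * beta, beta_bar)


def increase_momentum_upper_bound_beta_bar(beta_bar: float, gamma_bar: float) -> float:
    """Grow the upper bound beta_bar by the factor gamma_bar, capped at 1.0."""
    return min(gamma_bar * beta_bar, 1.0)


def decrease_momentum_beta(beta: float, eta: float) -> float:
    """Shrink beta by the divisor eta after a step in which the loss increased."""
    return beta / eta


def reconstruct_X_from_L(low_rank_candidate_L: np.ndarray) -> np.ndarray:
    """ReLU reconstruction: approximate X by max(0, L)."""
    return np.maximum(0.0, low_rank_candidate_L)


def compute_parameter_update_loss(X: np.ndarray, low_rank_candidate_L: np.ndarray) -> float:
    """|| X - max(0, L) ||_F^2, used to decide whether the momentum parameters go up or down."""
    return float(np.linalg.norm(X - reconstruct_X_from_L(low_rank_candidate_L), "fro") ** 2)
```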
Testing
Utility functions
Test the computations performed in the kernel.
`increase_momentum_beta`:
- `test_increase_momentum_beta_regular`: increase momentum beta if the upper bound is not reached
- `test_increase_momentum_beta_upper_bound_reached`: increase momentum beta if the upper bound is reached

`increase_momentum_upper_bound_beta_bar`:
- `test_increase_momentum_upper_bound_beta_bar_regular`: increase the upper bound for beta if `beta_bar` is below `1.0`
- `test_increase_momentum_upper_bound_beta_bar_bound_reached`: `beta_bar` is already at its upper bound of `1.0`, so do not increase further and return `1.0`

`decrease_momentum_beta`

`validate_hyperparameters`:
- `test_validate_hyperparameters_valid_params`: parameters valid, so no error should be raised
- `test_validate_hyperparameters_invalid_momentum_beta_too_low`: `momentum_beta` too low (`-0.1`), should raise an error
- `test_validate_hyperparameters_invalid_momentum_beta_too_high`: `momentum_beta` too high (`1.5`), should raise an error
- `test_validate_hyperparameters_invalid_gamma_bar_too_low`: `momentum_upper_bound_increase_factor_gamma_bar` too low (`1.0`), should raise an error
- `test_validate_hyperparameters_invalid_gamma_bar_equal_to_gamma`: `gamma_bar` equal to `gamma` (`1.0`), should raise an error
- `test_validate_hyperparameters_invalid_gamma_bar_higher_than_gamma`: `gamma_bar` (`1.5`) higher than `gamma` (`1.2`), should raise an error
- `test_validate_hyperparameters_invalid_gamma_equal_to_eta`: `gamma` equal to `eta`, should raise an error
- `test_validate_hyperparameters_invalid_gamma_higher_to_eta`: `gamma` higher than `eta`, should raise an error

`test_reconstruct_X_from_L` tests the ReLU function for the cases:
- L negative: return a matrix of zeros
- L positive: return L
- L zero: return L
- L with mixed signs: set all negative values to zero

Kernel class utility methods
- `test_aggressive_momentum_kernel_instantiation`: tests correct initialization of the kernel
- `test_aggressive_momentum_first_kernel_step`: tests the logic of the first kernel step
- `test_compute_parameter_update_loss`: tests correct assignment of the parameter update loss (the computation is tested in the utility function tests)
- `test_aggressive_momentum_increase_momentum_parameters`: tests correct assignment (computations are tested in the utility function tests)
- `test_aggressive_momentum_decrease_momentum_parameters`: tests correct assignment (computations are tested in the utility function tests)
- `test_aggressive_momentum_accept_matrix_updates`: tests correct assignment
- `test_aggressive_momentum_reject_matrix_updates`: tests correct assignment
- `test_aggressive_momentum_parameter_adaption`
- `test_aggressive_momentum_running_report`: same test as in the base model-free case
- `test_aggressive_momentum_final_report`: same test as in the base model-free case

Integration Test
Added A-NMD to `all_kernels_with_params` with the default parameters.

Checklist
Closing thoughts
We should further discuss how we handle the loss computation mentioned earlier: we could continue using $|| Z - L ||_F^2$ as a loss function for algorithms that do not extrapolate Z and L (and document it appropriately), or we change the loss for all algorithms to $|| X - \max(0.0, L) ||_F^2$.