flatironinstitute / nomad

Non-linear Matrix Decomposition library
Apache License 2.0

18-Add model-free kernel Aggressive Momentum NMD #22

Closed sfohr closed 3 months ago

sfohr commented 4 months ago

Add Aggressive Momentum NMD from Seraghiti et al. (2023)

Closes #18.

Type of change

Motivation and Context

This change adds the Aggressive Momentum model-free kernel described in Seraghiti et al. (2023), following their MATLAB implementation. It extends the base model-free algorithm from Saul (2022) in two ways:

  1. Extrapolation of the utility matrix Z and the low-rank candidate L using a momentum term with parameter momentum_beta, to accelerate convergence.
  2. Heuristic tuning of the momentum parameter, conditional on whether the loss increased or decreased in the previous iteration.
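For reference, the two extensions above can be sketched in plain NumPy. This is an illustrative sketch, not the kernel implemented in this PR: the function name `anmd_sketch`, its signature, and the parameter defaults (`beta`, `gamma`, `eta`) are all assumptions, chosen loosely in the spirit of the paper.

```python
import numpy as np


def anmd_sketch(X, rank, n_iter=200, beta=0.7, gamma=1.05, eta=1.5, seed=0):
    """Illustrative sketch of Aggressive Momentum NMD (Seraghiti et al., 2023).

    Seeks a rank-`rank` matrix L with X ~ max(0, L) for a nonnegative X.
    All names and defaults are hypothetical, not the library's API.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    L = rng.standard_normal((m, n))                  # low-rank candidate
    L_prev = L.copy()
    Z_prev = np.where(X > 0, X, np.minimum(0.0, L))  # utility matrix
    loss_prev = np.inf
    for _ in range(n_iter):
        # Base model-free step: Z matches X on its support and is
        # clipped to min(0, L) on the zero entries of X.
        Z_new = np.where(X > 0, X, np.minimum(0.0, L))
        # Extension 1: momentum extrapolation of Z and (below) of L.
        Z_ex = Z_new + beta * (Z_new - Z_prev)
        U, s, Vt = np.linalg.svd(Z_ex, full_matrices=False)
        L_new = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        L_ex = L_new + beta * (L_new - L_prev)
        loss = np.linalg.norm(X - np.maximum(0.0, L_ex)) ** 2
        if loss < loss_prev:
            # Extension 2: loss decreased, so grow the momentum parameter.
            beta = min(gamma * beta, 0.9999)
            Z_prev, L_prev = Z_new, L_new
            L = L_ex
            loss_prev = loss
        else:
            # Loss increased: shrink beta and fall back to the last
            # accepted iterates.
            beta /= eta
            L = L_prev
    return L
```

Note that the sketch evaluates the loss as $\| X - \max(0, L) \|_F^2$ on the extrapolated iterate, which is the point raised in the closing thoughts below.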

Description

This PR introduces the following changes:

Testing

Utility functions

Test the computations performed in the kernel.

Kernel class utility methods

Integration Test

Added A-NMD to all_kernels_with_params with the default parameters.

Checklist

Closing thoughts

We should further discuss how we handle the loss computation mentioned earlier: we could continue using $\| Z - L \|_F^2$ as the loss for algorithms that do not extrapolate Z and L (and document that choice appropriately), or we could change the loss for all algorithms to $\| X - \max(0, L) \|_F^2$.
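To make the two candidate losses concrete, here is a minimal comparison; the matrices, shapes, and variable names are illustrative, not taken from the library:

```python
import numpy as np

rng = np.random.default_rng(1)
# X: nonnegative target. L: a low-rank candidate that, after momentum
# extrapolation, may be positive in entries where X is zero.
X = np.maximum(0.0, rng.standard_normal((50, 40)))
L = rng.standard_normal((50, 40))
# Utility matrix: matches X on its support, min(0, L) elsewhere.
Z = np.where(X > 0, X, np.minimum(0.0, L))

loss_utility = np.linalg.norm(Z - L, "fro") ** 2                  # ||Z - L||_F^2
loss_direct = np.linalg.norm(X - np.maximum(0.0, L), "fro") ** 2  # ||X - max(0, L)||_F^2
```

With this construction of Z, the two losses agree on the zero entries of X and differ exactly where X > 0 but L < 0, so in this sketch the utility-matrix loss is always at least the direct one.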

sfohr commented 4 months ago

I guess we need to specify "numpy ^1.24" in pyproject.toml due to the NumPy 2.0 major-version release.

jsoules commented 4 months ago

I guess we need to specify "numpy ^1.24" in pyproject.toml due to the NumPy 2.0 major-version release.

Ah, yes! Glad you noticed this.

I pushed a small PR that should re-enable support for numpy versions up to 2.0. Feel free to merge that in and/or rebase (whatever your preference is).

I've refreshed my memory on the issues here and will be reviewing this PR today and tomorrow. Thanks for the extensive documentation, both in the PR and the original issue--it really helps to bring me back up to speed.

jsoules commented 4 months ago

We should further discuss how we handle the loss computation mentioned earlier: we could continue using $\| Z - L \|_F^2$ as the loss for algorithms that do not extrapolate Z and L (and document that choice appropriately), or we could change the loss for all algorithms to $\| X - \max(0, L) \|_F^2$.

This is a great point which we should discuss further. Let me make sure I understand the issue. Because of the momentum terms, the low-rank candidate L at this point may have positive values in spots that are supposed to be 0--so the norm of the difference may be misleading. Is that correct?

I suspect that applying the max(0.0, L) is pretty cheap for any matrix of a size we can realistically handle, so I would be open to just converting the loss function. However, we'll want to profile it with the different algorithms to see how much of an impact it has. I am willing to accept a small performance hit for better rigor and consistency in the code, but let's try to quantify it before making a final decision.
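One way to quantify that impact before touching the kernels is a standalone timing sketch; the sizes and stand-in matrices below are illustrative, not profiles of the actual algorithms:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Stand-in matrices at a plausible working size.
X = np.maximum(0.0, rng.standard_normal((2000, 2000)))
L = rng.standard_normal((2000, 2000))
Z = np.where(X > 0, X, np.minimum(0.0, L))

# Current loss vs. the proposed one, which adds a single elementwise clip.
t_old = timeit.timeit(lambda: np.linalg.norm(Z - L, "fro") ** 2, number=20)
t_new = timeit.timeit(
    lambda: np.linalg.norm(X - np.maximum(0.0, L), "fro") ** 2, number=20
)
print(f"||Z - L||_F^2: {t_old:.3f}s   ||X - max(0, L)||_F^2: {t_new:.3f}s")
```

The clip is a single elementwise pass, so the expectation is that it is small next to the per-iteration SVD, but measuring it per algorithm, as suggested above, is the safer call.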

jsoules commented 4 months ago

Thanks again @sfohr for your work on this issue (& package).

I think this is ready to merge on my end, so if you are satisfied with it, let me know and I'll approve the PR.

jsoules commented 3 months ago

Apologies for the delay in merging--I missed your response!

Nice work!