flatironinstitute / nomad

Non-linear Matrix Decomposition library
Apache License 2.0

Final library name TK

This library implements methods to facilitate minimally lossy matrix decomposition for sparse nonnegative matrices, under the paradigm described in Saul (2022).

In the problem setting, we are given a sparse nonnegative matrix X and would like to return a low-rank matrix L of known target rank r. When r is high enough, applying a ReLU nonlinearity to L allows lossless recovery of X, i.e. X = max(0, L) elementwise.
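To make the problem setting concrete, here is a minimal numpy-only sketch (it does not use fi_nomad). It builds a low-rank matrix L from random factors, defines X as the ReLU of L, and reports the rank and sparsity of each. The sizes, seed, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, r = 8, 10, 2  # illustrative sizes, chosen arbitrarily

# Build a rank-r matrix L as the product of two random factors.
L = rng.normal(size=(n_rows, r)) @ rng.normal(size=(r, n_cols))

# X is the elementwise ReLU of L: every negative entry becomes an exact zero.
X = np.maximum(L, 0.0)

rank_L = np.linalg.matrix_rank(L)
rank_X = np.linalg.matrix_rank(X)
sparsity = np.mean(X == 0)

print(f"rank(L) = {rank_L}")
print(f"rank(X) = {rank_X}")
print(f"fraction of zeros in X = {sparsity:.2f}")
```

X is sparse and nonnegative by construction, and its rank is typically higher than r, which is why a low-rank L that reproduces X through the ReLU is a more compact description than a direct low-rank fit to X.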

Four methods of estimating the low-rank representation L are currently offered:

All methods operate in an iterative fashion; the model-based methods are particularly analogous to expectation-maximization, in that they iteratively refine a model's parameters and recompute the posterior probability under the new parameters.
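As a rough illustration of what "iterative" means here, the following sketch shows a simple model-free alternating scheme along the lines of the naive approach discussed in the references. It is written in plain numpy and is not the library's implementation; the function names, the initialization, and the fixed iteration count are assumptions made for the example.

```python
import numpy as np

def truncated_svd(Z, rank):
    """Best rank-`rank` approximation of Z in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def naive_relu_decomposition(X, rank, n_iter=200):
    """Hypothetical helper: alternate between a latent matrix Z and a low-rank L."""
    X = np.asarray(X, dtype=float)
    zero_mask = X == 0
    L = truncated_svd(X, rank)  # simple initialization (an assumption)
    for _ in range(n_iter):
        # Z must equal X wherever X > 0. Wherever X == 0, any non-positive
        # value is consistent with the ReLU, so take the one closest to L.
        Z = np.where(zero_mask, np.minimum(L, 0.0), X)
        # Refit L as the best rank-`rank` approximation of Z.
        L = truncated_svd(Z, rank)
    return L

# Synthetic test case: X is the ReLU of a known rank-3 matrix.
rng = np.random.default_rng(0)
X = np.maximum(rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15)), 0.0)

L_hat = naive_relu_decomposition(X, rank=3)
rel_err = np.linalg.norm(np.maximum(L_hat, 0.0) - X) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Each pass first fills in the entries hidden by the zeros of X and then refits the low-rank estimate, which is the same refine-and-recompute pattern described above, without the probabilistic model.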

Getting Started

The library is accessed through Python calls (interactive REPL and scripts have been tested; Jupyter has not, but should work).

Only numpy is used extensively, but the other libraries offer more convenient implementations of some of the statistical operations.

As the fi_nomad package is not yet published, you'll need to install it in local mode. The easiest way to do this is to:

We'll publish to PyPI once we're out of alpha and have picked a good name.

Example

Load the observed non-negative matrix X as a numpy array:

from fi_nomad import decompose
from fi_nomad.types import KernelStrategy
import numpy as np
import logging

# "info"-level log messages in the library won't be displayed unless the
# caller explicitly allows them--this is by design, since it's bad to let
# library code override its caller's logging strategy!
logging.basicConfig(level=logging.INFO)

# NOTE: Ensure that this uses a float dtype!
# If you try to use this method on an integer array, it
# isn't going to work very well!
nonnegative_matrix_X = np.array([...]) # or load from file, etc.

target_rank = 5         # as per your domain expertise

result_data = decompose(
    nonnegative_matrix_X,
    target_rank,
    kernel_strategy=KernelStrategy.GAUSSIAN_MODEL_SINGLE_VARIANCE,
    verbose=True
)
model_means_L = result_data.factors[0] @ result_data.factors[1]
model_variance = result_data.variance

# use model_means_L as appropriate for your application

# To visualize recovery of X, assuming target_rank was high enough to do so:
#   First improve readability of printed numpy arrays:
np.set_printoptions(precision=5, linewidth=150)

#   then pass the low-rank estimate through a ReLU nonlinearity and compare
#   its result to the input sparse matrix:
relu_L = np.maximum(model_means_L, 0)
print(relu_L)
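Printing the matrices works for small inputs; for larger ones a single number is easier to read. The sketch below computes a relative Frobenius error between ReLU(L) and X. The helper name relu_reconstruction_error is hypothetical and not part of fi_nomad, and the small arrays are stand-in data for illustration only.

```python
import numpy as np

def relu_reconstruction_error(X, L):
    """Hypothetical helper: relative Frobenius error between ReLU(L) and X.

    Assumes X has at least one nonzero entry, so the norm of X is nonzero.
    """
    X = np.asarray(X, dtype=float)
    relu_L = np.maximum(np.asarray(L, dtype=float), 0.0)
    return np.linalg.norm(relu_L - X) / np.linalg.norm(X)

# Stand-in data: L_exact reproduces X_demo exactly once negatives are clipped.
X_demo = np.array([[0.0, 2.0],
                   [1.0, 0.0]])
L_exact = np.array([[-1.0, 2.0],
                    [1.0, -3.0]])

print(relu_reconstruction_error(X_demo, L_exact))  # prints 0.0
```

With the variables from the example above, the equivalent call would be relu_reconstruction_error(nonnegative_matrix_X, model_means_L); a value near zero indicates that target_rank was high enough to recover X.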

Additional Features

The main entry point for the model-based low-rank matrix estimation is fi_nomad.decompose. Three parameters are required:

Additionally, the following options are exposed:

Regardless of settings, the estimator will generate a warning if the iteration-over-iteration likelihood was observed to decrease during the course of estimation. (This is quite common due to numerical noise once a good estimate has been reached.)
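To judge whether such a decrease is just noise, one option is to compare each step against a small relative tolerance. The sketch below is not the library's check: it assumes you have recorded a sequence of likelihood values yourself, and the function name, the tolerance, and the trace values are all made up for illustration.

```python
import numpy as np

def count_likelihood_decreases(likelihoods, rel_tol=1e-9):
    """Hypothetical helper: count steps where the likelihood drops by more
    than rel_tol relative to the magnitude of the previous value."""
    values = np.asarray(likelihoods, dtype=float)
    steps = np.diff(values)
    # Scale the tolerance by the previous value, with a floor of 1.0 so the
    # threshold does not vanish when the likelihood is near zero.
    scale = np.maximum(np.abs(values[:-1]), 1.0)
    return int(np.sum(steps < -rel_tol * scale))

# Made-up trace: steady improvement, then a drop of about 1e-12 near convergence.
trace = [-120.0, -80.0, -61.5, -60.2, -60.2 - 1e-12, -60.1]

print(count_likelihood_decreases(trace))               # prints 0
print(count_likelihood_decreases(trace, rel_tol=0.0))  # prints 1
```

A strict comparison flags the tiny drop, while the tolerant one treats it as noise; a decrease that survives a reasonable tolerance is more likely to deserve a closer look.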

References

Saul, L. K. (2022), "A Nonlinear Matrix Decomposition for Mining the Zeros of Sparse Data" https://doi.org/10.1137/21M1405769 (Preprint: https://cseweb.ucsd.edu/~saul/papers/preprints/simods22_preprint.pdf)

Seraghiti, G., et al. (2023), "Accelerated Algorithms for Nonlinear Matrix Decomposition with the ReLU Function" https://arxiv.org/abs/2305.08687