SmartTensors / NMFk.jl

Nonnegative Matrix Factorization + k-means clustering and physics constraints for Unsupervised and Physics-Informed Machine Learning
https://smarttensors.github.io
GNU General Public License v3.0
blind-source-separation feature-extraction julia machine-learning physics-informed-learning scientific-computing scientific-machine-learning source-identification unsupervised-machine-learning

NMFk: Nonnegative Matrix Factorization + k-means clustering and physics constraints

NMFk is a module of the SmartTensors ML framework (smarttensors.com).

NMFk is a novel unsupervised machine learning methodology that allows for the automatic identification of the optimal number of features (signals/signatures) present in the data.

Classical NMF approaches do not allow for automatic estimation of the number of features.

NMFk estimates the number of features k through k-means clustering coupled with regularization constraints (sparsity, physical, mathematical, etc.).
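For context, the sketch below shows classical NMF for a fixed k, the building block whose k NMFk aims to select automatically. It uses the standard Lee-Seung multiplicative updates, which are swapped in here for illustration and are not necessarily the solver NMFk uses. The function name `nmf_multiplicative`, the iteration count, and the seeds are assumptions made for this example.

```julia
import LinearAlgebra
import Random

# Classical NMF for a fixed k via Lee-Seung multiplicative updates.
# Hypothetical helper for illustration only; it is not part of the NMFk API.
function nmf_multiplicative(X::AbstractMatrix, k::Integer; iterations::Integer=500, seed::Integer=1)
    rng = Random.MersenneTwister(seed)
    n, m = size(X)
    W = rand(rng, n, k)   # nonnegative random start
    H = rand(rng, k, m)
    tiny = 1e-12          # guards against division by zero
    for _ in 1:iterations
        H .*= (W' * X) ./ (W' * W * H .+ tiny)
        W .*= (X * H') ./ (W * H * H' .+ tiny)
    end
    # Frobenius norm of the reconstruction residual
    return W, H, LinearAlgebra.norm(X - W * H)
end

rng = Random.MersenneTwister(0)
X = rand(rng, 15, 3) * rand(rng, 3, 5)   # synthetic data built from 3 hidden signals
fits = [nmf_multiplicative(X, k)[3] for k in 1:4]
println(fits)   # reconstruction error for k = 1, 2, 3, 4
```

The reconstruction error alone does not identify k, since it generally keeps shrinking as k grows. This is why an additional criterion, such as the clustering-based robustness measure described above, is needed.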

SmartTensors can be applied to perform:

NMFk provides high-performance computing capabilities to solve problems in parallel using Shared and Distributed Arrays. The parallelization allows for the utilization of multi-core / multi-processor environments. GPU and TPU accelerations are available through existing Julia packages.

NMFk provides advanced tools for data visualization, pre- and post-processing. These tools substantially facilitate the utilization of the package in various real-world applications.

NMFk methodology and applications are discussed in the research papers and presentations listed below.

NMFk is demonstrated with a series of examples and test problems provided here.

Awards

SmartTensors and NMFk were recently awarded:

R&D100

Installation

After starting Julia, execute:

import Pkg
Pkg.add("NMFk")

to access the latest released version.

To utilize the latest code updates (commits), use:

import Pkg
Pkg.add(Pkg.PackageSpec(name="NMFk", rev="master"))

Docker

docker run --interactive --tty montyvesselinov/tensors

The docker image provides access to all SmartTensors packages (smarttensors.github.io).

Testing

import Pkg
Pkg.test("NMFk")

Examples

A simple problem demonstrating NMFk can be executed as follows. First, generate 3 random signals in a matrix W:

a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

Then, mix the signals to produce a data matrix X of 5 sensors observing the mixed signals as follows:

X = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

This is equivalent to generating a mixing matrix H and obtaining X by multiplying W and H:

H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H
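The equivalence of the two constructions can be checked directly. The snippet below is a small self-contained sketch that rebuilds both versions of X and compares them; the names `X_direct` and `X_product` are introduced only for this check.

```julia
a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

# X built column by column from the signals
X_direct = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

# X built as a matrix product; each column of H holds one sensor's mixing weights
H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X_product = W * H

println(size(X_product))                 # (15, 5): 15 samples by 5 sensors
println(isapprox(X_direct, X_product))   # true
```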

After that, execute NMFk to estimate the number of unknown mixed signals based only on the information in X.

import NMFk
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, 2:5; save=false, method=:simple);

The execution will produce output like this:

[ Info: Results
Signals:  2 Fit:       15.489 Silhouette:    0.9980145 AIC:    -38.30184
Signals:  3 Fit: 3.452203e-07 Silhouette:    0.8540085 AIC:    -1319.743
Signals:  4 Fit: 8.503988e-07 Silhouette:   -0.5775127 AIC:    -1212.129
Signals:  5 Fit: 2.598571e-05 Silhouette:   -0.6757581 AIC:    -915.6589
[ Info: Optimal solution: 3 signals

The code returns the estimated optimal number of signals kopt, which in this case, as expected, is equal to 3.
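To make the selection concrete, the sketch below applies two simple rules to the numbers printed in the example output above. Both rules are assumptions chosen for illustration (the largest k whose silhouette stays above a cutoff, and the k with the lowest AIC); the cutoff of 0.5 is likewise assumed, and NMFk's internal criterion may differ.

```julia
# Values copied from the example output above
ks         = [2, 3, 4, 5]
fit        = [15.489, 3.452203e-07, 8.503988e-07, 2.598571e-05]
silhouette = [0.9980145, 0.8540085, -0.5775127, -0.6757581]
aic        = [-38.30184, -1319.743, -1212.129, -915.6589]

# Hypothetical rule 1: largest k whose silhouette exceeds an assumed cutoff
cutoff = 0.5
k_by_silhouette = maximum(ks[silhouette .> cutoff])

# Hypothetical rule 2: k with the lowest AIC
k_by_aic = ks[argmin(aic)]

println((k_by_silhouette, k_by_aic))   # (3, 3)
```

For k = 2 the solutions are stable (high silhouette) but fit the data poorly; for k = 4 and k = 5 the fit is good but the silhouette is negative, which suggests the extra signals are not reproducible. Only k = 3 combines a good fit with a high silhouette.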

The code also returns fitquality and robustness; they can be used to show how the solutions change as k increases:

NMFk.plot_signal_selecton(2:5, fitquality, robustness)

The code also returns estimates of matrices W and H.

It can be easily verified that estimated We[kopt] and He[kopt] are scaled versions of the original W and H matrices.

Note that the order of the columns ('signals') in W and We[kopt] is not expected to match. The order of the rows in H and He[kopt], which also correspond to signals, is not expected to match either. The estimated order can differ every time the code is executed.
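One way to carry out this verification is to pair each original signal with its most correlated estimated signal and then recover the scale factor. The sketch below is self-contained and does not call NMFk: `We_k` is a stand-in for We[kopt], built by permuting and rescaling the columns of W, and the permutation, scale factors, and variable names are assumptions made for the example.

```julia
import LinearAlgebra
import Random
import Statistics

rng = Random.MersenneTwister(0)
W = rand(rng, 15, 3)

# Stand-in for We[kopt]: the same signals, reordered and rescaled
perm = [3, 1, 2]
scale = [0.5 2.0 4.0]
We_k = W[:, perm] .* scale

# C[i, j] is the correlation between W[:, i] and We_k[:, j]
C = Statistics.cor(W, We_k)
matched = [argmax(C[i, :]) for i in 1:3]
println(matched)   # [2, 3, 1] for the permutation chosen above

# Least-squares scale factor for each matched pair
ratios = [LinearAlgebra.dot(W[:, i], We_k[:, matched[i]]) / LinearAlgebra.dot(W[:, i], W[:, i]) for i in 1:3]

# Undo the permutation and the scaling, then compare with the original
println(isapprox(We_k[:, matched] ./ ratios', W))   # true
```

With real NMFk output the match is approximate rather than exact, so the correlations will be close to, but not exactly, 1.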

The matrices can be visualized using:

import Pkg; Pkg.add("Mads")
import Mads
Mads.plotseries([a b c])
Mads.plotseries(We[kopt] ./ maximum(We[kopt]))
NMFk.plotmatrix(H)
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]))

More examples can be found in the test, demo, examples, and notebooks directories of the NMFk repository.

Applications:

NMFk has been applied in a wide range of real-world applications. The analyzed datasets include model outputs, experimental laboratory data, and field tests:

Videos:

More videos are available on YouTube.

Notebooks:

A series of Jupyter notebooks demonstrating NMFk have been developed:

The notebooks can also be accessed using:

NMFk.notebooks()

Other Examples:

Patent:

Alexandrov, B.S., Vesselinov, V.V., Alexandrov, L.B., Stanev, V., Iliev, F.L., Source identification by non-negative matrix factorization combined with semi-supervised clustering, US20180060758A1

Publications:

Research papers are also available at Google Scholar, ResearchGate and Academia.edu

Presentations:

Presentations are also available at slideshare.net, ResearchGate and Academia.edu

Extra information

For more information, visit monty.gitlab.io, smarttensors.com, smarttensors.github.io (https://smarttensors.github.io), and tensors.lanl.gov.