NMFk is a module of the SmartTensors ML framework (smarttensors.com).
NMFk is a novel unsupervised machine learning methodology that allows for the automatic identification of the optimal number of features (signals/signatures) present in the data.
Classical NMF approaches do not allow for automatic estimation of the number of features.
NMFk estimates the number of features k through k-means clustering coupled with regularization constraints (sparsity, physical, mathematical, etc.).
SmartTensors can be applied to perform:
NMFk provides high-performance computing capabilities to solve problems in parallel using Shared and Distributed Arrays. The parallelization allows for the utilization of multi-core / multi-processor environments. GPU and TPU accelerations are available through existing Julia packages.
NMFk provides advanced tools for data visualization, pre- and post-processing. These tools substantially facilitate the utilization of the package in various real-world applications.
NMFk methodology and applications are discussed in the research papers and presentations listed below.
NMFk is demonstrated with a series of examples and test problems provided here.
SmartTensors and NMFk were recently awarded:
After starting Julia, execute:
import Pkg
Pkg.add("NMFk")
to access the latest released version.
To utilize the latest code updates (commits), use:
import Pkg
Pkg.add(Pkg.PackageSpec(name="NMFk", rev="master"))
docker run --interactive --tty montyvesselinov/tensors
The docker image provides access to all SmartTensors packages (smarttensors.github.io).
import Pkg
Pkg.test("NMFk")
A simple problem demonstrating NMFk can be executed as follows.
First, generate 3 random signals in a matrix W:
a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]
Then, mix the signals to produce a data matrix X of 5 sensors observing the mixed signals as follows:
X = [a+c*3 a*10+b b b*5+c a+b*2+c*5]
This is equivalent to generating a mixing matrix H and obtaining X by multiplying W and H:
H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H
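The equivalence of the two constructions can be checked directly. The following is a minimal sketch using only base Julia; the names `X_direct` and `X_mixed` are introduced here for illustration and do not appear in the original example.

```julia
# Three random signals observed at 15 points
a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]                                      # 15x3: one column per signal

# Data matrix built column by column (one column per sensor)
X_direct = [a+c*3 a*10+b b b*5+c a+b*2+c*5]      # 15x5

# Same data matrix built from the mixing matrix H
H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]           # 3x5: one row per signal
X_mixed = W * H                                  # 15x5

# The two constructions agree up to floating-point rounding
println(isapprox(X_direct, X_mixed))
```

Each column of H lists the weights with which signals a, b, and c contribute to one sensor; for example, the first column (1, 0, 3) corresponds to the first sensor observing a+c*3.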
After that, execute NMFk to estimate the number of unknown mixed signals based only on the information in X.
import NMFk
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, 2:5; save=false, method=:simple);
The execution will produce output like this:
[ Info: Results
Signals: 2 Fit: 15.489 Silhouette: 0.9980145 AIC: -38.30184
Signals: 3 Fit: 3.452203e-07 Silhouette: 0.8540085 AIC: -1319.743
Signals: 4 Fit: 8.503988e-07 Silhouette: -0.5775127 AIC: -1212.129
Signals: 5 Fit: 2.598571e-05 Silhouette: -0.6757581 AIC: -915.6589
[ Info: Optimal solution: 3 signals
The code returns the estimated optimal number of signals kopt, which in this case, as expected, is equal to 3.
The code also returns the fitquality and robustness; they can be used to show how the solutions change as k increases:
NMFk.plot_signal_selecton(2:5, fitquality, robustness)
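The Silhouette column in the output above is a standard clustering-quality score that ranges from -1 (poorly separated clusters) to 1 (tight, well-separated clusters). To make that number more concrete, here is a minimal sketch of a mean silhouette width computed for labeled points. This is not NMFk's implementation: the helper name `mean_silhouette`, the Euclidean distance, and the toy data are assumptions made for illustration only.

```julia
import Statistics
import LinearAlgebra

# Hypothetical helper (not part of NMFk): mean silhouette width of points
# stored as columns of `points`, with one integer cluster label per column.
# Assumes at least two distinct labels are present.
function mean_silhouette(points::AbstractMatrix, labels::AbstractVector{<:Integer})
    n = size(points, 2)
    clusters = unique(labels)
    s = zeros(n)
    for i in 1:n
        # Euclidean distances from point i to every point
        dist = [LinearAlgebra.norm(points[:, i] - points[:, j]) for j in 1:n]
        same = [j for j in 1:n if labels[j] == labels[i] && j != i]
        isempty(same) && continue  # singleton cluster: silhouette is taken as 0
        a = Statistics.mean(dist[same])  # mean distance within own cluster
        # smallest mean distance to any other cluster
        b = minimum(Statistics.mean(dist[labels .== c]) for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    end
    return Statistics.mean(s)
end

# Toy data: two tight groups that are far apart
points = [0.0 0.1 10.0 10.1; 0.0 0.0 0.0 0.0]
good = mean_silhouette(points, [1, 1, 2, 2])  # labels follow the groups: close to 1
bad = mean_silhouette(points, [1, 2, 1, 2])   # labels split the groups: negative
println((good, bad))
```

In the example output, the silhouette is high for 2 and 3 signals and turns negative for 4 and 5, while the fit becomes very small from 3 signals onward; this is consistent with 3 being reported as optimal.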
The code also returns estimates of matrices W and H.
It can be easily verified that the estimated We[kopt] and He[kopt] are scaled versions of the original W and H matrices.
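The reason only scaled versions can be recovered is that the factorization is not unique: scaling a signal in W and applying the inverse scaling to the matching row of H leaves the product unchanged. A minimal sketch of this property follows; the scale factors and the names `D`, `Wscaled`, and `Hscaled` are illustrative assumptions and are not produced by NMFk.

```julia
import LinearAlgebra

W = rand(15, 3)                           # three random signals
H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]    # mixing matrix from the example

# Arbitrary positive scale factors, one per signal (illustrative values)
D = LinearAlgebra.Diagonal([2.0, 0.5, 4.0])

Wscaled = W * D        # scales each column (signal) of W
Hscaled = inv(D) * H   # applies the inverse scale to each row of H

# Both factor pairs reproduce the same data matrix
println(isapprox(Wscaled * Hscaled, W * H))
```

Because the scale factors are positive, both `Wscaled` and `Hscaled` remain non-negative, so they form an equally valid non-negative factorization of the same data.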
Note that the order of columns ('signals') in W and We[kopt] is not expected to match.
The order of rows in H and He[kopt], which also correspond to signals, is not expected to match either.
The estimated order may differ every time the code is executed.
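Because of this ambiguity, comparing estimated and original signals requires pairing the columns first. One simple way, sketched below, is to pair each original column with the estimated column that has the highest absolute correlation. The helper `match_columns` is hypothetical and not part of NMFk, and the synthetic `West` matrix stands in for an estimate such as We[kopt].

```julia
import Statistics

# Hypothetical helper (not part of NMFk): for each column of `W`, return the
# index of the column of `West` with the largest absolute correlation.
function match_columns(W::AbstractMatrix, West::AbstractMatrix)
    C = Statistics.cor(W, West)  # C[i, j] = correlation of W[:, i] and West[:, j]
    return [argmax(abs.(C[i, :])) for i in axes(C, 1)]
end

# Synthetic check: permute and rescale the columns of a random matrix
W = rand(15, 3)
perm = [3, 1, 2]
scales = [2.0 0.5 7.0]              # 1x3 row: one positive scale per column
West = W[:, perm] .* scales         # West[:, j] = W[:, perm[j]] * scales[j]

matches = match_columns(W, West)    # West[:, matches[i]] is proportional to W[:, i]
println(matches == invperm(perm))
```

With the results of the example above, a call such as `match_columns(W, We[kopt])` would give the correspondence between original and estimated signals. This greedy pairing is only a sketch: when signals are strongly correlated with each other, it can assign the same estimated column to more than one original column.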
The matrices can be visualized using:
import Pkg; Pkg.add("Mads")
import Mads
Mads.plotseries([a b c])
Mads.plotseries(We[kopt] ./ maximum(We[kopt]))
NMFk.plotmatrix(H)
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]))
More examples can be found in the test, demo, examples, and notebooks directories of the NMFk repository.
NMFk has been applied in a wide range of real-world applications. The analyzed datasets include model outputs, experimental laboratory data, and field tests:
More videos are available on YouTube.
A series of Jupyter notebooks demonstrating NMFk have been developed:
The notebooks can also be accessed using:
NMFk.notebooks()
Alexandrov, B.S., Vesselinov, V.V., Alexandrov, L.B., Stanev, V., Iliev, F.L., Source identification by non-negative matrix factorization combined with semi-supervised clustering, US20180060758A1
Research papers are also available at Google Scholar, ResearchGate, and Academia.edu.
Presentations are also available at slideshare.net, ResearchGate, and Academia.edu.
For more information, visit monty.gitlab.io, smarttensors.com, smarttensors.github.io, and tensors.lanl.gov.