SmartTensors / NMFk.jl

Nonnegative Matrix Factorization + k-means clustering and physics constraints for Unsupervised and Physics-Informed Machine Learning
GNU General Public License v3.0

NMFk: Nonnegative Matrix Factorization + k-means clustering and physics constraints


NMFk is a module of the SmartTensors ML framework.


NMFk is a novel unsupervised machine learning methodology that allows for the automatic identification of the optimal number of features (signals/signatures) present in the data.

Classical NMF approaches do not allow for automatic estimation of the number of features.

NMFk estimates the number of features k through k-means clustering coupled with regularization constraints (sparsity, physical, mathematical, etc.).
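To make the role of k concrete, here is a minimal sketch of a plain NMF solved with multiplicative updates (Lee and Seung) for several candidate values of k. This is not NMFk's solver and it omits the k-means clustering and regularization steps entirely; the function name `nmf_mu` and its parameters are hypothetical and chosen only for illustration. It shows only that the reconstruction error on rank-3 data is smaller once k reaches 3, which is why fit quality alone is not enough to pick k and a robustness measure is also needed.

```julia
using LinearAlgebra, Random

# Plain multiplicative-update NMF; an illustrative stand-in, not NMFk's algorithm.
function nmf_mu(X::AbstractMatrix, k::Integer; iterations::Integer=2000, rng=MersenneTwister(1))
    n, m = size(X)
    W = rand(rng, n, k)          # nonnegative random start
    H = rand(rng, k, m)
    tiny = 1e-12                 # guards against division by zero
    for _ in 1:iterations
        H .*= (W' * X) ./ (W' * W * H .+ tiny)
        W .*= (X * H') ./ (W * (H * H') .+ tiny)
    end
    return W, H
end

# Synthetic data with exactly 3 underlying signals
rng = MersenneTwister(42)
Wtrue = rand(rng, 15, 3)
Htrue = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = Wtrue * Htrue

# Relative reconstruction error for each candidate number of signals
errors = Dict{Int,Float64}()
for k in 2:4
    We, He = nmf_mu(X, k; rng=MersenneTwister(k))
    errors[k] = norm(X - We * He) / norm(X)
end
```

The error typically stays small for k above 3 as well, so it cannot by itself distinguish the true number of signals from an overfit one.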

SmartTensors can be applied to perform:

NMFk provides high-performance computing capabilities to solve problems in parallel using Shared and Distributed Arrays. The parallelization allows for the utilization of multi-core / multi-processor environments. GPU and TPU accelerations are available through existing Julia packages.

NMFk provides advanced tools for data visualization, pre- and post-processing. These tools substantially facilitate the utilization of the package in various real-world applications.

NMFk methodology and applications are discussed in the research papers and presentations listed below.

NMFk is demonstrated with a series of examples and test problems provided here.


SmartTensors and NMFk were recently awarded:



After starting Julia, execute:

import Pkg
Pkg.add("NMFk")

to access the latest released version.

To utilize the latest code updates (commits), use:

import Pkg
Pkg.add(Pkg.PackageSpec(name="NMFk", rev="master"))


Alternatively, SmartTensors can be run from a Docker image:

docker run --interactive --tty montyvesselinov/tensors

The Docker image provides access to all SmartTensors packages.


import Pkg


A simple problem demonstrating NMFk can be executed as follows. First, generate 3 random signals in a matrix W:

a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

Then, mix the signals to produce a data matrix X of 5 sensors observing the mixed signals as follows:

X = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

This is equivalent to generating a mixing matrix H and obtaining X by multiplying W and H:

H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H
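The equivalence of the two constructions can be checked directly. The sketch below uses only base Julia; the name `X_explicit` is introduced here for illustration.

```julia
a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

# Data matrix built by mixing the signals explicitly
X_explicit = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

# Same data matrix built from the mixing matrix H
H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H

# 15 observations for each of the 5 sensors; both constructions agree
same = isapprox(X, X_explicit)
```

Each column of H lists the weights with which the three signals contribute to one sensor, for example the first column (1, 0, 3) gives a + 3c.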

After that, execute NMFk to estimate the number of unknown mixed signals based only on the information in X.

import NMFk
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, 2:5; save=false, method=:simple);

The execution will produce output like this:

[ Info: Results
Signals:  2 Fit:       15.489 Silhouette:    0.9980145 AIC:    -38.30184
Signals:  3 Fit: 3.452203e-07 Silhouette:    0.8540085 AIC:    -1319.743
Signals:  4 Fit: 8.503988e-07 Silhouette:   -0.5775127 AIC:    -1212.129
Signals:  5 Fit: 2.598571e-05 Silhouette:   -0.6757581 AIC:    -915.6589
[ Info: Optimal solution: 3 signals

The code returns the estimated optimal number of signals kopt, which in this case, as expected, is equal to 3.
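One way to read the table above is to pick the largest k whose silhouette stays above a cutoff. The sketch below applies that rule to the printed values. The rule and the cutoff of 0.5 are assumptions made for illustration; the exact criterion NMFk uses internally is not described here.

```julia
ks = 2:5
fit = [15.489, 3.452203e-07, 8.503988e-07, 2.598571e-05]
silhouette = [0.9980145, 0.8540085, -0.5775127, -0.6757581]

cutoff = 0.5  # assumed robustness threshold, not taken from the NMFk documentation

# Largest number of signals whose solutions are still robust (well clustered)
idx = findlast(s -> s > cutoff, silhouette)
k_selected = idx === nothing ? nothing : ks[idx]
```

With these values, k = 2 is robust but fits poorly, k = 4 and k = 5 fit well but are not robust, and k = 3 is the largest robust choice.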

The code also returns fitquality and robustness; they can be used to show how the solutions change as k increases:

NMFk.plot_signal_selecton(2:5, fitquality, robustness)

The code also returns estimates of matrices W and H.

It can be easily verified that estimated We[kopt] and He[kopt] are scaled versions of the original W and H matrices.

Note that the order of the columns ('signals') in W and We[kopt] is not expected to match. The order of the rows in H and He[kopt], which also correspond to signals, is likewise not expected to match. The estimated order will be different every time the code is executed.
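The scaling and ordering ambiguity can be resolved after the fact by matching estimated signals to the originals. The sketch below is a hypothetical helper, `match_signals`, which is not part of NMFk. It pairs each column of W with its best-correlated column of an estimate and computes a least-squares scale factor. To stay self-contained it uses a stand-in estimate built by permuting and rescaling W rather than an actual NMFk result.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper (not part of NMFk): for each column of W, find the
# best-correlated column of We and the least-squares scale factor.
function match_signals(W::AbstractMatrix, We::AbstractMatrix)
    C = cor(W, We)  # C[i, j] = correlation of W[:, i] with We[:, j]
    perm = [argmax(abs.(C[i, :])) for i in 1:size(W, 2)]
    scales = [dot(W[:, i], We[:, j]) / dot(We[:, j], We[:, j]) for (i, j) in enumerate(perm)]
    return perm, scales
end

Random.seed!(1)
W = rand(15, 3)

# Stand-in for an estimate: columns of W reordered and rescaled
We = W[:, [3, 1, 2]] .* [2.0 0.5 4.0]

perm, scales = match_signals(W, We)

# Undo the permutation and scaling to recover the original signals
W_recovered = We[:, perm] .* scales'
```

This greedy matching is enough when the estimate is close to an exact scaled permutation; with noisy estimates two originals could map to the same estimated column, and a proper assignment method would be needed.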

The matrices can be visualized using:

import Pkg; Pkg.add("Mads")
import Mads
Mads.plotseries([a b c])
Mads.plotseries(We[kopt] ./ maximum(We[kopt]))
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]))

More examples can be found in the test, demo, examples, and notebooks directories of the NMFk repository.


NMFk has been applied in a wide range of real-world applications. The analyzed datasets include model outputs, experimental laboratory data, and field tests:



More videos are available on YouTube.


A series of Jupyter notebooks demonstrating NMFk have been developed:

The notebooks can also be accessed using:


Other Examples:


Alexandrov, B.S., Vesselinov, V.V., Alexandrov, L.B., Stanev, V., Iliev, F.L., Source identification by non-negative matrix factorization combined with semi-supervised clustering, US20180060758A1


Research papers are also available at Google Scholar and ResearchGate.


Presentations are also available at ResearchGate.
