Well, this is nice as it removes our dependency on Numeric. I would like to see if we can use an officially supported matrix library so that we benefit from all their micro-optimizations (see issue https://github.com/brownhci/WebGazer/issues/31). This is still better than what we have currently, but after investigating the issue this morning I am confident we can get even better performance (potentially by an order of magnitude) by using a library that doesn't rely on native JS associative arrays (as they are really slow).
@Skylion007 I compared our implementation with other linalg libs by inverting a 4*4 matrix 1M times:
As you can see, the current implementation is actually fast enough. I thought it could be even faster if we removed the matrix shape check in the `mat.solve()` function, but the shape check doesn't seem to affect performance (see below). There is also a typo in the `mat.solve()` function involving `A[0].length`.
math.js | numeric | ml-matrix | eigen | ours | ours w/o shape check |
---|---|---|---|---|---|
2.695s | 0.834s | 3.467s | 3.804s | 1.175s | 0.670s |
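For reference, the timing harness behind numbers like these looks roughly as follows (the `mat.inv` name and import path are my assumptions, not verbatim from the PR):

```js
import * as mat from './mat.mjs'; // assumed import path

// Hypothetical harness: invert a fixed 4x4 matrix 1M times and report seconds.
const A = [
  [4, 2, 0, 1],
  [2, 5, 1, 0],
  [0, 1, 3, 2],
  [1, 0, 2, 6]
];

const t0 = performance.now();
for (let k = 0; k < 1_000_000; k++) {
  mat.inv(A);
}
console.log(`ours: ${((performance.now() - t0) / 1000).toFixed(3)}s`);
```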
> I compared our implementation with other linalg libs by inverting a 4*4 matrix 1M times:
Are all your tests on 4x4 matrices? The matrices we are multiplying are significantly larger for the decomposition steps (which are the bottleneck). Often these libraries won't scale well down to small matrices (to the point where there are specialized linalg libraries just for 4x4 matrices, like https://glmatrix.net/). There is overhead with every WASM matmul call; the key is to see whether it matters for the matrix sizes that we care about (i.e. the ones used in the lin-alg decompositions).
For a more realistic matrix size, it seems to do better:
I believe ours is fast enough even with 12x12 matrices:
Matrix size: 12x12, iteration count: 100,000
 | math.js | numeric | ml-matrix | eigen | ours |
---|---|---|---|---|---|
Multiplication | 3.914s | N/A | 5.022s | 11.828s | 0.522s |
Inversion | 11.545s | 0.646s | 5.192s | 7.552s | 1.074s |
If you move the `createMatrix()` statements inside the `for` block, i.e.

```js
for (let k = 0; k < iterations; k++) {
  const A = createMatrix('eig', size);
  const B = createMatrix('eig', size);
  A.matMul(B);
}
```
you may find those libraries spend too much time constructing matrix instances.
> `const A = createMatrix('eig', size); const B = createMatrix('eig', size);`
That seems like the wrong thing to benchmark. Of course creating native JS arrays is faster: anything that converts between representations needs to iterate over the array, while the native one can be created directly by the interpreter. But the only conversion we really need is after the output of resize. The key is that we need to replace all native associative arrays with these Matrix types to get any performance speed-up, not switch between them (like all the variables in the Kalman filter). I also suspect typed arrays would be faster to convert (assuming Eigen.js or something similar is smart enough to use a buffer protocol).
> But the only conversion we really need is after the output of resize. The key is that we need to replace all native associative arrays with these Matrix types to get any performance speed-up, not switch between them (like all the variables in the Kalman filter).
Even if all the variables are stored as Matrices using Eigen.js, our current implementation is still faster at matrix multiplication:

Anyway, I think JavaScript-based matrix operations are fast enough for now, and it would be easier to integrate with hardware-acceleration libraries like gpu.js in the future.
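For context, a minimal sketch of what a gpu.js matrix multiply could look like (assuming its `createKernel` API and a fixed 12x12 size; this is illustrative, not code from the PR):

```js
import { GPU } from 'gpu.js';

const gpu = new GPU();

// 12x12 matrix product as a GPU kernel; gpu.js needs static loop
// bounds, so the size is passed as a compile-time constant.
const matMul = gpu.createKernel(function (a, b) {
  let s = 0;
  for (let k = 0; k < this.constants.size; k++) {
    s += a[this.thread.y][k] * b[k][this.thread.x];
  }
  return s;
}, { constants: { size: 12 }, output: [12, 12] });

const rand = () => Math.random();
const A = Array.from({ length: 12 }, () => Array.from({ length: 12 }, rand));
const B = Array.from({ length: 12 }, () => Array.from({ length: 12 }, rand));
const C = matMul(A, B); // plain nested arrays in, 2-D result out
```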
Another drawback of Eigen.js is that it fails when performing intensive calculations, e.g. multiplying two 100*100 matrices 100k times. Maybe it's related to a memory leak?
I found a lot of unnecessary code in `mat.mjs` that makes copies of the input matrix; for example, `mult()` copies each column of `B` into a temporary array before the inner loop. That copy looks redundant, and the `mat.mult()` function can be rewritten as:
```js
function mult2(A, B) {
  // matrix shapes
  const rowsA = A.length, colsA = A[0].length;
  const rowsB = B.length, colsB = B[0].length;

  if (colsA !== rowsB) {
    throw new Error('Matrix inner dimensions must agree.');
  }

  const X = [];
  for (let i = 0; i < rowsA; i++) {
    X[i] = [];
    for (let j = 0; j < colsB; j++) {
      let s = 0;
      for (let k = 0; k < colsA; k++) {
        s += A[i][k] * B[k][j];
      }
      X[i][j] = s;
    }
  }
  return X;
}
```
Maybe we can first purge this unnecessary code and then figure out how to integrate with gpu.js?
Edit: wait... why was the optimized `mult2()` even slower than the original `mult()`???
> Another drawback of Eigen.js is that it fails when performing intensive calculations, e.g. multiplying two 100*100 matrices 100k times. Maybe it's related to a memory leak?
Found the problem, I think: Eigen.js needs its garbage collector run manually (which is a tad annoying).
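For reference, this is roughly what the manual collection looks like with the eigen npm package (`eig.ready` and `eig.GC.flush()` are from its docs; the rest is an illustrative sketch):

```js
import eig from 'eigen';

await eig.ready; // the WASM module must finish loading first

const data = Array.from({ length: 100 }, () =>
  Array.from({ length: 100 }, () => Math.random())
);

for (let k = 0; k < 100_000; k++) {
  const A = new eig.Matrix(data);
  const B = new eig.Matrix(data);
  A.matMul(B);
  eig.GC.flush(); // free the WASM-heap objects created this iteration
}
```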
> Edit: wait... why was the optimized `mult2()` even slower than the original `mult()`???
The optimized code is even slower because of the array access: you aren't caching `A[i]`, and you aren't preallocating the array size either. These subtle changes are exactly why I'd much rather rely on a JS/WASM library. It's very easy to get things wrong.
> The optimized code is even slower because of the array access: you aren't caching `A[i]`, and you aren't preallocating the array size either. These subtle changes are exactly why I'd much rather rely on a JS/WASM library. It's very easy to get things wrong.

I tried caching `A[i]` and preallocating the array size, but it didn't make any difference.
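For concreteness, here is a sketch of that variant (hypothetical `mult3`, reconstructed from the discussion rather than taken from the PR):

```js
function mult3(A, B) {
  const rowsA = A.length, colsA = A[0].length;
  const colsB = B[0].length;
  const X = new Array(rowsA);           // preallocate the outer array
  for (let i = 0; i < rowsA; i++) {
    const Ai = A[i];                    // cache the current row of A
    const Xi = X[i] = new Array(colsB); // preallocate the output row
    for (let j = 0; j < colsB; j++) {
      let s = 0;
      for (let k = 0; k < colsA; k++) {
        s += Ai[k] * B[k][j];
      }
      Xi[j] = s;
    }
  }
  return X;
}
```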
@idiotWu Can we get ABTests on all refactored/reimplemented functions?
> The optimized code is even slower because of the array access: you aren't caching `A[i]`, and you aren't preallocating the array size either. These subtle changes are exactly why I'd much rather rely on a JS/WASM library. It's very easy to get things wrong.
>
> I tried caching `A[i]` and preallocating the array size, but it didn't make any difference.
That redundant call seems to cache the column vector in `BColJ` for faster access later.
> @idiotWu Can we get ABTests on all refactored/reimplemented functions?

Like `Math.random() >= 0.5 ? numeric : our_fn`? I just don't understand why we need A/B tests for this...
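If the goal is just checking that the reimplemented functions agree with `numeric`, a parity test seems simpler than a live A/B split. A hedged sketch (`numeric.inv` is the real numeric.js call; the `mat.inv` name and import path are assumptions):

```js
import numeric from 'numeric';
import * as mat from './mat.mjs'; // assumed import path

// Compare our inversion against numeric's on a random matrix, elementwise.
function assertClose(X, Y, eps = 1e-9) {
  for (let i = 0; i < X.length; i++) {
    for (let j = 0; j < X[0].length; j++) {
      if (Math.abs(X[i][j] - Y[i][j]) > eps) {
        throw new Error(`mismatch at (${i}, ${j})`);
      }
    }
  }
}

const A = Array.from({ length: 12 }, () =>
  Array.from({ length: 12 }, () => Math.random())
);
assertClose(numeric.inv(A), mat.inv(A));
```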
> That redundant call seems to cache the column vector in `BColJ` for faster access later.

Yeah, accessing 2D array elements by column is slower than accessing by row in C++; maybe it's the same in JavaScript? From this point of view, switching to `TypedArray` may improve performance, as the data would be stored in contiguous blocks of memory.
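For illustration, a minimal sketch of the flat-`Float64Array` idea (hypothetical helper, not part of `mat.mjs`); the i-k-j loop order keeps every array on sequential, row-major access:

```js
// C = A * B where A is m×n and B is n×p, all stored as flat,
// row-major Float64Arrays. With the i-k-j loop order, A, B and C
// are each read/written sequentially, which is cache-friendly.
function multFlat(A, B, m, n, p) {
  const C = new Float64Array(m * p); // zero-initialized
  for (let i = 0; i < m; i++) {
    for (let k = 0; k < n; k++) {
      const a = A[i * n + k];
      for (let j = 0; j < p; j++) {
        C[i * p + j] += a * B[k * p + j];
      }
    }
  }
  return C;
}
```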
BTW, our current LU/QR decomposition implementations seem difficult to integrate with gpu.js, as we are mutating variables inside loops. Is there any good example of code performing LU decomposition on a GPU?
Any follow-ups on the PR? Also, do you know why the transpose is slower? @idiotWu TF.js can do matrix decomposition on a GPU.
> Also, do you know why the transpose is slower?

Did you mean the inversion (since we haven't compared transposition yet)? If so, I think that is because the `inv()` method adopted from the WEKA code uses decompositions to invert matrices: https://github.com/brownhci/WebGazer/blob/37d527f8674f46c3d73239d5ce40064260657479/src/worker_scripts/mat.js#L205-L207 https://github.com/brownhci/WebGazer/blob/37d527f8674f46c3d73239d5ce40064260657479/src/worker_scripts/mat.js#L235-L242
> TF.js can do matrix decomposition on a GPU
Then maybe we can use tf.js to do all the matrix operations? I'll have a look when I get time (being super busy right now 😖).
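If we go that route, a quick sketch of the relevant TF.js calls (`tf.matMul` and `tf.linalg.qr` are real TF.js APIs; the snippet itself is illustrative, not from the PR):

```js
import * as tf from '@tensorflow/tfjs';

// Ops run on whichever backend is active (WebGL on the GPU, with a
// CPU fallback). Tensors must be disposed explicitly to free memory.
const A = tf.randomNormal([12, 12]);
const B = tf.randomNormal([12, 12]);

const C = tf.matMul(A, B);      // matrix product
const [q, r] = tf.linalg.qr(A); // QR decomposition

A.dispose(); B.dispose(); C.dispose(); q.dispose(); r.dispose();
```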
been testing this. hopefully will merge today. thanks for your patience
@idiotWu your numeric to mat changes have finally been merged in ae20073! sorry it took 2 years. I really really appreciate all that you did
Resolves #211.
As discussed in #225, the performance of `math.js` is incredibly low, so we need to implement all the required matrix operations ourselves. This PR removes `numeric` completely and adds matrix operations such as addition, subtraction, inversion, identity-matrix generation, etc. into `mat.mjs`. Most of the implementations are based on WEKA.

CC @Skylion007