Well, this is nice as it removes our dependency on Numeric. I would like to see if we can use an officially supported matrix library so that we benefit from all their micro-optimizations (see issue https://github.com/brownhci/WebGazer/issues/31). This is still better than what we have currently, but after investigating the issue this morning I am confident we can get even better performance (potentially by an order of magnitude) by using a library that doesn't rely on native JS associative arrays (as they are really slow).
@Skylion007 I compared our implementation with other linalg libs by inverting a 4*4 matrix 1M times:
As you can see, the current implementation is actually fast enough. I thought it could be even faster if we removed the matrix shape check in the `mat.solve()` function, but the shape check doesn't seem to affect performance (see below). There is also a typo in the `mat.solve()` function involving `A[0].length`.
math.js | numeric | ml-matrix | eigen | ours | ours w/o shape check |
---|---|---|---|---|---|
2.695s | 0.834s | 3.467s | 3.804s | 1.175s | 0.670s |
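For reference, the timing harness behind numbers like these looks roughly as follows (the `mat.inv` name and import path are my assumptions, not verbatim from the PR):

```js
import * as mat from './mat.mjs'; // assumed import path

// Hypothetical harness: invert a fixed 4x4 matrix 1M times and report seconds.
const A = [
  [4, 2, 0, 1],
  [2, 5, 1, 0],
  [0, 1, 3, 2],
  [1, 0, 2, 6]
];

const t0 = performance.now();
for (let k = 0; k < 1_000_000; k++) {
  mat.inv(A);
}
console.log(`ours: ${((performance.now() - t0) / 1000).toFixed(3)}s`);
```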
> I compared our implementation with other linalg libs by inverting a 4*4 matrix 1M times:
Are all your tests on 4x4 matrices? The matrices we are multiplying are significantly larger for the decomposition steps (which are the bottleneck). Often these libraries won't scale well down to small matrices (to the point where there are specialized linalg libraries just for 4x4 matrices, like https://glmatrix.net/). There is overhead with every WASM matmul call; the key is to see whether it matters for the matrix sizes that we care about (i.e. the ones used in the lin-alg decompositions).
For a more realistic matrix size, it seems to do better:
I believe ours is fast enough even with 12x12 matrices:
Matrix size: 12x12, iteration count: 100,000
 | math.js | numeric | ml-matrix | eigen | ours |
---|---|---|---|---|---|
Multiplication | 3.914s | N/A | 5.022s | 11.828s | 0.522s |
Inversion | 11.545s | 0.646s | 5.192s | 7.552s | 1.074s |
If you move the `createMatrix()` statements inside the `for` block, i.e.

```js
for (let k = 0; k < iterations; k++) {
  const A = createMatrix('eig', size);
  const B = createMatrix('eig', size);
  A.matMul(B);
}
```
you may find those libraries spend too much time constructing matrix instances.
> `const A = createMatrix('eig', size); const B = createMatrix('eig', size);`
That seems like the wrong thing to benchmark. Of course creating native JS arrays is faster: anything that converts between representations needs to iterate over the array, while the native one can be created directly by the interpreter. But the only conversion we really need is after the output of resize. The key is that we need to replace all native associative arrays with these Matrix types to get any performance speed-up, not switch between them (like all the variables in the Kalman filter). I also suspect typed arrays would be faster to convert (assuming Eigen.js or something similar is smart enough to use a buffer protocol).
> But the only conversion we really need is after the output of resize. The key is that we need to replace all native associative arrays with these Matrix types to get any performance speed-up, not switch between them (like all the variables in the Kalman filter).
Even if all the variables are stored as Matrices using Eigen.js, our current implementation is still faster at matrix multiplication:

Anyway, I think JavaScript-based matrix operations are fast enough for now, and it would be easier to integrate with hardware-acceleration libraries like gpu.js in the future.
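For context, a minimal sketch of what a gpu.js matrix multiply could look like (assuming its `createKernel` API and a fixed 12x12 size; this is illustrative, not code from the PR):

```js
import { GPU } from 'gpu.js';

const gpu = new GPU();

// 12x12 matrix product as a GPU kernel; gpu.js needs static loop
// bounds, so the size is passed as a compile-time constant.
const matMul = gpu.createKernel(function (a, b) {
  let s = 0;
  for (let k = 0; k < this.constants.size; k++) {
    s += a[this.thread.y][k] * b[k][this.thread.x];
  }
  return s;
}, { constants: { size: 12 }, output: [12, 12] });

const rand = () => Math.random();
const A = Array.from({ length: 12 }, () => Array.from({ length: 12 }, rand));
const B = Array.from({ length: 12 }, () => Array.from({ length: 12 }, rand));
const C = matMul(A, B); // plain nested arrays in, 2-D result out
```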
Another drawback of Eigen.js is that it fails when performing intensive calculations, e.g. multiplying two 100*100 matrices 100k times. Maybe it's related to a memory leak?
I found a lot of unnecessary code in `mat.mjs` that makes copies of the input matrix; for example, `mult()` copies each column of `B` into a temporary array before the inner loop. That copy looks redundant, and the `mat.mult()` function can be rewritten as:
```js
function mult2(A, B) {
  // matrix shapes
  const rowsA = A.length, colsA = A[0].length;
  const rowsB = B.length, colsB = B[0].length;

  if (colsA !== rowsB) {
    throw new Error('Matrix inner dimensions must agree.');
  }

  const X = [];
  for (let i = 0; i < rowsA; i++) {
    X[i] = [];
    for (let j = 0; j < colsB; j++) {
      let s = 0;
      for (let k = 0; k < colsA; k++) {
        s += A[i][k] * B[k][j];
      }
      X[i][j] = s;
    }
  }
  return X;
}
```
Maybe we can first purge this unnecessary code and then figure out how to integrate with gpu.js?
Edit: wait... why was the optimized `mult2()` even slower than the original `mult()`???
> Another drawback of Eigen.js is that it fails when performing intensive calculations, e.g. multiplying two 100*100 matrices 100k times. Maybe it's related to a memory leak?
Found the problem, I think: Eigen.js needs its garbage collector run manually (which is a tad annoying).
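For reference, this is roughly what the manual collection looks like with the eigen npm package (`eig.ready` and `eig.GC.flush()` are from its docs; the rest is an illustrative sketch):

```js
import eig from 'eigen';

await eig.ready; // the WASM module must finish loading first

const data = Array.from({ length: 100 }, () =>
  Array.from({ length: 100 }, () => Math.random())
);

for (let k = 0; k < 100_000; k++) {
  const A = new eig.Matrix(data);
  const B = new eig.Matrix(data);
  A.matMul(B);
  eig.GC.flush(); // free the WASM-heap objects created this iteration
}
```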
> Edit: wait... why was the optimized `mult2()` even slower than the original `mult()`???
The optimized code is even slower because of the array access: you aren't caching `A[i]`, and you aren't preallocating the array size either. These subtle changes are exactly why I'd much rather rely on a JS/WASM library. It's very easy to get things wrong.
> The optimized code is even slower because of the array access: you aren't caching `A[i]`, and you aren't preallocating the array size either. These subtle changes are exactly why I'd much rather rely on a JS/WASM library. It's very easy to get things wrong.

I tried caching `A[i]` and preallocating the array size, but it didn't make any difference.
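For concreteness, here is a sketch of that variant (hypothetical `mult3`, reconstructed from the discussion rather than taken from the PR):

```js
function mult3(A, B) {
  const rowsA = A.length, colsA = A[0].length;
  const colsB = B[0].length;
  const X = new Array(rowsA);           // preallocate the outer array
  for (let i = 0; i < rowsA; i++) {
    const Ai = A[i];                    // cache the current row of A
    const Xi = X[i] = new Array(colsB); // preallocate the output row
    for (let j = 0; j < colsB; j++) {
      let s = 0;
      for (let k = 0; k < colsA; k++) {
        s += Ai[k] * B[k][j];
      }
      Xi[j] = s;
    }
  }
  return X;
}
```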
@idiotWu Can we get ABTests on all refactored/reimplemented functions?
> The optimized code is even slower because of the array access: you aren't caching `A[i]`, and you aren't preallocating the array size either. These subtle changes are exactly why I'd much rather rely on a JS/WASM library. It's very easy to get things wrong.
>
> I tried caching `A[i]` and preallocating the array size, but it didn't make any difference.
That redundant call seems to cache the column vector in `BColJ` for faster access later.
> @idiotWu Can we get ABTests on all refactored/reimplemented functions?

Like `Math.random() >= 0.5 ? numeric : our_fn`? I just don't understand why we need A/B tests for this...
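If the goal is just checking that the reimplemented functions agree with `numeric`, a parity test seems simpler than a live A/B split. A hedged sketch (`numeric.inv` is the real numeric.js call; the `mat.inv` name and import path are assumptions):

```js
import numeric from 'numeric';
import * as mat from './mat.mjs'; // assumed import path

// Compare our inversion against numeric's on a random matrix, elementwise.
function assertClose(X, Y, eps = 1e-9) {
  for (let i = 0; i < X.length; i++) {
    for (let j = 0; j < X[0].length; j++) {
      if (Math.abs(X[i][j] - Y[i][j]) > eps) {
        throw new Error(`mismatch at (${i}, ${j})`);
      }
    }
  }
}

const A = Array.from({ length: 12 }, () =>
  Array.from({ length: 12 }, () => Math.random())
);
assertClose(numeric.inv(A), mat.inv(A));
```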
> That redundant call seems to cache the column vector in `BColJ` for faster access later.

Yeah, accessing 2D array elements by column is slower than accessing by row in C++; maybe it's the same in JavaScript? From this point of view, switching to `TypedArray` may improve performance, as the data would be stored in contiguous blocks of memory.
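For illustration, a minimal sketch of the flat-`Float64Array` idea (hypothetical helper, not part of `mat.mjs`); the i-k-j loop order keeps every array on sequential, row-major access:

```js
// C = A * B where A is m×n and B is n×p, all stored as flat,
// row-major Float64Arrays. With the i-k-j loop order, A, B and C
// are each read/written sequentially, which is cache-friendly.
function multFlat(A, B, m, n, p) {
  const C = new Float64Array(m * p); // zero-initialized
  for (let i = 0; i < m; i++) {
    for (let k = 0; k < n; k++) {
      const a = A[i * n + k];
      for (let j = 0; j < p; j++) {
        C[i * p + j] += a * B[k * p + j];
      }
    }
  }
  return C;
}
```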
BTW, our current LU/QR decomposition implementations seem difficult to integrate with gpu.js, as we are mutating variables inside loops. Is there any good example of code performing LU decomposition on a GPU?
Any follow-ups on the PR? Also, do you know why the transpose is slower? @idiotWu TF.js can do matrix decomposition on a GPU.
> Also, do you know why the transpose is slower?

Did you mean the inversion (since we haven't compared transposition yet)? If so, I think that is because the `inv()` method adopted from the WEKA code uses decompositions to invert matrices: https://github.com/brownhci/WebGazer/blob/37d527f8674f46c3d73239d5ce40064260657479/src/worker_scripts/mat.js#L205-L207 https://github.com/brownhci/WebGazer/blob/37d527f8674f46c3d73239d5ce40064260657479/src/worker_scripts/mat.js#L235-L242
> TF.js can do matrix decomposition on a GPU
Then maybe we can use tf.js to do all the matrix operations? I'll have a look when I get time (being super busy right now 😖).
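If we go that route, a quick sketch of the relevant TF.js calls (`tf.matMul` and `tf.linalg.qr` are real TF.js APIs; the snippet itself is illustrative, not from the PR):

```js
import * as tf from '@tensorflow/tfjs';

// Ops run on whichever backend is active (WebGL on the GPU, with a
// CPU fallback). Tensors must be disposed explicitly to free memory.
const A = tf.randomNormal([12, 12]);
const B = tf.randomNormal([12, 12]);

const C = tf.matMul(A, B);      // matrix product
const [q, r] = tf.linalg.qr(A); // QR decomposition

A.dispose(); B.dispose(); C.dispose(); q.dispose(); r.dispose();
```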
been testing this. hopefully will merge today. thanks for your patience
@idiotWu your numeric to mat changes have finally been merged in ae20073! sorry it took 2 years. I really really appreciate all that you did
Resolves #211.
As discussed in #225, the performance of `math.js` is incredibly low, so we need to implement all the required matrix operations ourselves. This PR removes `numeric` completely and adds matrix operations such as addition, subtraction, inversion, identity-matrix generation, etc. into `mat.mjs`. Most of the implementations are based on WEKA.

CC @Skylion007