Closed by yurivict 3 years ago
Are you sure that the time isn't being taken up in the string concatenation loop? That has to create tons of temporary strings.
> Are you sure that the time isn't being taken up in the string concatenation loop?
Yes, it prints the string size immediately.
Ah, I didn't notice that you are doubling the size of the array on each loop iteration - I thought you were just adding one entry per iteration. (Now that 115s makes more sense!)
FWIW, I seem to remember reading somewhere that Google v8's array class does not support more than something like 100k entries (it might have even been 30k - it was nearly 10 years ago in the v8 Google Group). Creating an array with a quarter of a million(?) entries (N=18) seems somewhat excessive/unrealistic.
It is realistic, some computations produce this much data for just one reasonably sized image.
I dare say that JS is not an appropriate language for such calculations. But to each his own.
> I dare say that JS is not an appropriate language for such calculations. But to each his own.
There are corner cases; you can't switch to C++ just because you hit one huge array once a month. Besides, it's obviously a bug in MuJS, so it shouldn't even be a limitation in this case.
This also means that similar operations with smaller arrays are slower than they should be; it's just less visible because the arrays are smaller.
To be pedantic: this is a performance limitation, not a bug - the software performs the expected operation with the expected result, it just does so more slowly than you would like. That's not to say it can't be improved, but characterizing it as a bug does its developer a disservice.
You can call it anything, but when the exponent p in Time ~ N^p is higher than normally expected (2 instead of 1 in this case), some jobs will practically never finish when they could otherwise finish in a reasonable time.
I'm afraid that parsing a 2 MB script consisting of a single array literal is beyond the scope of mujs's normal use case. Even C compilers struggle with that kind of code. With the current (16-bit bytecode) instruction scheme, even at N=18 the 512 KB eval script runs out of instruction space. It would possibly work much faster and have fewer problems if you rewrote the script to create the array directly instead of creating a huge eval script with the array literal.
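A sketch of that suggestion, with placeholder values standing in for the real data:

```javascript
// Build the array directly instead of generating a megabyte-scale script
// and eval-ing an array literal; this avoids the parser/codegen path entirely.
var n = 1 << 18;          // 262144 entries, as at N = 18
var a = [];
for (var i = 0; i < n; i++)
    a[i] = i & 0xff;      // placeholder value; real data would come from the computation
console.log(a.length);    // 262144
```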
> rewrote the script to create the array directly instead of creating a huge eval script
My guess is that it's a test case to demonstrate the issue, while his real issue is with external code generating big arrays of data - which he can't generate from his js code.
In that case, though, maybe the solution is to read the data (from stdin or a file) in the script and parse/process it to append values to a JS array. I'm not sure how well the mujs binary is equipped for such a task, though, so you might need to edit it or write your own libmujs wrapper that allows it (e.g. expose the C fgets or some such to the JS script).
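A sketch of that idea; `readline` is stubbed in pure JS here for illustration, but in a real build it would be a thin C wrapper over `fgets` registered with mujs's `js_newcfunction`/`js_setglobal`:

```javascript
// Stub standing in for a host-provided readline(): returns one line per
// call and null at EOF. A real host would implement this in C.
var lines = ["3.14", "2.71", "1.41"], pos = 0;
function readline() { return pos < lines.length ? lines[pos++] : null; }

// The script parses the data into a JS array itself - no giant literal,
// no eval, no parser involvement for the data at all.
var a = [], s;
while ((s = readline()) !== null)
    a.push(Number(s));
console.log(a.length);   // 3
```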
The main problem is non-linearity. The data grows linearly, but the parser probably uses a double loop somewhere.
The point isn't the huge array with 2 MB of data. The problem is that the parser wastes CPU cycles even on smaller arrays, just not as noticeably. This array merely demonstrates the problem acutely.
The non-linearity only happens with arrays larger than 32768 elements -- because of how the array initialization bytecode works. Each array entry generates three instructions (or more depending on the value) -- push_integer, push_value, set_property. The push_integer opcodes need to use a table for literal numbers that are larger than will fit in a signed 16-bit integer, and the lookup in this table is what's causing non-linearity.
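The cost of that linear lookup can be modeled in a few lines (a toy model, not mujs's actual code): interning each large literal via a linear scan of the constant table makes K distinct literals cost on the order of K^2 comparisons.

```javascript
var comparisons = 0;

// Toy model of a linearly scanned constant table: return the index of v,
// scanning all previously interned entries first.
function internLinear(table, v) {
    for (var i = 0; i < table.length; i++) {
        comparisons++;                 // O(table.length) work per literal
        if (table[i] === v) return i;
    }
    table.push(v);
    return table.length - 1;
}

var table = [];
var K = 1000;
for (var v = 0; v < K; v++)
    internLinear(table, 100000 + v);   // K distinct "large" literals
console.log(comparisons);              // K*(K-1)/2 = 499500, i.e. quadratic in K
```

Replacing the scan with a hash lookup (or, as the fix below does, dropping the table entirely) makes each literal O(1) and the whole initialization linear.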
I'll look into using another scheme for generating array initialization code that doesn't need to rely on the table of number literals.
Fixed with 06a6f9fb110fe02eb5c306a9c6c143a6cd8bd3bb and future proofed (removed the linearly scanned constant tables altogether) with 33ffe6efebf477ee1f73c27670af460ffd6cc08d.
This program finishes in 23s with N=18, in 115s with N=19, and in >500s with N=20, despite the array size only doubling with each increment of N. It seems reasonable to expect the time to grow linearly, because adding elements to the array's tail while scanning the string is all that is required here; there should be no need to perform any operations that depend on the array length at each step.
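From those timings the scaling exponent can be estimated directly (a back-of-the-envelope check: the input size doubles between N=18 and N=19 while the time goes from 23s to 115s):

```javascript
// Estimate p in time ~ size^p from two measurements where size doubles.
var t18 = 23, t19 = 115;                 // seconds, from the report
var p = Math.log(t19 / t18) / Math.log(2);
console.log(p.toFixed(2));               // ≈ 2.32: clearly superlinear
```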