Sage / streamlinejs

Asynchronous JavaScript for dummies
http://bjouhier.wordpress.com/2011/01/09/asynchronous-javascript-the-tale-of-harry/

Memory issues when calling forEach_() on a big array #364

Closed: keithmo closed this issue 7 years ago

keithmo commented 7 years ago

Consider the following fragment:

const lodash = require('lodash');

const NUM_ROWS = 10 * 1000 * 1000;
const CONCURRENCY = 2;

// '_' is streamline's marker for the async continuation.
function processRow(row, _) {
    console.log(`processRow ${row}`);
    setTimeout(_, 0); // yield until the next timer tick
}

function processRows(rows, _) {
    // forEach_(_, concurrency, fn) keeps at most CONCURRENCY calls in flight.
    rows.forEach_(_, CONCURRENCY, function(_, row) {
        processRow(row, _);
    });
}

function main(_) {
    let rows = lodash.range(0, NUM_ROWS); // [0, 1, ..., 9999999]
    processRows(rows, _);
}

main(_);

This outputs processRow 0 and processRow 1, hangs briefly, then aborts with a memory allocation error.

This, on the other hand, works fine:

function main(_) {
    let rows = lodash.range(0, NUM_ROWS);
    // Split the array into 100K-row chunks and process them sequentially.
    let chunks = lodash.chunk(rows, 100000);
    chunks.forEach_(_, 1, function(_, chunk) {
        processRows(chunk, _);
    });
}

It works with chunk sizes up to roughly 100K, but fails with chunk sizes above 1M. (I haven't found the exact cut-off value, but I can if that would be helpful.)
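The same trick can be packaged as a small helper (forEachChunked_ is an illustrative name, not part of streamline's API):

function forEachChunked_(rows, chunkSize, concurrency, fn, _) {
    // Process the array in bounded chunks so that at most chunkSize
    // futures are allocated at any one time.
    lodash.chunk(rows, chunkSize).forEach_(_, 1, function(_, chunk) {
        chunk.forEach_(_, concurrency, function(_, row) {
            fn(row, _);
        });
    });
}

With that, the second main above reduces to forEachChunked_(rows, 100000, CONCURRENCY, processRow, _).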

I'm running Streamline 2.0.16 and Fibers 1.0.13 under Node 4.4.4.

bjouhier commented 7 years ago

Thanks for reporting this. The implementation of the parallel functions was extremely naive and awfully inefficient: it allocated one future (and thus one fiber) per array entry, and then waited on all of these futures.

The new implementation allocates one future per parallel task, i.e. 2 futures instead of 10,000,000 in your repro case.
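In sketch form, the difference between the two strategies is roughly this (simplified streamline-style code, not the actual streamline-runtime source; the function names are illustrative):

// Old strategy (simplified; concurrency throttling omitted): start one
// future per array entry, so 10,000,000 entries means 10,000,000
// futures (and fibers) alive at once.
function forEachAllAtOnce(items, fn, _) {
    let futures = [];
    for (let i = 0; i < items.length; i++) futures.push(fn(items[i], !_)); // !_ starts the call and returns a future
    for (let i = 0; i < futures.length; i++) futures[i](_); // wait on each future
}

// New strategy (simplified): one future per parallel slot. Workers pull
// the next index from a shared counter, so at most `concurrency` futures
// (and fibers) exist at any time.
function forEachPooled(items, concurrency, fn, _) {
    let next = 0;
    function worker(_) {
        while (next < items.length) fn(items[next++], _);
    }
    let futures = [];
    for (let i = 0; i < concurrency; i++) futures.push(worker(!_));
    for (let i = 0; i < futures.length; i++) futures[i](_); // join the workers
}

With the pooled version, memory stays flat no matter how large the array is.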

You don't have to upgrade streamline itself. The fix is in the streamline-runtime dependency. I just published streamline-runtime@1.1.3 to NPM.
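To pick it up, reinstall your dependencies or install the new version explicitly, for example: npm install streamline-runtime@1.1.3.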

keithmo commented 7 years ago

Thanks, @bjouhier!