douglascrockford / parseq

Better living thru eventuality!

Lost parallel results #11

Closed diachedelic closed 3 years ago

diachedelic commented 3 years ago

In some extreme cases, the value array produced by parseq.parallel() has a different length from the requestor array. I experience this on Node (v15), Chrome (v86) and yes, even on Safari (v14).

/*jslint node, browser */

import parseq from "./parseq.js";

parseq.parallel(
    new Array(20000).fill(
        (callback) => callback(true)
    )
)(function (value) {
    console.log(value.length);
});
$ node parallel_fail.js 
15170

This may seem far-fetched, but I came across it and it had me stumped. Throttling seems to have no effect.

douglascrockford commented 3 years ago

Really interesting. I had tested up to 1000. It appears that Firefox starts losing around 5199. Chrome and Edge both start losing around 3227. I suspect their similarity is due to their sharing of Chromium. After that point, about a third get lost, and the debuggers get confused.

On each run, the results are different, but similar. It is behaving like a random process. Perhaps it is exhausting the browsers. But I would expect to see a complaint in the console, and there is none.

douglascrockford commented 3 years ago

If your function uses setTimeout(callback, 0, true) to invoke the callback, then all is fine. The issue is having over 3000 resolutions within a single turn. I do not know why that is the case, or what the real limitation is.

The only reason to use parallel is so that the work can be distributed over many turns serviced by many processes. Doing thousands of things in parallel in a single turn is actually doing them sequentially. Parallel is clearly not the tool for that.

hacksave commented 3 years ago

In this particular case, start_requestor becomes a recursive call. If you add console.log(exception) to the catch section of start_requestor, Chrome prints "RangeError: Maximum call stack size exceeded" and Firefox prints an unhelpful "Error: undefined".

Inspecting the returned array, you will notice undefined elements appearing once the interpreter reaches its recursion limit.

Try commenting out the recursive section of the code, which solves the problem for this particular case:

//return start_requestor(
//    factory_name === "sequence"
//    ? value
//    : initial_value
//);

The code becomes recursive because (callback) => callback(true) is synchronous rather than a true callback. It operates within the same turn, which is also why the try / catch clause inside start_requestor is triggered. Since asynchronous callbacks are handled differently by the interpreter, the author's suggestion of setTimeout(callback, 0, true) succeeds without a problem.

The flow of the synchronous code goes roughly like this. In the last part of run, all 20,000 start_requestor calls are queued with setTimeout. The first setTimeout to fire is number 20,000 (the one that was created first). Being synchronous, evaluation takes place immediately and start_requestor_callback is called, followed by the start_requestor within it. This initiates the second requestor in requestor_array, and the loop repeats until the stack limit is reached. The exception is then caught by the catch clause, which once again calls start_requestor. This goes on, the stack limit is hit every few iterations, and the whole program goes haywire.

After the recursive madness, the first setTimeout finally ends, having already exhausted the whole requestor_array. The second setTimeout (number 19,999) then fires, but since every requestor in the array has already been initiated by the first setTimeout's recursion, it simply ends. The same applies to the rest of the previously queued setTimeout calls.

When a true callback is used, the flow goes roughly like this. The first queued setTimeout starts and then ends, having passed start_requestor_callback to the callback. The same applies to the second setTimeout and the remaining ones. After all of the queued setTimeout calls have run and ended, each callback fires and passes its result to start_requestor_callback, which calls the start_requestor within it. Since every requestor in requestor_array has already been initiated, this start_requestor does little and ends. The same applies to all 19,999 remaining requestors, so the stack limit is never reached.
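The synchronous flow described above can be reproduced outside parseq with a small toy (a sketch, not parseq's actual code; run_chain stands in for start_requestor):

```javascript
// Toy model of the problem (not parseq's actual code): each "requestor"
// invokes its callback synchronously, so starting the next requestor
// from inside the callback turns the whole chain into one deep recursion.
function run_chain(requestors, index, on_done) {
    if (index >= requestors.length) {
        return on_done();
    }
    return requestors[index](function () {
        // With a synchronous callback, this call happens within the same
        // turn, adding stack frames per requestor, like start_requestor.
        return run_chain(requestors, index + 1, on_done);
    });
}

const sync_requestors = new Array(200000).fill(
    (callback) => callback(true)
);

try {
    run_chain(sync_requestors, 0, function () {
        console.log("done");
    });
} catch (exception) {
    console.log(exception.name);   // RangeError (maximum call stack size exceeded)
}
```

Replacing the fill value with (callback) => setTimeout(callback, 0, true) makes each step a separate turn, so the stack never grows.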

hacksave commented 3 years ago

Another trick to try out will be to replace the mentioned recursive code:

return start_requestor(
    factory_name === "sequence"
    ? value
    : initial_value
);

with a setTimeout:

setTimeout(
    start_requestor,
    0,
    (factory_name === "sequence"
    ? value
    : initial_value)
);

This also produces the correct result. But as mentioned previously, the main issue is that (callback) => callback(true) is actually synchronous code.

diachedelic commented 3 years ago

It seems we all agree that there is something distasteful about how (callback) => callback(true) is used in the example above – it is not in the spirit of parseq.

Suppose we loosely classify requestors into three categories. There are those which always call back within the current turn (immediate), those which always call back in a subsequent turn (eventual), and those which may do either (mystery).
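Concretely, the three kinds might be sketched like this (illustrative examples, not parseq API; the names are hypothetical):

```javascript
// Immediate: always calls back within the current turn.
const immediate = (callback) => callback(true);

// Eventual: always calls back in a subsequent turn.
const eventual = (callback) => setTimeout(callback, 0, true);

// Mystery: immediacy depends on runtime state, here a cache hit.
const cache = new Map();
function mystery(callback) {
    if (cache.has("key")) {
        return callback(cache.get("key"));      // immediate on a hit
    }
    setTimeout(function () {
        cache.set("key", "value");
        callback("value");                      // eventual on a miss
    }, 0);
}
```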

The obvious solution to this problem is to simply avoid calling parseq with large arrays of immediate or mystery requestors. Unfortunately this is not trivial, as demonstrated by the following example.

function orange_requestor(callback, data_array) {
    return parseq.parallel(
        data_array.map(
            (data) => apple(data)
        )
    )(callback);
}

Is it possible to determine by visual inspection whether orange_requestor will behave as expected, or silently corrupt the data? No! We have no choice but to either review every piece of code which calls orange_requestor (checking for bounds on data_array.length), or review the source code for apple (to determine its possible immediacy). Let us dig deeper.

function apple(data) {
    return parseq.sequence([
        pineapple(data),
        mango()
    ]);
}

In this case we learn nothing about the immediacy of apple, because requestor composition hides this property very well. A programmer may write a requestor capable of calling back immediately for several reasons.

In doing so, the wait of an extra turn is avoided, and this seems both simpler and economical. But when such requestors are composed with parseq, their blocking potential is multiplied. Suppose that the sequential work done by apple_requestor takes t milliseconds, and it is called n times in parallel. If it calls back immediately, the event loop is blocked for n * t milliseconds. If it instead calls back on a subsequent turn, the event loop is blocked for t milliseconds.
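That multiplication can be seen with a toy measurement (a sketch, not parseq; busy stands in for t milliseconds of synchronous work):

```javascript
// Toy measurement (not parseq): n immediate "requestors", each doing
// roughly t ms of synchronous work, block the event loop for about
// n * t ms in a single turn.
function busy(ms) {
    const end = Date.now() + ms;
    while (Date.now() < end) {
        // simulate synchronous work
    }
}

const n = 20;
const t = 5;
const start = Date.now();
new Array(n).fill(
    function (callback) {
        busy(t);            // immediate requestor: works, then calls back
        callback(true);
    }
).forEach((requestor) => requestor(function () {}));
const blocked = Date.now() - start;
console.log(blocked >= n * t);   // true: the turn was blocked ~100 ms
```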

Should the responsibility for guarding against these gotchas lie with requestor-composing factories (like parseq), or the requestors themselves?
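One possible answer, sketched here as a hypothetical helper (eventualize is not part of parseq), is to wrap a requestor so that its callback is always deferred to a later turn:

```javascript
// Hypothetical helper (not part of parseq): wrap a requestor so that its
// callback always fires in a subsequent turn, making the result eventual
// regardless of how the underlying requestor behaves.
function eventualize(requestor) {
    return function (callback, initial_value) {
        return requestor(
            function (value, reason) {
                setTimeout(callback, 0, value, reason);
            },
            initial_value
        );
    };
}

// Even a synchronous requestor now calls back eventually.
const wrapped = eventualize((callback) => callback(42));
let result;
wrapped(function (value) {
    result = value;
});
console.log(result);   // undefined: the callback has not fired in this turn
```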

diachedelic commented 3 years ago

I made a mistake – the requestor returned by apple is always eventual, as parseq gives us that guarantee. However, this may not always be true for other requestor-composing factories.

douglascrockford commented 3 years ago

The statement

    return start_requestor(
        factory_name === "sequence"
        ? value
        : initial_value
    );

is a tail call. ES6 requires that the activation record be reused if possible, eliminating recursion limit failures as well as generally improving performance. The problem is that many of our JavaScript engines are still not standards compliant. So until Google brings their implementation up to standard quality, Parseq will do this workaround:

    setTimeout(start_requestor, 0, (
        factory_name === "sequence"
        ? value
        : initial_value
    ));

diachedelic commented 3 years ago

Thank you, that change certainly does resolve the issue perfectly.

However, after re-reading the chapter on tail calls from your book, I am sure the problem had nothing to do with tail calls. I ran the test case above on Node v6 with tail calls enabled, and only experienced a slight improvement:

$ node parallel_fail.js
15294 undefined
$ node --harmony --use_strict parallel_fail.js
17142 undefined

Regardless, I just cannot believe how ineffectual the browser vendors have been about tail call optimisation. Safari seems to have dropped support for it too now. Such a step backward!