folktale / data.task

Migrating to https://github.com/origamitower/folktale
MIT License
425 stars 45 forks source link

Limit for number of Tasks running in sequence? #43

Closed fmezas closed 6 years ago

fmezas commented 6 years ago

The number of Tasks that can be run in sequence seems to be limited by the max size of the function stack for the platform. Sample code:

import R from 'ramda';
import Task from 'data.task';
import monads from 'control.monads';

const COUNT = 10000;

const l = R.repeat(1, COUNT);

monads.sequence(Task,
                R.map(Task.of, l))
  .fork((err) => console.error(err),
        (res) => console.log(res));

will produce:

/home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:113
  return new Task(function(reject, resolve) {
                          ^

RangeError: Maximum call stack size exceeded
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:113:27
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
    at /home/francisco/Documents/code/folktale/node_modules/data.task/lib/task.js:114:12
robotlolita commented 6 years ago

Turns out This Is A Very Hard Problem :'>

sequence is a generic function over monads, so it can only use chain. It only works for larger amount of items if the VM implements tail calls. It's not possible to trampoline the structures without changing them, sadly. The state of implementing tail calls in JS is not great. With v8 removing its partial implementation (https://bugs.chromium.org/p/v8/issues/detail?id=4698#c73) earlier this year, only Apple's JSC has implemented that spec, and discussions seem to be going nowhere. PTC is required for these structures to work properly :<

More recent Fantasy Land specs have introduced a separate algebra, ChainRec, which kind of stands for a "tail-recursive monad" of sorts. A new sequence function could be written for that, and it wouldn't have problem with stack sizes. New features will only happen in the new version of Folktale.

The one way to mitigate the problem you're seeing is to wrap all of your tasks in some asynchronous operation, so they start in an empty stack. Something like this would work:

const unstack = (task) => task.chain(v => {
  return new Task((reject, resolve) => {
    setTimeout(() => resolve(v));
  }, (timer) => clearTimeout(timer));
});

monads.sequence(Task, R.map(unstack, R.map(Task.of, 1)))
  .fork(...);

This, however, adds a fair amount of overhead between each Task. Instead of just having one task start another (and maybe have the VM optimise that), each Task schedules its continuation and yields back to the main thread. With setTimeout, this might add delays of more than 10ms between each Task because of the padding it does. With something like setImmediate or process.nextTick you pay the price of one event tick between each task, which can still be a lot.

You could group a limited amount of tasks together and move them to a clean stack (with the unstack function) as a group, but that's unreliable since you can't know how much stack space you still have left when chain is called :<

Fluture implements ChainRec if the overhead of starting on an empty stack is too much for your codebase. I'm not familiar with its design choices and performance characteristics though.

Avaq commented 6 years ago

Fluture implements ChainRec

Not just that. Almost all operations in Fluture are stack safe. ChainRec is really just there for interoperability. Just try to run the exact same code given above but using Fluture:

const R = require('ramda');
const Future = require('fluture');
const monads = require('control.monads');

const COUNT = 1000000; //I increased the number a little to make sure

const l = R.repeat(1, COUNT);

monads.sequence(Future,
                R.map(Future.of, l))
  .fork((err) => console.error(err),
        (res) => console.log(res));

On my i5 machine, after 2.8 seconds, we see:

[1,
 1,
 1,
 ...,
 1,
 1,
 ... 999900 more items]

This all happened in the same tick we called fork.

robotlolita commented 6 years ago

@Avaq does Fluture use free monads or something similar to build the nested computations and run them later using a constant amount of stack space?

Avaq commented 6 years ago

@robotlolita It doesn't use free monads directly, but the same principal. Every operation just builds up a structure (much resembling an AST) and fork is the interpreter of that structure.

fork uses the same trick that is used in ChainRec implementations, where everything is executed in a single while-loop that only breaks when the operation did not call back synchronously.

The second trick is the flattening of newly encountered AST's. When we call a chain, for example, we (lazily) get back a new AST, which is flattened into the current one at runtime, allowing the while-loop to continue without having to break. This part makes it so even recursion is stack safe.

fmezas commented 6 years ago

@robotlolita @Avaq : tnx a lot for the explanation of root cause and the pointers to additional info. I'll take a look at Fluture. By the way, is the implementation of ChainRec in the plans for folktale 2.0?

fmezas commented 6 years ago

nvm. found it in the backlog. tnx