adamhaile / S

S.js - Simple, Clean, Fast Reactive Programming in Javascript
MIT License
1.31k stars 68 forks source link

How to initialize cyclical computations? #1

Open fenduru opened 7 years ago

fenduru commented 7 years ago

I'm looking to create a graph of computed data where there may exist dependencies. I had previously begun working on my own library, but came across this one and it seems to handle most of my requirements that other libraries don't, in particular the "no redundant computations" and its ability to handle cycles.

From my experimentation though, the only way to hook up a cycle is if the cycle loops by emitting onto a data signal. This has a few undesirable properties

Boilerplate of what I'm trying to achieve: https://jsfiddle.net/uto837uf/

Example of a not-working cycle (by creating dependent computations): https://jsfiddle.net/uto837uf/3/

One solution that I can think of to this is to have a way to create a dummy node up front so that you have a reference to use for dependency tracking, and then "replace" the dummy node with the real thing later. I haven't found a way to do this from user-space though.

adamhaile commented 7 years ago

Hi @fenduru -- In general, yes, you're right: cyclic computations generally have a data signal somewhere in the path. In fact, a totally synchronous cycle (one with no data signals in the cycle) is an error, and S will throw an exception saying so. As you mention, it takes mutating a signal to create one:

S.root(() => {
    var a = S.data(1),
    c = () => 2, // placeholder c()
    b = S(() => a() + c()); // runs w/ placeholder c()

    c = S(() => b() * 2);// real c(), runs b/c b->c dep hasn't been recalculated

  a(2); // throws "circular dependency"
});

S's execution model, what SRP literature calls the "synchronous hypothesis," requires that a signal have exactly one value per "instant." Cycles violate this: if the current value of b() depends on the current value of c(), and the current value of c() on the current value of b(), then neither can be determined.

You can have cross-dependencies between nodes so long as they aren't both active in the same instant:

S.root(() => {
    const a = S.data(false),
    b = S(() => a() ? c() : 1), // doesn't call c() yet, since a() is false, returns 1
    c = S(() => a() ? 2 : b()); // calls b(), returns 2

    a(true); // now c() calls b() but not vice-versa.  both now equal 2
});

You can also have cyclic dependencies when the cycle is split across time, aka there's a data signal in the cycle such that the current value of some set of nodes depends on the last value of others:

S.root(() => {
    const a = S.data(1),
    lastC = S.data(0), // manual starting value
    b = S(() => a() + lastC()),
    c = S(() => b() * 2);

    S(() => lastC(c())); // "glue" comp bridges c() and lastC()
});

A cycle like the one above, though, has no termination: lastC()is updated each time, so S runs the nodes again ... and again ... and again, etc. After 100,000 ticks, S will finally throw an exception about a "runaway clock detected." This is because usually this kind of cycle is a code error.

Temporally split cycles are fine if you put a guard in place though:

S.root(() => {
    const a = S.data(1),
    lastC = S.data(0), // placeholder initial value
    b = S(() => a() + lastC()),
    c = S(() => b() * 2);

    S(() => c() < 1000 && lastC(c())); // "guard" prevents infinite updates
});

For instance, you could create a cyclic computation that uses the Newtonian method to approximate a value, then exits once the adjustment gets below some threshold.

Does that help? If there's something you're trying to do with cycles that isn't covered here, tell me more about it and I can see if I can offer any help.

fenduru commented 7 years ago

Thanks for the very thorough reply!

You can also have cyclic dependencies when the cycle is split across time, aka there's a data signal in the cycle such that the current value of some set of nodes depends on the last value of others:

So this is more along the lines of what I'm trying to accomplish, however the problem with this is that since b doesn't actually depend on c (rather it depends on lastC), this causes us to have redundant and/or intermediate computations that get thrown out.

Example:

S.root(() => {
    // `c` is not defined yet because it needs to depend on `b`
    let a = S.data(1);
  let lastC = S.data(0);

  let b = S(() => {
    console.log('calculating b:', a(), '+', lastC());
    return a() + lastC()
  });

  let c = S(() => a() * 2);

  S(() => lastC(c()));

  console.log('setting a');
  a(2);
});

Compare this to the same code that doesn't have the indirection of a data signal:

S.root(() => {
    // `c` is not defined yet because it needs to depend on `b`
    let a = S.data(1);
  let lastC = S.data(0);

  let c = S(() => a() * 2);

  let b = S(() => {
    console.log('calculating b:', a(), '+', c());
    return a() + c()
  });

  S(() => lastC(c()));

  console.log('setting a');
  a(2);
});

After we set a, b gets computed twice (once with the old C and new A, once with the new C and new A). This is very reminiscent of "glitches" in libraries like RxJs.

I had been doing some digging into research projects like timely dataflow to gather inspiration, and while the whitepapers are a bit over my head, the general idea of breaking the graph into strong components, and treating each component/iteration of a component as a "subclock" seems promising.

To illustrate, let's say my conceptual graph looks like this:

image

We can break this up into strongly connected components, which turns our cyclic graph into a DAG. Each box will need to be completed before moving onto the next (seems similar to S' subclock)

image

Then, within each component, we effectively unroll the cycle into iterations. Each iteration runs in its own sub-subclock, so the calculation for B@1 won't run until there are no more computations at time 0.

image

Eventually the node C won't emit a new value (effectively the base case in the recursion), which will allow that component to be complete, and then move onto E

Do you think this kind of approach is:

Thanks again for your help

adamhaile commented 7 years ago

Hi @fenduru - Sorry for the slow reply, as I'm off this week on vacation. I'll be back next week.

I'm pretty sure this is accomplishable with current S, though I may not fully understand what you're trying to do. In general, yes, being glitchless is a core feature of S, and subclocks are designed for this kind of "run to completion" scenario. So I understand, in your diagrams, at t 0, B only depends on A. At t > 0, B only depends on C? Is that right? -- Adam

fenduru commented 7 years ago

@adamhaile No worries! My diagram isn't totally accurate - I kind of conflated "dependencies" with "new value emitted". At t>0, B still depends on A, however we know that A never changes past that point so the only node that can trigger an update to B at t>0 is C.

Your original response gave me a lot of ideas/inspiration, so right now I'm working on a prototype of some of those ideas. Once I have it fleshed it out a bit more I'll share some of my findings here.

The gist of it that I've taken the concept of B at t=n depends on A at t=n-1, then doing some static analysis on my dependency graph to avoid running computations that are redundant or will not be used.

irisjae commented 6 years ago

@fenduru Any news on how your approach is going? Interested to hear more

adamhaile commented 6 years ago

@irisjay and @fenduru, a couple utilities I'll throw out there, as they might be useful to you if you're still looking at scenarios where something like circular dependencies make sense:

1) S doesn't automatically give access to the last value of a data signal, but it's easy to write a utility that stores it in a local var and always reports whatever value a signal had before the current one:

const last = (signal, val) => S(() => { const cur = val; val = signal(); return cur; });

const s = S.data(1),
    lastS = last(s, 0); // '0' is initial 'last' value, since we don't have the actual last value of s when we start

lastS(); // starts as 0
s(2);
lastS() // now 1
s(3);
lastS() // now 2, etc

2) if the nth value of a computation depends directly on its own n-1th value, then that's what a reducing computation is for: S(last => last < 5 ? last + 1 : last, 0).

3) BUT, if the nth is indirectly dependent on its own n-1th value, then you can make something like a "spiral" dependency by proxying through a data signal. I'm trying to think of an example. Let's say you had a computation that iteratively approximated some 'true' value, which was then fed into another computation that calculated the error from the true value, and you wanted the approximation to update on this error. Well, you can't calculate the current value based on the error of the current value, because you don't have the current value yet. But you can calculate it from the error in the last value. That would look like this:

var lastError = S.data(0), // again, 0 is an initial 'last' value
    goal = S.data(0), // what approx() is approximating
    approx = S(x => lastError() < 0 ? x - 0.1 : x + 0.1, 0), // update approx by 0.1 based on last error
    error = S(() => goal() - approx()); // calculate error in current approx()

// add a logger, just so we can see how it works
S(() => console.log(approx()));

// start process running by piping error into last error with a base case to say when to stop
// in this case, we stop once the error is < 0.1
S(() => Math.abs(error()) < 0.1 || lastError(error()));

There's a dependency from error() -> approx() -> lastError(), and a computation that drives error() into lastError() until approx() gets within 0.1 of goal(). When you set a data signal, you're really setting its next value, so the system keeps ticking until we the base case is satisfied. If you run this code in your browser, you'll see that whenever goal() changes, approx() iteratively changes by 0.1 until it gets within 0.1 of goal.

Wrapping that whole system in a subclock would hide the intermediate states from outside code and present only the final value, once the base case is reached.

fenduru commented 6 years ago

Sorry for my delayed response here.

The approach of computing the current state of the strongly connected component (SCC) from the previous state of the SCC makes a lot of sense, and ultimately I think is the easiest approach for guaranteeing correctness. This still results in the undesirable behavior where (assuming your cycle will eventually reach a base case and stabilize) you will perform many intermediate computations that are wasted.

Take the following graph (assume it is stable):

image

In this scenario, we have some external signal that is going to cause our SCC to recompute. However, if we use the above approach (compute the next SCC state from the last SCC state), we end up with an execution order like this:

[ A ]
[ B, E ]
[ A, C ]
[ D, E ]
[ E, A ]
[ A, B ] <--- This is the computation of A we care about (since it is using a fully updated E)
...

We computed A much more than we actually wanted to, which is unfortunate. Ideally, we would have an execution order:

[ A ]
[ B ]
[ C ]
[ D ]
[ E ]
[ A ]
...

This would still be sound because conceptually we could combine A -> B -> C -> D into a single node that just does all 4 (sequential) steps, leaving your SCC looking like ABCD <--> E.

This problem is trivial to solve for DAG's, as we can topologically sort those, so if there is a way to turn our SCC into a DAG then we could get this ordering easily. The example above can be intuitively converted to a DAG by removing the edge E -> A. This leaves us with a familiar looking dependency ordering problem, and a topo sort gives us the desired order.

More generally, this problem is called the minimum feedback arc set (MFAS) problem. Basically the MFAS is the smallest set of edges you can remove to break the cycles. There are some clever ways of approximating the MFAS without horrible runtimes, but for an exact MFAS this paper is extremely detailed and covers some additional optimizations for common structures.

It took me a while to find out about the term MFAS, so in my code (which is using a custom datagraph processing library I wrote, not S) I have a jank approach where I find the longest distance between each pair of nodes, and order my computations that way (for instance, max(A -> B) = 1, max(A -> E) = 4, so we B will be inserted before E in a priority queue). The performance of this approach is horrendous (though this cost is paid once at initialization time), but in my case the number of nodes/edges in my SCC is pretty small, so I was willing to tolerate it.

For my particular use case (dual dependency between vertical/horizontal scrollbars, combined with flexible containers/content) I've determined a simpler approach that models my problem without any cycles, so I eventually plan on ripping out the jank cycle code and just using a DAG.

adamhaile commented 6 years ago

S does topological execution. To be more precise, it doesn't guarantee that computations start in topological order, because it can't -- S's dependency graph is dynamic and changes each execution, due to conditional branches and higher-ordered signals (signals carrying signals). Since we don't know what the graph will be prior to execution, we can't sort by it.

What we can guarantee is that computations complete in topological order. That means that when S detects that a computation is running before one of its sources has updated, it may pause the execution of the original computation to update the source before returning the source's new value and resuming execution of the original computation.

The algorithm behind this is pretty simple: S is a "two-pass" reactive implementation, meaning it does a first pass to mark all computations downstream of a modified data signal as invalid (i.e., it's not "height"-based, like it sounds like yours is). The second pass it updates the computations. The order of these updates actually isn't that critical, because if we start to read a computation that is still invalid, S executes and re-validates it then and there. When S later encounters that computation at its originally scheduled time, it notes that it's already been revalidated and doesn't re-revalidate it.

Short of all this is that I believe S is already doing what you call the "ideal" scenario.

Here's some code that, if I understand you correctly, implements your topology:

var input = S.data(0),
    lastE = S.data(0),
    A = S(() => input() + lastE()),
    B = S(() => A() + 1),
    C = S(() => B() + 1),
    D = S(() => C() + 1),
    E = S(() => D() + A() + 1),
    driver = S(() => E() > 20 || lastE(E()));

In this case, the looping continues until E() is greater than 20.

If you want to see how it executes, add a logging statement to record every time A() updates:

S(() => console.log(A());
S.freeze(() => { input(0); lastE(0); });
fenduru commented 6 years ago

@adamhaile yes, in the sample you provided it will execute in the desired way, however by explicitly piping E back into lastE you're essentially encoding the solution to MFAS problem I described above. In other words you have manually broken the cycle into a DAG that will be run iteratively, rather than the system doing it for you.

The problem is this doesn't generalize though. Including additional inputs or feedback paths makes it far less obvious how to break the cycle down - it may even differ at runtime depending on which inputs actually emit a new signal. My particular use case complicates it even further as the exact shape of my graph isn't known until runtime.

I think if you are building a particular graph, and you know exactly how it should execute, then this approach may work well - though in that scenario it also makes a reactivity system less interesting as you could just as easily write the computation imperatively.