cujojs / when

A solid, fast Promises/A+ and when() implementation, plus other async goodies.

Rethinking progress #264

Open briancavalier opened 10 years ago

briancavalier commented 10 years ago

Promise progress is problematic for several reasons. One big one is that progress values are opaque (they might be numbers, objects, arrays, etc.). A promise library can't combine them in any meaningful way. For example, when.all can't combine the progress values from its input promises. It must simply pass them through. This means that the promise returned by when.all might receive heterogeneous values (some are objects, some are numbers, etc.), and the caller would have to write code to combine them.

I would put money on progress being used in 95+% of cases to track I/O operations in order to show the user some sort of progress indicator. There are other uses, but I've rarely seen them. One example is wire.js, which could (but doesn't currently) track progress toward wiring completion, which involves wiring a discrete number of components. Another example is tracking the progress of UI wizard steps (I'm not convinced that's actually a good use case, but I've seen it).

With that in mind, what if progress values were required to be a number between 0 and 1, literally representing "percent complete"? That seems to have some interesting properties:

  1. It's likely fairly easy to compute. For anything with discrete steps, it's steps accomplished / total steps. e.g. For I/O, it's bytes read / total bytes.
  2. It's easy to combine via averaging for operations where all promises must fulfill (eg when.all, when.map, when.reduce, etc etc). For 2 promises, combined = (progress1 + progress2) / 2, and by extension for N promises: combined = sum(progressValues) / progressValues.length. I believe this works for sequential, and "parallel" tasks.
    • It's also associative: progress2(a, progress2(b, c)) === progress2(progress2(a, b), c). which means that progress values from arbitrary operations (all, any, race) etc. can be combined easily, again via averaging: progress2(all(array1), p)
  3. There are other ways to combine:
    • when.any and when.race -> Math.max.apply(null, progressValues) (it's a race, so progress can be estimated by the one that appears to be winning)
    • when.some -> sort progress values in descending order, take top N, combine via average.
    • when.settle -> same as when.all?
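
The combining rules in the list above can be sketched roughly as follows. This is only an illustration: combineAll, combineAny and combineSome are hypothetical names, not when.js APIs, and each assumes a non-empty array of numbers in the range 0 to 1.

```javascript
// Hypothetical helpers (not part of when.js).
// Each takes a non-empty array of progress values between 0 and 1.

// when.all / when.map / when.reduce: every input must fulfill, so average.
function combineAll(progressValues) {
    var sum = progressValues.reduce(function (a, b) { return a + b; }, 0);
    return sum / progressValues.length;
}

// when.any / when.race: estimate from whichever input is furthest along.
function combineAny(progressValues) {
    return Math.max.apply(null, progressValues);
}

// when.some(n): average the n inputs that are furthest along.
function combineSome(progressValues, n) {
    var top = progressValues
        .slice() // copy so the caller's array is not reordered
        .sort(function (a, b) { return b - a; })
        .slice(0, n);
    return combineAll(top);
}
```

For example, combineAll([0.5, 1]) gives 0.75 and combineSome([0.25, 1, 0.5], 2) gives 0.75. Note that the sketch averages a flat list of values in one pass.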

I'm hoping that we can discuss here, and if this makes sense, talk to other promise library maintainers to get their thoughts as well.

unscriptable commented 10 years ago

Just throwing out another use case:

curl.js uses progress notification to indicate state changes (here and here), some of which might not occur. These could be expressed in terms of contrived numbers between 0 and 1 since they happen in a certain sequence, I guess.

briancavalier commented 10 years ago

Interesting use case ... the "might not happen" especially. Does anything consume those progress "states"?

I guess there are a couple ways it could work.

One is to just have each step emit a preset value, like 0.25, 0.5, 0.75, etc. And the last step emit 1.0. If any steps are skipped you still end up with 1.0 at the end, you just don't get intermediate values for things that were skipped.

Another is to have each step add 1/totalSteps.

In both cases, you kinda end up with a weird code maintenance issue. If you add a new step, you have to adjust the progress values that the other steps emit. Hmmm, or maybe just change the totalSteps constant in the latter approach.

briancavalier commented 10 years ago

Hrm, if you add 1/totalSteps, I guess you still have to emit 1.0 at the end :/
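
A minimal sketch of the 1/totalSteps approach, including the final 1.0 for the case where some steps were skipped. createStepProgress and its step/done methods are hypothetical names for illustration, and notify stands for whatever function emits a progress value.

```javascript
// Hypothetical helper (not a when.js API): tracks progress over a fixed
// number of steps, some of which may never run.
function createStepProgress(totalSteps, notify) {
    var completed = 0;
    return {
        // Call after each step that actually runs.
        step: function () {
            completed = Math.min(completed + 1, totalSteps);
            notify(completed / totalSteps);
        },
        // Call once at the very end. Emits 1 only if skipped steps
        // left the counter short of totalSteps.
        done: function () {
            if (completed < totalSteps) {
                completed = totalSteps;
                notify(1);
            }
        }
    };
}
```

With this shape, adding a new step only means changing the totalSteps argument; the individual steps don't carry hard-coded values.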

maciasello commented 10 years ago

Some time ago I was using progress to emit partial results of a long-running operation so that they could be presented to the user ASAP. It was by no means just a pure number, but a complex object. However, there was no sophisticated promise chain in the middle, so the problem stated is of course valid. I'm just not sure whether it would kill many practical use cases.

briancavalier commented 10 years ago

@maciasello partial results are definitely an interesting case ... like an array that isn't full yet, or somesuch. Yeah, need to think about that. Do you have any thoughts on how the two approaches (arbitrary opaque values, and numeric 0.0-1.0 progress) might be used together?

What if progress could carry 2 pieces of information instead of 1, a numeric value, for which you get mathematical combining for "free" from the promise impl, and an opaque value that you have to deal with yourself. Does that make any sense?

maciasello commented 10 years ago

@briancavalier it's definitely worth considering. Library support for easing progress measurements seems to be a good idea. The approach with 2 pieces carried is fine as well. Do you mean something like checking for the existence of a well-known property (progress?) on a progress value object? Or more like wrapping both into: {progress: 0.5, value: {}}?

Do you have an idea how to get things correct with such a situation:

progressing_promise.then(function() {
    // do something
    return progressing_promise2;
}).then(null, null, function(progress) {
    // we get first set of progresses from progressing_promise from 0.0 up until 1.0
    // and then we get second set of progresses from progressing_promise2 starting from 0.0 until 1.0
});

briancavalier commented 10 years ago

@maciasello Honestly, I don't know yet what the best representation of the 2 pieces of data would be. An object like the one you suggested would work. Another option would be 2 parameters, although that might be a little weird in that other promise operations deal with only a single parameter. Definitely have to think carefully about that. Using 2 params, it might look something like:

// emitting progress:
return new Promise(function(resolve, reject, notify) {
    // do stuff
    notify(.5, intermediateState);
    // do more stuff
    notify(1.0, finalState);
    resolve(result);
});

And consuming progress:

var p = doAsyncThingWithProgress();

p.progress(function(percentComplete, currentState) {
    // percentComplete is a number 0 <= percentComplete <= 1.0
    // currentState is any arbitrary value
    // what to return here??
});

One obvious question is what the above progress handler should return! Maybe it's allowed to return a new arbitrary/transformed state, but the numbers are always handled automatically. Not sure.

A side note: In some cases, if what you have is actually streaming data, then promises aren't necessarily the right fit, and an actual stream (either a data stream, or a discrete event stream) may be a better fit for those situations.

In the case of your example, I'm thinking the progress values should be combined using the same averaging approach as when.all. Here's why I think that makes sense, using a slight refactoring (but still equivalent) of your example code:

p3 = p1.then(function() {
    return p2;
});

p3.then(null, null, handleProgress);

If you think about it, progress toward completion of p3 is the average of progress toward p1 and progress toward p2, that is, progress3 === (progress1 + progress2) / 2. For example, say you need to do some task, like "run errands" (p3), which consists of two other tasks that must be completed in sequence, "buy groceries" (p1), and "pick up dry-cleaning" (p2). When you've done half your grocery shopping, you're actually 25% done with "run errands". When you've finished your grocery shopping entirely, you're 50% done with "run errands", etc.

So, I think combining 2 promises either sequentially, or in parallel, when the goal is to fulfill both promises, can be done by applying the same averaging technique. By extension, the same is true for N promises. Does that sound right? If so, that seems pretty cool.

sompylasar commented 10 years ago

A side note on data streams: if I get things right, cujo's msgs module offers some sort of event streams. Maybe use these in addition to the proposed numeric values if required by the user (the module is included or the promise is explicitly configured). Each promise would have its own event stream, each event passed through a promise has a back reference to the promise for identification. The streams are combined by serializing the incoming events so they all arrive into the final promise in the chain.

scothis commented 10 years ago

@sompylasar msgs.js provides a general model for working with messages. While you certainly could model a stream inside a message bus, it's not going to be as efficient as an actual stream. That said, there is strong support in msgs.js for adapting to/from Node Streams, so it's easy to mix and match as needed.

unscriptable commented 10 years ago

It feels wrong that progress events have two params when all other promise "events" have one. Maybe I'm just being picky? Anyway, here's an idea for allowing non-numeric objects to be combined:

var myProgressState = {
    status: "almost-done",
    value: 0.95,
    valueOf: function () { return this.value; }
};

If when.js's internals ensured that they cast the progress state object to Number, the math would just work. If a state object without valueOf() were to slip into the mix, the output would be NaN.
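
A small sketch of that coercion, assuming the library converts each state with Number() before averaging. averageProgress is a hypothetical helper, not something when.js provides.

```javascript
// Hypothetical helper (not part of when.js): averages progress states,
// casting each one to Number first.
function averageProgress(states) {
    var sum = states.reduce(function (total, state) {
        // Number() calls valueOf() on objects; plain numbers pass through.
        return total + Number(state);
    }, 0);
    return sum / states.length;
}

var almostDone = {
    status: "almost-done",
    value: 0.75,
    valueOf: function () { return this.value; }
};

averageProgress([almostDone, 0.25]);                    // 0.5
averageProgress([almostDone, { status: "no-value" }]);  // NaN
```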

The problem, though, is what to do with the non-numeric parts of the states when combining them. Sorting to find the max seems easy enough to do. Just sort the state items and choose the one that ends up at the top. But how do you average arbitrary objects together? Finding the median seems easy enough, but I'm guessing median isn't really going to be the right thing much of the time.

I know I've told Brian this, but I have convinced myself that events aren't really what promises should provide. Yes, we need two events: onFulfilled and onRejected, but we only need those because of the way that promises must interact with the language. Promises are containers for a future value; the onFulfilled and onRejected events are just an implementation detail.

briancavalier commented 10 years ago

@sompylasar msgs.js is a full messaging bus, complete with enterprise integration patterns. That's really way more than was ever intended for promise progress. You can certainly use them together, though, eg putting messages on a bus to indicate progress toward some goal that involves promises, or broadcasting a message when a promise resolves.

The intent of promises, as we know them today in JS, is really to be a proxy for a single value, like @unscriptable said, and to give you a programming model that is roughly analogous to synchronous--transforming values, handling errors in a sane way, etc. In that sense, progress, especially arbitrarily complex progress, just seems weird. You either have an integer or you don't yet.

Of course, promises complicate that analogy precisely because they have a time component. Take an Array, for example. On one hand, you could say that you either have an Array or you don't yet. On the other, though, it's easy to consider a partially filled Array as some sort of indication of "progress". I honestly think that in the "partially filled Array" case, promises are probably not the right thing. Using some sort of async streaming data type, like a most/Stream can be a better fit.

If we accept that promises represent an atomic value (even if that atomic value is an Array) that might materialize later, then one obvious question we'd want to answer is: "when will that value be available?".

I see a couple fairly obvious ways to answer that in a general sense. One is to allow a promise to provide (or be queried for) some sort of indication of how "close" the value is. A number between 0 and 1 handles that well. Another way is to provide an ETA--an estimate of the actual time that the value is expected to be available.

Given one of those, you can compute the other. For example, the ETA of setTimeout(f, 100) is trivially Date.now() + 100. The "closeness" t is easy to compute from ETA: var t = (Date.now() - start) / (eta - start).

Sometimes one is easier to determine initially, though. For example, determining ETA for setTimeout is easy. Determining ETA for an XHR is not as easy. You'd need to interpolate the ETA by fitting a line or curve, whereas computing t is easy if the server provided a Content-Length header.
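
As a rough sketch of converting in both directions, assuming start, eta and now are all absolute timestamps in milliseconds (as returned by Date.now()), so that elapsed time is divided by eta - start. The function names are hypothetical, and the ETA estimate uses the simplest possible fit, a straight line through the start and the current point.

```javascript
// Hypothetical helpers. All times are absolute milliseconds.

// Closeness t in [0, 1] from a known start time and ETA.
function closenessFromEta(start, eta, now) {
    if (eta <= start) return 1; // nothing left to wait for
    var t = (now - start) / (eta - start);
    return Math.min(1, Math.max(0, t));
}

// ETA by linear extrapolation from a known closeness t (0 < t <= 1),
// e.g. t = bytesLoaded / contentLength for a network transfer.
function etaFromCloseness(start, t, now) {
    return start + (now - start) / t;
}
```

For a timer started at 1000 with an ETA of 1100, closenessFromEta(1000, 1100, 1050) is 0.5. For a transfer started at 1000 that is a quarter done at 1050, etaFromCloseness(1000, 0.25, 1050) is 1200.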

Is it ironic that I just used XHR progress to try to argue for simpler promise progress? Perhaps, but I don't think so :) While XHR progress is a complex object, it is not an arbitrary object. It has a known format, which, by definition, arbitrary promise progress events do not.

A well-defined promise progress object could be a solution, and it might help with the "passing two parameters" weirdness. The pair of (number, arbitraryProgressThing) could easily be passed as a single argument using an object with well-defined keys, or as a 2-element array.

briancavalier commented 10 years ago

Another potentially interesting aspect of using 0.0-1.0 is that progress can be synthesized for operations involving multiple promises where all input promises are expected to fulfill (eg when.join, when.all, when.map, when.reduce), even if the input promises don't provide explicit progress updates. In that case, you can consider every input promise's progress to be binary: 0.0 for each input promise to start, and 1.0 for an input promise that fulfills. To compute the output promise's progress, just average them as usual. So, when half of the input promises have fulfilled, the output promise's progress is 0.5.
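
A sketch of synthesized progress for an "all"-style operation. It is written against native Promise rather than when.js internals, and allWithProgress and its onProgress callback are hypothetical names for illustration. Because each input is either 0 or 1, the average is simply the fulfilled count divided by the number of inputs.

```javascript
// Hypothetical helper built on native promises (not a when.js API).
// onProgress receives the fraction of inputs that have fulfilled so far.
function allWithProgress(inputs, onProgress) {
    var fulfilled = 0;
    var tracked = inputs.map(function (input) {
        return Promise.resolve(input).then(function (value) {
            fulfilled += 1;
            onProgress(fulfilled / inputs.length);
            return value; // pass the value through untouched
        });
    });
    return Promise.all(tracked);
}
```

If any input rejects, the returned promise rejects as Promise.all normally does, and progress simply stops short of 1.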

I wonder if there are formulae that would work for competitive races like when.any and when.some in the case where the inputs don't provide progress as well ... it's too early and I haven't had enough coffee.

briancavalier commented 10 years ago

I'm gonna try to make some time to prototype this later in the week, or maybe over the weekend. It's too late for it to go into when 3.0, and would be a pretty nasty breaking change anyway :) So, we have time to do some experiments.

briancavalier commented 10 years ago

It looks like the simple math doesn't quite work out for sequential promises. For example, say you have 3 promises in sequence, like:

let p2 = p1.then(...);
let p3 = p2.then(...);

Let's say each promise has only binary progress: 0 while pending, 1 at the instant of fulfillment. If you're listening to progress on p3, you'd probably expect that, after p1 fulfills, p3's progress would be 0.333, by computing (progress(p1) + progress(p2) + progress(p3)) / 3 = (1 + 0 + 0) / 3 = 0.333. But that's only possible if p3 knows that it is one of three promises. In some sense, p3 must "know about" p1, but typically, it will only "know about" its parent and children (promises create a graph). If promises only know about their parent, then the computation would really be more like: ((progress(p1) + progress(p2)) / 2 + progress(p3)) / 2 = ((1 + 0) / 2 + 0) / 2 = 0.25.

It's not really clear to me what to do about that. Does it even make sense to try to compute progress for sequential promises that way ... hrm.
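
The two computations can be reproduced with a few lines, using the same binary progress values at the moment p1 fulfills. The average helper and variable names are just for illustration.

```javascript
function average(values) {
    var sum = values.reduce(function (a, b) { return a + b; }, 0);
    return sum / values.length;
}

// Binary progress right after p1 fulfills.
var progress = { p1: 1, p2: 0, p3: 0 };

// p3 knows it is one of three promises: a flat average.
var flat = average([progress.p1, progress.p2, progress.p3]); // 1/3

// Each promise only averages its own progress with its parent's
// already-combined value: a nested average.
var upToP2 = average([progress.p1, progress.p2]); // 0.5
var nested = average([upToP2, progress.p3]);      // 0.25
```

The nested form weights p3 as half of the total and p1 and p2 as a quarter each, which is why the results differ.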

maciasello commented 10 years ago

@briancavalier that is what I had in mind when I thought about the example above, however I failed to cover that in the description and your then-response had dulled my vigilance :)

briancavalier commented 10 years ago

@maciasello Yeah, that 0.0-1.0 math doesn't work for sequential, but ETA would. With a relative time in ms, p3's ETA would be (eta(p1) + eta(p2)) + eta(p3). Since that removes the division, p3 doesn't need to know that it is one of three promises. Similarly, if ETA is an absolute time, p3 could simply add its estimated duration to the ETA of p2 (which would have done the same based on the ETA of p1).

In fact, ETA may be better all around than percentage:

  1. Sequences: use addition to compute ETA (as above)
  2. Parallel (eg all()): Math.max
  3. Races (eg any() and race()): Math.min
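
Sketched as code, assuming each ETA is a relative duration in ms and each array is non-empty. The function names are hypothetical.

```javascript
// Hypothetical helpers. Each eta is a remaining duration in milliseconds.

// Sequence: steps run one after another, so durations add up.
function etaSequence(etas) {
    return etas.reduce(function (a, b) { return a + b; }, 0);
}

// Parallel (all): done when the slowest input is done.
function etaAll(etas) {
    return Math.max.apply(null, etas);
}

// Race (any / race): done when the fastest input is done.
function etaRace(etas) {
    return Math.min.apply(null, etas);
}
```

Addition, max and min give the same result however the inputs are grouped, so etaSequence([etaSequence([100, 200]), 50]) equals etaSequence([100, 200, 50]), and a promise does not need to know how many promises precede it.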

So, I still think this is worth exploring.

I also still think the current situation is worse, since there's absolutely no way to combine parallel values or races, and a user-defined function is required to handle sequences. Arbitrary values for progress basically means the only time it's safe to use progress is when you are in full control of both producing and consuming the progress values. It's just not a great situation :(

scothis commented 10 years ago

Progress is clearly worthless right now because, except for isolated environments, there is no consistency about what progress means.

ETA seems particularly tricky. How often do you know how long a promise is expected to take?

I like the idea of averaging the progress values; however, some tasks are inherently more complex than others. What if progress events were a ratio of work steps completed to total work steps? That would make it possible to do a weighted average across an array of promises.

In rest.js, there are many places where the promise is a no-op and just passes the value along, while other times it needs to manipulate the object. For the no-op, I can pass the progress values as is; for the transforms, I can increment the number of total steps.

For an operation like when.all you sum both the work completed and the total work for each promise. For when.any you return the progress values for the furthest complete promise, ignoring the other values. when.some is the same as when.any, but take the top x values and sum them like for when.all.
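
One way to read that proposal, as a sketch only: represent each progress value as a pair [completed, total] of work steps. The pair representation and all function names here are assumptions for illustration, not an existing when.js or rest.js API.

```javascript
// Hypothetical representation: progress is a pair [completed, total].

function ratio(pair) {
    return pair[1] === 0 ? 1 : pair[0] / pair[1];
}

// when.all: sum the completed work and sum the total work.
function combineAllWork(pairs) {
    return pairs.reduce(function (acc, p) {
        return [acc[0] + p[0], acc[1] + p[1]];
    }, [0, 0]);
}

// when.any: keep the pair that is furthest along, ignore the rest.
function combineAnyWork(pairs) {
    return pairs.reduce(function (best, p) {
        return ratio(p) > ratio(best) ? p : best;
    });
}

// when.some(n): take the n pairs furthest along, then sum them as for all.
function combineSomeWork(pairs, n) {
    var top = pairs
        .slice()
        .sort(function (a, b) { return ratio(b) - ratio(a); })
        .slice(0, n);
    return combineAllWork(top);
}
```

For example, combineAllWork([[1, 2], [3, 10]]) gives [4, 12], so the 10-step task counts for more than the 2-step one, which is the weighting effect described above.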

briancavalier commented 10 years ago

> ETA seems particularly tricky. How often do you know how long a promise is expected to take?

In some cases, percentage is easier to compute for sure. In others, ETA is easier. Typically you can compute one from the other, tho, so most of the time (all of the time?), either is possible. Either way, though, an estimate is the best you can do for some situations (eg network transfers). For example, ETA is particularly easy for promise.delay(100) and most likely needs to be computed once by the promise machinery, whereas percentage would need to be continuously/periodically computed. On the other end of the spectrum, computing percentage for a network transfer (as long as you know the total expected transfer size, ie Content-Length) is easy. ETA is still possible, though slightly more involved.

> What if progress events were a ratio of work steps completed and total work steps.

Percentage and ratio are the same thing :) (after almost finishing this post, I realized you might mean something else here, see below!)

> For an operation like when.all you sum both the work completed and sum the total work for each promise ... when.any ... when.some

Right, this is documented upthread. But see also the examples upthread of why percentage (ratio) is impossible to compute accurately for sequential situations like resultPromise here: var resultPromise = p1.then(getP2).then(getP3); without additional information: resultPromise needs to know the total number of steps in the sequence. Might be able to do that if "progress" is represented as a pair [completedSoFar, total], but I haven't really thought it through. Hmmm, is that what you were suggesting above with the ratio idea?
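
A tentative sketch of how a [completedSoFar, total] pair could behave for a sequence, assuming (and this is a big assumption) that each step can declare its total work before it starts. addWork is a hypothetical helper, and the values reuse the binary three-promise example from upthread at the moment p1 fulfills.

```javascript
// Hypothetical: combine a parent's pair with a step's own pair by summing.
function addWork(a, b) {
    return [a[0] + b[0], a[1] + b[1]];
}

// Binary progress as [completedSoFar, total], right after p1 fulfills.
var p1Work = [1, 1];
var p2Work = [0, 1];
var p3Work = [0, 1];

// Each promise only combines its parent's pair with its own.
var upToSecond = addWork(p1Work, p2Work);       // [1, 2]
var resultWork = addWork(upToSecond, p3Work);   // [1, 3]
var fraction = resultWork[0] / resultWork[1];   // 1/3
```

Because the division happens only once, at the point of display, the nested combination yields 1/3 without resultPromise needing to know how many promises are in the chain.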