caolan / highland

High-level streams library for Node.js and the browser
https://caolan.github.io/highland
Apache License 2.0

Highland v3.0.0 #179

Open vqvu opened 9 years ago

vqvu commented 9 years ago

This issue tracks any changes we want for v3.0.0. See the 3.0.0 branch.

The biggest change for now is a reimplementation of the Highland engine. See the original PR at #175.

Breaking changes

vqvu commented 9 years ago

@svozza Why is #191 necessary when you can just use through and get the same result?

I see a checkbox in the original post about merging in #191 (which I added), but I wasn't thinking about #240 at the time.

Edit: To clarify. My thinking was that pipe would be used to mimic the behavior of node stream's pipe, while through would be used when you want a Highland stream back. We can even make through short-circuit the double-pipe when the piped-to stream is already a Highland stream. But I didn't see a reason to have pipe behave slightly differently depending on the type of its input. That's why I originally closed #191.

svozza commented 9 years ago

Well, I'm not pushed, but it's hardly inconceivable that a user might not read the docs and pipe a Highland stream into another one. If the price of preventing a strange crash for one of those users is just a simple if statement in pipe, then I don't see why we wouldn't do it.

vqvu commented 9 years ago

The standard behavior of pipe is to throw if you don't have an error handler attached though, so someone not reading the docs would probably assume the standard behavior, right? How about adding a statement in the pipe docs pointing to through instead?
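
For reference, here is a minimal sketch of that standard behavior using only Node's core stream module (no Highland involved). The function names are made up for illustration. It shows that emitting 'error' on a stream with no 'error' listener throws, and that pipe hands back the destination stream rather than anything Highland-specific.

```javascript
const { PassThrough } = require('stream');

// Emits 'error' on a stream that has no 'error' listener attached.
// Returns the message of the thrown error, or null if nothing was thrown.
function emitErrorWithoutHandler() {
    const s = new PassThrough();
    try {
        s.emit('error', new Error('boom'));
        return null;
    } catch (err) {
        return err.message;
    }
}

// Node's pipe returns the destination stream so that calls can be chained.
function pipeReturnsDestination() {
    const src = new PassThrough();
    const dest = new PassThrough();
    return src.pipe(dest) === dest;
}
```

Someone who expects this behavior from pipe would not be surprised by a throw, which is the argument for documenting through rather than special-casing pipe.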

svozza commented 9 years ago

Yep, sounds good to me.

vqvu commented 9 years ago

I'll make a PR.

LewisJEllis commented 9 years ago

I'm interested in spending some time soon to help push 3.0 out; @vqvu and @quarterto can you give some guidance on what's left to do that I can help with? The todo at the top seems pretty up-to-date, but I haven't been following closely enough to know exactly what each task entails. Happy to help with docs, testing, and any low- to medium-hanging fruit.

Also, is breaking things down into a stream engine core + .useable modules part of the scope of 3.0?

vqvu commented 9 years ago

We have:

1) Test for making sure back propagation works for all transforms. Roughly speaking, something along the lines of

stream.onDestroy(cb)
    .transform(...)
    .destroy();

// assert that cb was called.

for all transforms. This is more of a problem for transforms (listed in the todo) that return a stream created with this.create and not this.consume. We need a this.createDownstream (or equivalent name) that does a this.create and binds an onDestroy handler like consume does.

2) Tests to make sure that a fork that is destroyed does not contribute backpressure. That is, assert that the following no longer blocks indefinitely.

var s = _([1, 2, 3]);
var s1 = s.fork().take(2);
var s2 = s.fork();

s1.each(_.log);
s2.each(_.log);
// => 1
// => 1
// => 2
// => 2
// => 3

3) docs for use.

4) a highland-2 module that restores the behavior of the transforms that changed (i.e., undoes the renames and reargs that we made). I'm not sure exactly how to expose this. Ideally as a separate npm module, but where would we put the repo?

Also, is breaking things down into a stream engine core + .useable modules part of the scope of 3.0?

I'd like to get this done for 3.0, but we'd want to do this last, since the code movement would make it much harder to merge in the PRs that were applied to 2.x.

jeromew commented 8 years ago

@vqvu I saw your mention of 3.0 here - https://github.com/caolan/highland/issues/388#issuecomment-156400784 and realized that fixing 2.x engine bugs is a bummer, since many of them are automatically fixed by the work you did on 3.x.

I think that the engine fixes + the use work that you did with @quarterto on https://github.com/caolan/highland/pull/337 are big improvements to the library that we should not hold very much longer.

The main drawback, as I understand it, would be performance: we would lose some throughput (I don't have numbers for the latest version). But the fixes in the engine plus the extensibility would help the community, and @vqvu has already shown that there is a way to unroll his engine to gain more speed at the expense of code readability.

Regarding speed, I noted one of the remarks of @caolan:

I'm a tentative +1 (since I've not had time to try this out on some real projects yet), but the code looks good and I trust your (and the other collaborators') judgement regarding a release. I'm interested in the ramda integration plans, since that could give us a complementary tool for sync use cases where performance is critical.

What is the state of affairs regarding integration of highland with other sync libraries that are performance oriented (lodash?)? Do we already have an elegant integration with these for sync cases when speed is a requirement?

If we strip down the functions, i would rather call it highland-core and highland-2 but that is just a naming preference.

highland could be a version of highland-core pre-packaged with basic transforms.

@vqvu how can we help towards the release of 3.0 ?

vqvu commented 8 years ago

I think that the engine fixes + the use work that you did with @quarterto on #337 are big improvements to the library that we should not hold very much longer.

Agreed. The hold up is more me (and I suspect the rest of the collaborators too) not having time to work on this rather than any real blocking issue. @quarterto had a good suggestion to release a 3.0.0-beta1 to npm while we work on finalizing the release, and I think that's a good idea.

What is the state of affairs regarding integration of highland with other sync libraries that are performance oriented (lodash?)? Do we already have an elegant integration with these for sync cases when speed is a requirement?

We currently only have integration with transducers, and while I don't have hard numbers, I suspect using transducers for sync transforms is faster than using highland directly. Transducers could very well be our answer to the sync problem.

I'm not sure about integrations with libraries like lodash. I don't see a way to integrate with them beyond this little snippet.

stream.batch(some_reasonable_number)
    .flatMap(function (array) {
        // use lodash here to turn array into result.
        return _(result);
    });
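
To make the shape of that pattern concrete without depending on Highland or lodash, here is a plain-JavaScript stand-in. Both batch and flatMapBatches are hypothetical helpers written for this sketch; they only mimic the "chunk, transform each chunk synchronously, flatten" flow on an ordinary array.

```javascript
// Hypothetical helper: split an array into chunks of at most `size` items,
// standing in for stream.batch(size).
function batch(items, size) {
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}

// Stand-in for the flatMap step: run a synchronous array transform on each
// chunk and concatenate the per-chunk results in order.
function flatMapBatches(items, size, syncTransform) {
    return batch(items, size).reduce(function (out, chunk) {
        return out.concat(syncTransform(chunk));
    }, []);
}

const result = flatMapBatches([1, 2, 3, 4, 5], 2, function (chunk) {
    // A library such as lodash would do its array work here.
    return chunk
        .filter(function (n) { return n % 2 === 1; })
        .map(function (n) { return n * 10; });
});
// result is [10, 30, 50]
```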

At this point, sync performance isn't that bad compared to 2.x, so I'm not too concerned about it. My opinion is that, while we should of course care about sync performance, the utility of Highland is more about sequencing async computations than raw throughput.

Ramda integration is kind of stalled at the moment, but I think it's more about allowing people to leverage their library of transforms, and especially the FP programming style that comes with it, rather than performance. The integrations that were being proposed did not have to do with performance. As it relates to the 3.0.0 release, I can't think of any further breaking changes that we might need to make to support ramda integration, so I think we're safe to proceed here. Unless someone thinks otherwise?

If we strip down the functions, i would rather call it highland-core and highland-2 but that is just a naming preference.

highland could be a version of highland-core pre-packaged with basic transforms.

The highland-2 module would be more of a "undo the naming/arg order changes that we did to the transforms". We'd need another one for "non-basic" transforms.

Here's a question for you and the rest of the collaborators. How do we want to handle the multi-module situation? Do we simply package the modules with highland proper? Or is it worth it to create a highland organization to house the different modules in different repos? Bundling everything up would be the simplest, but I think that would defeat the purpose of splitting up the transforms into basic and non-basic packages. If the user is going to download them all anyway, they might as well be allowed to use them by default.

@vqvu how can we help towards the release of 3.0 ?

I think my previous comment here is a good list of the things still pending. They're all boring documentation/testing tasks, which explains why they've been stalled for so long.

Also, if you or someone else with npm access can release the current 3.0.0 branch as 3.0.0-beta1, that would be great. The 3.0.0 branch is a little behind master, but it's only lacking the dependency updates we've made within the past week. If possible, it'd also be nice to add a release:major-beta so we have it for future uses.

I'd like to refactor index.js and test.js into multiple files so that they're easier to manage, but that's quite a bit of work and can be done after the release.

svozza commented 8 years ago

What exactly is involved in this task:

Create a test like noValueOnError to test the backpropagation behavior

I should be able to look at it tomorrow.

vqvu commented 8 years ago

Basically, given an infinite stream s, if we do

var i = 0;
var s = _(function generator(push, next) {
    push(null, i++);
    next();
});

s.onDestroy(function destructor1() {
})
.through(_.someTransform(...))
.onDestroy(function destructor2() {
})
.take(1)
.resume();

destructor2 and destructor1 should both be called at some point. Furthermore, generator should never be called after destructor2 has been called.

Many transforms get this behavior for free when they use consume. However, some (like sequence) actually return a new generator stream. These transforms should be using this.create to create the new streams, and create should take care of setting up the appropriate hooks so that an onDestroy on the result stream gets propagated to the parent.
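
As a rough illustration of the hook described above, here is a toy model in plain JavaScript. MiniStream and its methods are hypothetical and are not Highland's Stream; the sketch only shows a downstream stream registering an onDestroy handler that destroys its parent, so that destroying the end of the chain reaches the source.

```javascript
// Toy model only: MiniStream is hypothetical, not Highland's Stream.
class MiniStream {
    constructor(generator) {
        this._generator = generator; // called once per requested value
        this._destructors = [];
        this.destroyed = false;
    }

    // Register a callback to run when this stream is destroyed.
    onDestroy(fn) {
        this._destructors.push(fn);
        return this;
    }

    // Create a child stream that pulls from this one and whose
    // destruction is propagated back to this (parent) stream.
    createDownstream() {
        const parent = this;
        const child = new MiniStream(function () { return parent.pull(); });
        child.onDestroy(function () { parent.destroy(); });
        return child;
    }

    pull() {
        if (this.destroyed) { throw new Error('pull after destroy'); }
        return this._generator();
    }

    destroy() {
        if (this.destroyed) { return; }
        this.destroyed = true;
        this._destructors.forEach(function (fn) { fn(); });
    }

    // Pull n values, then destroy this stream.
    take(n) {
        const out = [];
        for (let k = 0; k < n; k++) { out.push(this.pull()); }
        this.destroy();
        return out;
    }
}

const calls = [];
let generatorCalls = 0;
let counter = 0;
const source = new MiniStream(function generator() {
    generatorCalls++;
    return counter++;
});

const values = source
    .onDestroy(function destructor1() { calls.push('destructor1'); })
    .createDownstream() // stands in for .through(_.someTransform(...))
    .onDestroy(function destructor2() { calls.push('destructor2'); })
    .take(1);
// Both destructors have run, and the generator was called exactly once.
```

A real test would repeat this check for every transform, which is where transforms that build their result without such a hook would show up.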

svozza commented 8 years ago

Cool. I'll have a go at this tomorrow evening. Btw, I think the idea of creating an organisation and having the various modules in different repos is a great idea.

apaleslimghost commented 8 years ago

I did register the org highland-js a while back...

jeromew commented 8 years ago

@caolan would it make sense for you to migrate highland to a highlandjs org? I don't know if you have followed the work on extensibility that was done by @quarterto and @vqvu in https://github.com/caolan/highland/pull/337 but a lot of modules could exist in this framework.

svozza commented 8 years ago

So, I was having a look at this:

Test that destroyed fork does not contribute backpressure

And it looks like the destroyed fork still contributes backpressure. Not quite sure where to look to start fixing it though.

vqvu commented 8 years ago

Backpressure for forks is managed by StreamMultiplexer. New forks are created using StreamMultiplexer#newStream. Destroyed forks are removed via StreamMultiplexer#removeConsumer.

When forks want data, they call StreamMultiplexer#pull with their id (an integer assigned at creation time) and a callback. This calls StreamMultiplexer#_resume. _resume will check to see if all forks have requested data. Once this is true, it calls Stream#pull on the source and distributes the result to all forks.

Forks may be added or removed while the Stream#pull call resolves. If they are added, the result will not be distributed. Instead, it's saved in this._cached_value.

The problem here is that the multiplexer uses this._consumers[id] !== undefined to track which streams are currently registered. However, the this._consumers object gets reset every time a value is emitted, which wipes away that information.

Fix in #411.
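
To illustrate the general idea (this is a toy model, not the actual change in #411 and not the real StreamMultiplexer), the sketch below keeps the set of registered forks separate from the per-value pending requests. MiniMultiplexer and its fields are hypothetical names; only the pending map is reset on emit, so a removed fork stops counting toward backpressure.

```javascript
// Toy model only: hypothetical names, not Highland's StreamMultiplexer.
class MiniMultiplexer {
    constructor(values) {
        this._values = values.slice();
        this._registered = new Set(); // ids of live forks; survives emits
        this._pending = new Map();    // id -> callback awaiting next value
        this._nextId = 0;
    }

    newStream() {
        const id = this._nextId++;
        this._registered.add(id);
        return id;
    }

    removeConsumer(id) {
        this._registered.delete(id);
        this._pending.delete(id);
        this._resume(); // remaining forks may now all be waiting
    }

    pull(id, cb) {
        if (!this._registered.has(id)) { return; }
        this._pending.set(id, cb);
        this._resume();
    }

    _resume() {
        if (this._registered.size === 0 || this._values.length === 0) { return; }
        // Emit only once every registered fork has requested data.
        if (this._pending.size < this._registered.size) { return; }
        const value = this._values.shift();
        const callbacks = Array.from(this._pending.values());
        this._pending.clear(); // reset requests, not registrations
        callbacks.forEach(function (cb) { cb(value); });
    }
}

const mux = new MiniMultiplexer([1, 2, 3]);
const a = mux.newStream();
const b = mux.newStream();
const seenA = [];
const seenB = [];

mux.pull(a, function (v) { seenA.push(v); });
mux.pull(b, function (v) { seenB.push(v); }); // both waiting: 1 goes to both
mux.pull(a, function (v) { seenA.push(v); });
mux.pull(b, function (v) { seenB.push(v); }); // 2 goes to both
mux.removeConsumer(a);                        // fork a is destroyed
mux.pull(b, function (v) { seenB.push(v); }); // 3 goes to b alone
// seenA is [1, 2] and seenB is [1, 2, 3]
```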

svozza commented 8 years ago

Makes sense. Thanks for the explanation!