vqvu opened this issue 9 years ago
Ref #176.
@vqvu thanks for creating the branch. I haven't had time to zoom into it before the holiday season. That could be on the new year's resolutions ;-) happy holidays !
@vqvu how are iterative resume/redirects going? Is there a way I can be helpful? I was contemplating features/changes for 3.0.0 a bit, but @jeromew wisely suggested that I don't get carried away while the engine is still up in the air.
Added a `3.x potential` label to track related items.
@LewisJEllis I am not sure @vqvu has had time to look into this over the holiday season. I think that the changes needed to flatten the stack are still up for discovery. Last time @vqvu mentioned it he said he had no idea how convoluted this could get.
I unfortunately could not find time to look deeply at the new branch. If you feel like diving into it, it would be great and @vqvu will certainly appreciate to be able to discuss potential solutions with someone.
I was planning on working on the merge issue https://github.com/caolan/highland/pull/133. @vqvu has rewritten merge in 3.0.0 but, from my understanding, this has nothing to do with the new engine. So I would like to take his new implementation and adapt it to have it meet #133's requirements.
This way we will be able to remove this from the 3.0.0 branch.
After this, we should make sure that 3.0.0 and 2.x are in sync feature wise (I have seen a few commits that need to be checked)
If possible, we should have strictly the same tests between 2.x and 3.0.0
I haven't had time to do any work on Highland for the last few weeks...
@LewisJEllis I'm happy to discuss ideas on how iterative resume/redirect would be done. We also need some sort of "memory leak checker" and better performance testing. Also the back-prop stuff (this is pretty useful behavior that's not at all implemented) and `pipe` passing errors to Highland streams.
I modified the PR for merge on https://github.com/caolan/highland/pull/133; if @vqvu and @LewisJEllis have time to review it, we could remove the merge refactoring from 3.0.0.
@LewisJEllis would it make sense to backport ESLint from 3.0.0 to 2.x? I understand that caolan was OK with the change, and I don't see a reason to wait for 3.0.0 on this.
Yes, definitely. I originally did only 3.0.0 because I didn't want to make all the silly little changes twice, and because I wasn't sure how much would be done on 2.x in the meantime. It's now evident that 2.x needs it too, and a couple regexes did most of the work anyway, so I'll go ahead and backport it. That should also help avoid silly conflicts when 3.0.0 lands.
Regarding keeping 2.x and 3.0.0 in sync as mentioned in #189 - most things being done on 2.x in the meantime shouldn't cause any major conflicts, so I think an occasional rebase should work just fine. I'll try rebasing 3.0.0 onto master after I backport the linting stuff.
I'm not too familiar with `merge` or the new engine yet, but I'll get up to speed soon so I can take a look at #133 and try to think about the iterative stuff.
ok thanks. We'll have to agree with @vqvu on 'rebase or not rebase'. I'm all for shrinking the 3.0.0 potential conflict footprint. I can't say for sure, but I think it will take some time before we get approval from caolan on 3.0.0, so we'd better not freeze 2.x in the meantime; we should have an easy workflow for merging/rebasing and, later, a clean merge from 3.0.0 into master.
@jeromew Why was this issue closed?
As for "rebase vs merge", I suppose it doesn't really matter too much if we don't expect non-collaborators to contribute. The only issue is that if we start getting a large-ish number of commits on 3.0, it'll get more annoying and error-prone to do rebases. That said, I haven't actually tried the rebase yet, so maybe this is a non-issue.
Agreed that 2.x should still be the branch that gets all back-compat improvements for now.
Oops sorry my mistake. I thought I was closing the issue regarding ESLint.
It is always a pain when there are 2 active branches, but we did not have a choice here given the surface area of the refactoring.
@vqvu, do you see other things that can be backported without breaking back-compat on 2.x? (ESLint and merge are done.)
`throttle`, `debounce`, and `latest` were rewritten to purely use `pull`/`consume` because they depended on private methods and did weird things like override `write` and `pause`.
There are also a few uses of `setImmediate` in the tests that were changed to `_.setImmediate` to get things to work in a browser.
I think these can all be cleanly backported to 2.x. Same thing with `pipeline`, I think.
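For readers unfamiliar with the `pull`/`consume` style being discussed, here is a minimal stand-in sketch. It is not Highland's actual internals: `nil`, `fromArray`, and the one-value-per-`pull` model are simplifications, but it shows how a transform can be written purely against `consume` without touching private stream methods:

```javascript
var nil = {};  // end-of-stream sentinel, standing in for Highland's _.nil

// A source is just an object with pull(cb); cb receives (err, x).
function fromArray(xs) {
  var i = 0;
  return {
    pull: function (cb) {
      cb(null, i < xs.length ? xs[i++] : nil);
    }
  };
}

// consume(source, f) builds a new source. For each upstream value, f gets
// (err, x, push, next): push emits one value downstream, next asks for
// another upstream value instead.
function consume(source, f) {
  return {
    pull: function (cb) {
      (function loop() {
        source.pull(function (err, x) {
          var pushed = false;
          f(err, x,
            function push(e, y) { pushed = true; cb(e, y); },
            function next() { if (!pushed) loop(); });
        });
      })();
    }
  };
}

// A filter built only from consume, with no access to source internals.
function filter(source, pred) {
  return consume(source, function (err, x, push, next) {
    if (err) { push(err); }
    else if (x === nil) { push(null, nil); }
    else if (pred(x)) { push(null, x); }
    else { next(); }  // skip this value, pull the next one
  });
}

// Drain a source into an array.
function toArray(source, cb) {
  var out = [];
  (function go() {
    source.pull(function (err, x) {
      if (x === nil) { return cb(out); }
      out.push(x);
      go();
    });
  })();
}

toArray(filter(fromArray([1, 2, 3, 4]), function (n) { return n % 2 === 0; }),
        function (result) {
          console.log(result);  // [ 2, 4 ]
        });
```

The point of the rewrite described above is exactly this: a transform that only uses the public `pull`/`consume` surface survives engine changes, while one that overrides `write` or `pause` does not.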
@vqvu I backported `throttle`, `debounce`, and `latest`, and added one missing `_.` before `setImmediate` in test/tests.
I tried to backport `pipeline`, but the tests freeze with the new impl. I did not try to understand why it freezes; it is maybe related to the 2.x engine. Do you see a reason why it would freeze?
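As context for the `_.setImmediate` change: bare `setImmediate` is not available in all browsers, which is why a library-provided wrapper is needed. A typical fallback looks like this (illustrative sketch, not Highland's actual implementation; `deferImmediate` is a made-up name):

```javascript
// Prefer the native setImmediate where it exists (Node, IE10+);
// otherwise fall back to setTimeout(fn, 0).
var deferImmediate = (typeof setImmediate === 'function')
  ? setImmediate
  : function (fn) { setTimeout(fn, 0); };

deferImmediate(function () {
  console.log('runs on a later tick');
});
```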
@vqvu, @LewisJEllis - I propose that we bump the master version to 2.3.0 and then rebase/merge/recreate the 3.0.0 branch. What do you think?
Sounds good, since most/all of the backporting is done, and 2.x needs a bump anyway.
As for whether to rebase or merge - the difference isn't very significant, but I think it comes down to how we want to land 3.0.0. If we want to have a single giant pull request to merge into master, I think rebase should keep that cleaner (although merge probably also works). If we want to eventually just swap the 3.0.0 branch in for master, merge will theoretically keep more information in the history. I've always used rebase in this sort of situation, so I can't comment as well on the merge option.
Agreed with the 2.x version bump. We can rebase this time to get rid of the backported transforms. I just didn't want to have to rebase every time we add new features to the 2.x branch.
I think I know why `pipeline` doesn't work... probably some difference between the old `_send` and the new `_send`. It doesn't matter too much though; we don't have to backport it.
@caolan, for info, I just bumped the highland npm version to 2.3.0
thanks @jeromew, I'm hoping to get some time to look at the engine changes in more depth soon too
@vqvu do you want to try to rebase 3.0.0, or else I might try to do it next week.
Yes, I will do it this weekend.
Done. Slightly painful because of the eslint changes, but it wasn't too bad.
@vqvu thanks. I started digging into it a little, trying to see if I could find some performance boosters (only using bench.js).
There are 3 things that I tried at this stage:

- `map` without the per-call try/catch. This improves the 1M benchmark. From what I understand, underscore and lodash do not have a unit-based try/catch (I mean around each `f()` call). Not sure what we can/want to do about this.
- Inlining `ConsumeStream` (not "all tests passing" status because I did this rapidly and was not sure about the teardown, but basically I removed the ConsumeStream inheritance by directly calling `new Stream(function() { self.pull(pullCb) })` and defining `pullCb` inside of `consume`). On my setup this gives a perf boost of about 10%.
- It seems that in the 2.x version, the 'sync' path is even shorter, which is probably why it gets a better bench than lodash on the 1M test.

Do you have the same results? (just to make sure that we can compare things on our respective setups)
For the try-catch problem - assuming we want to keep them - would using the trick in the Bluebird performance optimisations document help?
https://github.com/petkaantonov/bluebird/wiki/Optimization-killers#2-unsupported-syntax
> Some constructs are flat out not supported in the optimizing compiler and using such syntax will make the containing function unoptimizable.
>
> Currently not optimizable:
>
> - ...
> - Functions that contain a try-catch statement
> - Functions that contain a try-finally statement
> - ...
>
> Some of these statements cannot be avoided in production code such as try-finally and try-catch. To use such statements with minimal impact, they must be isolated to a minimal function so that the main code is not affected:

```javascript
var errorObject = {value: null};

function tryCatch(fn, ctx, args) {
    try {
        return fn.apply(ctx, args);
    } catch (e) {
        errorObject.value = e;
        return errorObject;
    }
}

var result = tryCatch(mightThrow, void 0, [1, 2, 3]);

// Unambiguously tells whether the call threw
if (result === errorObject) {
    var error = errorObject.value;
} else {
    // result is the returned value
}
```
Depending on the performance improvement it could be overkill.
@svozza the `map` loop in the 1M case indeed seems to be helped by extracting a function like

```javascript
function tryCatch(fn, x) {
    try {
        return fn(x);
    } catch (e) {
        return new StreamError(e);
    }
}
```
and then doing, in the loop:

```javascript
var fnRes = tryCatch(f, x);
if (_._isStreamError(fnRes)) {
    push(fnRes);
} else {
    push(null, fnRes);
}
```
With this modification, plus the one inlining ConsumeStream, the 1M bench.js is on par between lodash and highland on my machine. The try/catch inside the highland loop has a penalty.
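A self-contained version of the extracted-try/catch pattern, for anyone who wants to play with it outside Highland (`StreamError` and `mapWithErrors` here are illustrative stand-ins, not Highland's real internals):

```javascript
// Stand-in error wrapper, mirroring the StreamError idea.
function StreamError(err) { this.error = err; }

// Isolated try/catch so the calling loop stays optimizable in V8.
function tryCatch(fn, x) {
  try {
    return fn(x);
  } catch (e) {
    return new StreamError(e);
  }
}

// A map loop that routes thrown errors as values instead of wrapping
// each call site in try/catch.
function mapWithErrors(xs, f) {
  var values = [];
  var errors = [];
  for (var i = 0; i < xs.length; i++) {
    var res = tryCatch(f, xs[i]);
    if (res instanceof StreamError) {
      errors.push(res.error);
    } else {
      values.push(res);
    }
  }
  return { values: values, errors: errors };
}

var r = mapWithErrors([1, 2, 3], function (x) { return x * x; });
console.log(r.values);  // [ 1, 4, 9 ]
```

The hot loop itself contains no try/catch statement, which is the whole point of the Bluebird trick quoted above.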
@vqvu do you have a test somewhere regarding the stack exhaustion problem ?
@jeromew There's some test code in the original PR for recursive `next` calls.
Recursive consumes look something like

```javascript
var s = _([1]);
for (var i = 0; i < 100000; i++) {
    s = s.map(function (i) { return i; });
}
s.toArray(_.log);
```
**try-catch**
We do want to keep that. This was added intentionally (see #97, #94, and related issues). underscore and lodash don't do error checking because they're synchronous. Since highland is async, we don't want errors just flying off into nowhere.
**perf**
I don't really see a benefit to the changes. The code I'm using is at https://github.com/vqvu/highland/tree/3.0.0-perf-test.
Here are my results (node v0.10.18).
```
# 3.0.0-perf-test
.map(square) x 1,000,000
underscore #################################################### 122ms
lodash ################################## 81ms
highland ############################################################ 143ms

# 3.0.0
.map(square) x 1,000,000
underscore ######################################################### 131ms
lodash #################################### 83ms
highland ############################################################ 139ms
```
That said, I think the move to a specific `tryCatch` function is probably safe enough to make... the question is whether the extra function call is worth it. It's not like the `consume` callback for `map` is all that large.
Ok, I take back my comment on not seeing a benefit to inlining `ConsumeStream`. I saw some speedup when I modified `bench.js` to add more `map` calls to the chain.
I added some more aggressive inlining of the sync `consume` code path in my perf test branch. It inlines all recursive sync calls to `_runGenerator` in `_next_fn` and the call to the generator in `_runGenerator`.
```
.map(square) x 1,000,000
underscore ############################################################ 141ms
lodash #################################### 84ms
highland ########################################## 98ms
```
I cloned your branch and get:

```
.map(square) x 1,000,000
underscore ############################################################ 274ms
lodash #################################### 162ms
highland ############################ 126ms
```
the joy of benchmarks. inlining + tryCatch seem to improve things a little.
I have no idea what to do about the bootstrapping of highland at this time.
```
.map(square) x 10,000
underscore ######### 1ms
lodash ######### 1ms
highland ############################################################ 7ms
```
I am not sure that 2.x does any better, but for frequent highland usage on moderate arrays, this can have a big impact. lodash probably reaches this by using the native array.map. I am not sure though how we could 'bridge' the sync/async gap by using array.map when possible.
Anyway, we should first see if we want/can flatten the call tree before optimizing the pull/consume primitives.
On a side note, I saw that lodash 3.0.0 has landed and has lazy support - http://filimanjaro.com/blog/2014/introducing-lazy-evaluation/. Maybe there could be a way for the highland engine to defer to lodash when in sync mode. If the core of highland is really the pull/consume engine that bridges the gap between sync/async, it could be interesting to benefit from the work on lodash for sync optimization... but that's just a thought ;-)
I'll try to understand how the recursive consume could be unrolled so we can discuss this.
Very, very dirty prototype of unlimited `consume`: https://github.com/vqvu/highland/tree/3.0.0-rec-fix. It's really ugly and will be fairly difficult to maintain.
Roughly speaking, the recursive `consume` looks something like this.
```
resume -> rungen -> pull -> ... -> rungen -> _send -> cb -> push -> _writeOutgoing -> _send -> cb -> ...
```

This implementation splits that chain into two parts:

```
# Loop is in _runGenerator w/ some coordination from resume. About 5-10% slow down.
resume -> rungen -> pull -> resume -> rungen -> pull
```

and

```
# Loop is purely in _send. About 20% slow down.
_send -> cb -> push -> _writeOutgoing -> _send -> cb -> push -> _writeOutgoing -> _send
```
The second loop is by far the more expensive one, because we have to work around the fact that we have no control over `cb`.
Unfortunately, the current implementation does come at a roughly 30% cost in the sync benchmark. Still pretty good though; we won't get this for free.
```
.map(square) x 1,000,000
underscore ############################################################ 130ms
lodash ###################################### 82ms
highland ############################################################ 130ms
```
I won't have much time for the rest of the week, but I'll try to take a look at the bluebird wiki. It seems quite useful.
Added infinite `next(xs)` support + some perf improvement.
```
.map(square) x 1,000,000
underscore ############################################################ 144ms
lodash ################################### 82ms
highland ################################################# 116ms
```
@vqvu I fear waking up this issue because I am not sure how to make it converge with all the parties involved (mainly @caolan). On the other hand, leaving this issue unattended feels bad for the evolution of the library if the initial architecture refactoring is a must for solving redirection bugs.
You showed that it is possible to reach the 'flat recursion' objective but I agree that it might be harder to maintain.
Do you think the 'flat recursion' would be a premature optimization or do you feel like the core has enough stability to warrant such an optimization ? Keep in mind a 4.0.0 target with an official extension mechanism so that the main library would only contain the basic building blocks and we could externalize many of the custom transforms that tend to bloat the API.
I think that we need a 3.0.0 release as soon as possible to fix some problems with the library that have popped up. There's the backpropagation-of-nil issue, and the core refactor is needed to fix performance issues with `pull` (see #270) and redirection bugs. We may also want to do some renaming to help out with ramda integration. If "flat recursion" would hold that up, then we don't really need it. There doesn't seem to be many reported issues stemming from this, so I think it's low priority.
I think the important thing is to get 3.0.0 released with the appropriate semantics changes to support the things we want to do. Once we have that, we can easily do everything else as improvements on top.
@caolan I think the ball right now is firmly in your court. If you don't have the time to review the current state of the core engine refactor, please let us know so we can proceed accordingly.
Note that we don't necessarily have to do all of the convoluted things that I did to fix the stack overflow bug. We could just do `setImmediate` everywhere. It's just not ideal for synchronous sources, though there's an argument here that synchronous sources don't matter much, since Highland is a library for managing asynchrony.
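The "flatten instead of recurse" idea, as opposed to deferring every step with `setImmediate`, can be sketched independently of Highland's internals as a plain trampoline: synchronous continuations are replayed in a loop rather than growing the stack (the `trampoline` name and shape here are illustrative):

```javascript
// trampoline(step): step receives a `next` callback. Synchronous calls to
// next() are flattened into the while loop below, so the stack never grows
// and 100k sync continuations are fine.
function trampoline(step) {
  var inLoop = false;
  var pending = false;
  function next() {
    pending = true;
    if (inLoop) { return; }  // the loop below will pick this up
    inLoop = true;
    while (pending) {
      pending = false;
      step(next);  // may call next() synchronously
    }
    inLoop = false;
  }
  next();
}

var count = 0;
trampoline(function (next) {
  if (++count < 100000) { next(); }
});
console.log(count);  // 100000
```

Deferring with `setImmediate` instead trades this bookkeeping for a trip through the event loop on every value, which is what makes it costly for synchronous sources.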
@vqvu your work on this has been excellent, much appreciated! As for the performance improvements, Highland already has lots of complicated performance-related code in 2.x, so I don't see a huge problem in introducing them.
I'm a tentative +1 (since I've not had time to try this out on some real projects yet), but the code looks good and I trust your (and the other collaborators') judgement regarding a release. I'm interested in the ramda integration plans, since that could give us a complementary tool for sync use cases where performance is critical.
@caolan I want to add that @vqvu is probably the one most knowledgeable on the codebase after you. Maybe you would want to consider giving him npm rights (in parallel with or instead of me; this is your call depending on how you see things).
You gave me these privileges when I acted as a moderator when @vqvu proposed this huge engine refactoring, but I think that I maybe restrain him a little too much on the project wrt the improvements he has made so far (+ I was afk for several weeks and he could not make a release without me).
I've added @vqvu as an owner on npm too. You're both welcome to make releases. Please make sure any major or backwards incompatible changes are discussed in an issue first :)
Some updates, thanks to @quarterto for giving me a push on this via #324.
I merged in the latest changes from master and added these features:

- `onDestroy`. See #172.
- ... (`take` usecase). Also to support #172.

There are some additional things to do before a proper release, so I will add some todo tasks in the original post so we can track this stuff.
Ok, I've updated the original post. Anything else that we want to change that I've missed?
I know I briefly mentioned in the ramda integration issue that we might want to rename/reorder args for some transforms to make ramda integration easier. Is it worth it to do? Or should we just provide a `ramda-integration` package that people can use to override these methods? The other alternative is to provide a `highland-2` package that people use to revert to the old transforms. Maybe the latter is better, so new users can get the nicer API while existing users can upgrade incrementally...
See https://github.com/ramda/ramda/issues/1037#issuecomment-94734633 for a detailed description of the differences. The more egregious ones are:

- `reduce`/`scan` has the `iterator` and `memo` arguments reversed.
- `pickBy`'s callback is `f(key, value)` instead of `f(value, key)`.
- `zipAll0` and `zipAll` should really be `zipAll` and `zipEach`, respectively.

Anyway, feedback needed.
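To make the `reduce`/`scan` difference concrete, here is a plain-JS sketch of the two argument shapes (the function names are illustrative, not either library's API; Ramda's convention is iterator-first and data-last, which composes better):

```javascript
// Highland-2.x-style shape: memo first, then the iterator.
function reduceMemoFirst(memo, fn, xs) {
  return xs.reduce(fn, memo);
}

// Ramda-style shape: iterator first, memo second, data last (curried).
function reduceFnFirst(fn, memo) {
  return function (xs) { return xs.reduce(fn, memo); };
}

var add = function (a, b) { return a + b; };

console.log(reduceMemoFirst(0, add, [1, 2, 3]));  // 6

// The data-last form yields a reusable, point-free transform.
var sum = reduceFnFirst(add, 0);
console.log(sum([1, 2, 3]));  // 6
```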
I'm a +1 on changing those three functions but I don't think we need to go any further than that.
I really like the idea of a `highland-2` package too.
@caolan wrote:

> I'm interested in the ramda integration plans, since that could give us a complementary tool for sync use cases where performance is critical.
For me it seems like Ramda isn't really about perf-critical scenarios, opting to prioritize functional philosophy instead (which isn't a bad thing). This can be seen in core design decisions like Ramda performing more costly deep value comparisons for equality checks instead of simpler `===` checks. This compounds for mobile, where a recent jsperf benchmark (which is down atm) on my Windows 8.1 phone took Ramda ~14 seconds to complete a single operation on an array of 1,000 elements, compared to ~1 millisecond for a library like Lodash.
It's something to keep in mind at least.
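The cost difference @jdalton mentions comes from structural comparison. A minimal sketch of deep value equality, in the spirit of Ramda's `R.equals` though far less complete, versus reference equality:

```javascript
// Recursive structural comparison; cost grows with the size of the values.
function deepEqual(a, b) {
  if (a === b) { return true; }
  if (typeof a !== 'object' || typeof b !== 'object' ||
      a === null || b === null) {
    return false;
  }
  var ka = Object.keys(a);
  var kb = Object.keys(b);
  if (ka.length !== kb.length) { return false; }
  return ka.every(function (k) { return deepEqual(a[k], b[k]); });
}

var a = { xs: [1, 2, 3] };
var b = { xs: [1, 2, 3] };

console.log(a === b);         // false: different references, O(1)
console.log(deepEqual(a, b)); // true: same structure, O(size) per comparison
```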
FWIW I updated bench/bench.js (which is still a little micro-benchmarky) to use the latest edge lodash (will be v4), corrected its Underscore usage and updated its version (I've sent a PR), and enabled a shortcut fusion benchmark:
```
.map(square) x 10,000
underscore ########## 1ms
lodash ########## 1ms
highland ############################################################ 6ms

.map(square) x 100,000
underscore ########## 3ms
lodash ####### 2ms
highland ############################################################ 19ms

.map(square) x 1,000,000
underscore ######## 14ms
lodash ############ 22ms
highland ############################################################ 110ms

.map(square).filter(isEven).take(3) x 1,000,000
underscore ############################################################ 39ms
lodash ## 1ms
highland ########### 7ms
```
Thanks @jdalton - I'm continually impressed with lodash :)
To add a little, those === checks are exactly what you want using immutable data structures, which pair very well with functional programming. :)
This is going to sound silly, but I want to make the changes to reorder the args in `reduce` and rename the `zip` functions. However, I don't know how to check out the `3.0.0` branch locally and raise a pull request against it.
You probably don't have the 3.0.0 branch in your fork. Do:

```
$ git remote add caolan git@github.com:caolan/highland.git
$ git fetch caolan
$ git checkout -b 3.0.0 caolan/3.0.0
$ git push -u origin 3.0.0
```

This will push the 3.0.0 branch to your fork and set it as the upstream for your local 3.0.0 branch. You can then fork a feature branch off it like normal. You should be able to set a base branch when creating your PR.
Nice one. I had to do `git push -f origin 3.0.0` to get my fork's branch up to date, but it's all up and running now.
@LewisJEllis Do you mind if I port your changes from #166 to the 3.0.0 branch?
I did that in #191 but I think something about #240 made it no longer needed after some other changes. I'm not too fresh on the details though, so feel free to take another look. If it does seem necessary, maybe we can just merge #191?
Well, I think the reason we didn't do this in 2.x was because it would be a breaking change, but I think we still want to do it. #240 was related because `through` had the same behaviour as `pipe` even though it works exclusively with Highland streams, so we considered its behaviour a bug.
Right, okay. I do recall the decision not to merge it on 2.x, but I'm otherwise not too sure what the situation is. I left a comment on #191.
This issue tracks any changes we want for v3.0.0. See the 3.0.0 branch.
The biggest change for now is a reimplementation of the Highland engine. See the original PR at #175.
**Breaking changes**

- No `fork` after `consume`. Must call `fork` at the start.
- Can no longer `consume` a stream twice.
- `stream.source` no longer exists.
- `pipe` passes along errors if piping to a Highland stream, instead of calling `this.emit('error')` (#166). See https://github.com/caolan/highland/issues/360#issue-99747182
- `wrapCallback` and the stream constructor pass an array for the default mapping hint when more than one argument is passed to the callback (#247, #335).
- `zipAll` renamed to `zipEach`. `zipAll0` renamed to `zipAll`.
- `reduce` and `scan` argument order reversed.
- `map` will now throw an error if the argument is not a function. The old behavior instead mapped all stream elements to the argument if it was not a function.

**Before release**
- `onDestroy` (see #172).
- ... `consume`s without a fork. `consume`-based should get this for free, but not all transforms use `consume`. Maybe we should add a `createDownstream` method that does this for us.
- `noValueOnError` to test the backpropagation behavior.
- Make `pipe` pass along errors to a Highland stream (#166).
- `pipe` docs pointing to `through` for the case when users want to pipe along errors as well as values to another Highland stream (https://github.com/caolan/highland/issues/179#issuecomment-125669798).
- `ConsumeStream` and `PipelineWrapperStream` (to make it easier to port #337).
- `ConsumerStream`
- `PipelineWrapperStream`
- `use` (see #337).
- `noValueOnError` to test that core transforms always return an object of the appropriate extended type.
- `use`.
- Fix default mapping hint behavior (#335). Remove the note in the docs that say we are changing the default behavior.
- `reduce` and `scan`.
- `zipAll` and `zipAll0`.
- `pickBy`.
- `map`: throw an error if not given a function (#404).
- `highland-2` module for reverting the transform rename/reargs (https://github.com/caolan/highland/issues/179#issuecomment-116208712).
- Split `index.js` into `highland/core` and `highland/transforms`.

**Nice to have but can be delayed until after release**

- `consume` and redirect.