Fishrock123 / bob

🚰 binary data "streams+" via data producers, data consumers, and pull flow.
MIT License

What are the differences of this approach in comparison to pull-stream? #15

Open JDvorak opened 6 years ago

JDvorak commented 6 years ago

I'm excited to see this development, as I am a heavy user of the pull-stream ecosystem for ETL processing. This approach feels and reads extremely similar, but with obvious gains to be made by making it natively supported by Node. Do these two efforts align (or differ) in any way? Is bob expected to support existing pull-stream patterns so as to benefit from the variety of libraries already available on npm? Could it? 😄

For reference: https://github.com/pull-stream/pull-stream

dominictarr commented 6 years ago

@JDvorak thanks for bringing this to my attention (via https://github.com/pull-stream/pull-stream/issues/116)

Reading the docs here, I think this is actually more similar to push-stream, if you swap the bob.pull() method for push_stream.resume(). One difference seems to be that push-stream also has a paused property that sources check before writing. This means a sink only needs to call resume when it stops being paused, and sources can call write in a loop, which is really fast.
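To illustrate the paused/resume pattern described above, here is a minimal sketch. It is not push-stream's actual code: the class names `ArraySource` and `SlowSink`, the `drain()` helper, and the exact shape of `pipe`/`write`/`end` are assumptions made for the example.

```javascript
// Hypothetical sketch of a source that checks sink.paused before writing.
class ArraySource {
  constructor(items) {
    this.items = items
    this.index = 0
    this.sink = null
    this.ended = false
  }
  pipe(sink) {
    this.sink = sink
    sink.source = this
    if (!sink.paused) this.resume()
    return sink
  }
  // Called by the sink once it stops being paused.
  resume() {
    // Tight synchronous loop: no callback or promise per item.
    while (!this.sink.paused && this.index < this.items.length) {
      this.sink.write(this.items[this.index++])
    }
    if (this.index === this.items.length && !this.ended) {
      this.ended = true
      this.sink.end()
    }
  }
}

// A sink that pauses itself after every `batchSize` items.
class SlowSink {
  constructor(batchSize) {
    this.paused = false
    this.batchSize = batchSize
    this.received = []
    this.ended = false
    this.pauses = 0
    this.source = null
  }
  write(item) {
    this.received.push(item)
    if (this.received.length % this.batchSize === 0) {
      this.paused = true
      this.pauses++
    }
  }
  end() {
    this.ended = true
  }
  // Simulates the sink becoming ready again; only then is resume() called.
  drain() {
    this.paused = false
    this.source.resume()
  }
}

const pushSource = new ArraySource([1, 2, 3, 4, 5])
const pushSink = new SlowSink(2)
pushSource.pipe(pushSink)
while (pushSink.paused) pushSink.drain()
```

The point of the sketch is that the sink calls `resume()` only on the transition out of the paused state, rather than once per item.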

The big mistake in pull-stream was wrapping an async operation into the API: callbacks in the case of pull-stream, but promises would be just as bad. That adds quite a bit of overhead, and the asynchrony is often not actually used.
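For readers unfamiliar with it, the callback shape in question looks roughly like this. The sketch follows pull-stream's documented convention (a source is a function `read(abort, cb)`, and a sink is a function that receives `read`), but it does not use the library itself; `values` and `collect` here are hand-written stand-ins.

```javascript
// A pull-stream style source: every read is answered through a callback,
// even when the data is already available synchronously.
function values(items) {
  let i = 0
  return function read(abort, cb) {
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true) // `true` signals a normal end
    cb(null, items[i++])
  }
}

// A sink receives the read function and keeps asking for the next item.
function collect(done) {
  return function sink(read) {
    const out = []
    read(null, function next(end, data) {
      if (end === true) return done(null, out)
      if (end) return done(end) // anything else truthy is an error
      out.push(data)
      read(null, next)
    })
  }
}

let pulled = null
collect((err, out) => { pulled = out })(values(['a', 'b', 'c']))
```

Each item costs one callback invocation, which is the overhead mentioned above when the source never actually needed to be asynchronous.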

I'm still writing most things in pull-streams because there is a great ecosystem, but when I want something to be high performance I try to use push-stream.

Also note: push-stream is nearly exactly the same as http://www.reactive-streams.org/ (trigger warning: java)

If you have two good streaming interfaces, it's not very difficult to write an adapter. Just make sure the adapter has back pressure and propagates errors/aborts, and it will be easy to interface with pull-stream.

Fishrock123 commented 5 years ago

Ok, so unfortunately this ended up way back on my list of things to do, but here's a summary of some differences (in no particular order, and to my knowledge).

Just to preface - I'd known in fair detail about Dominic's efforts with pull-stream long before I started on this.

Goals

This should be obvious from the resulting code (goals are listed in the respective repos, I think).

Calling context of code

In pull-stream you pass a "raw" function to be called from a separate context, whereas in bob, objects ("classes") are linked and then call the respective methods on one another.

I have found the bob approach more practical for writing endpoints which require state management (most do), although it certainly loses out on some of the nicety of pull-stream's functional style.
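A rough sketch of the linked-object style, to make the contrast concrete. Only `pull` and `next` are named in this thread; the `bindSink`/`bindSource` linking methods, the `'continue'`/`'end'`/`'error'` status strings, the `start()` helper, and both class names are assumptions made for illustration, so check the bob repo for the actual API.

```javascript
// Hypothetical source object: keeps its own state and calls sink.next().
class CountSource {
  constructor(limit) {
    this.sink = null
    this.count = 0
    this.limit = limit
  }
  bindSink(sink) {
    this.sink = sink
  }
  pull(error) {
    if (error) return this.sink.next('error', error, null)
    if (this.count >= this.limit) return this.sink.next('end', null, null)
    this.sink.next('continue', null, ++this.count)
  }
}

// Hypothetical sink object: linked to the source, calls source.pull().
class CollectSink {
  constructor() {
    this.source = null
    this.values = []
    this.done = false
  }
  bindSource(source) {
    this.source = source
    source.bindSink(this)
  }
  start() {
    this.source.pull(null)
  }
  // Called exactly once for each pull().
  next(status, error, value) {
    if (status !== 'continue') {
      this.done = true
      return
    }
    this.values.push(value)
    this.source.pull(null)
  }
}

const countSink = new CollectSink()
countSink.bindSource(new CountSource(3))
countSink.start()
```

State such as `count` and `values` lives on the objects themselves, rather than in closures captured by a read function.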

Setup / flow helper

I personally find pull-stream's pull(...) helper quite difficult to understand / follow internally.

In part due to this, and the move to not take a purely functional approach, bob's Stream(...) helper is much simpler, and the whole flow of a connected set of stream components is quite easy to diagram (even with error handling).

Error handling

The bob API has upwards error propagation 'built in', so that everything can be shut down and cleaned up as far as possible. (This also makes error flow very straightforward.)

As far as I am aware, pull-stream doesn't do this in the same way.
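One way to picture upwards propagation is the sketch below: a sink that fails passes the error to `pull`, each component forwards it towards the source, and the source cleans up before answering with an error status. This is an assumption-laden illustration rather than bob's implementation; the `bindSink`/`bindSource` methods, the status strings, and all three class names are hypothetical.

```javascript
// Hypothetical source holding a resource that must be released on error.
class FileLikeSource {
  constructor() {
    this.sink = null
    this.open = true
    this.n = 0
  }
  bindSink(sink) {
    this.sink = sink
  }
  pull(error) {
    if (error) {
      this.open = false // release the pretend resource
      return this.sink.next('error', error, null)
    }
    this.sink.next('continue', null, ++this.n)
  }
}

// Hypothetical intermediate component: forwards pulls up and data down.
class PassThrough {
  constructor() {
    this.source = null
    this.sink = null
    this.sawError = false
  }
  bindSource(source) {
    this.source = source
    source.bindSink(this)
  }
  bindSink(sink) {
    this.sink = sink
  }
  pull(error) {
    if (error) this.sawError = true // a chance to clean up local state
    this.source.pull(error)
  }
  next(status, error, value) {
    this.sink.next(status, error, value)
  }
}

// Hypothetical sink that fails when it receives a particular value.
class FailingSink {
  constructor(failAt) {
    this.source = null
    this.failAt = failAt
    this.finalError = null
  }
  bindSource(source) {
    this.source = source
    source.bindSink(this)
  }
  start() {
    this.source.pull(null)
  }
  next(status, error, value) {
    if (status === 'error') {
      this.finalError = error
      return
    }
    if (value === this.failAt) {
      return this.source.pull(new Error('sink failed at ' + value))
    }
    this.source.pull(null)
  }
}

const errSource = new FileLikeSource()
const errMiddle = new PassThrough()
const errSink = new FailingSink(3)
errMiddle.bindSource(errSource)
errSink.bindSource(errMiddle)
errSink.start()
```

In this sketch the error travels sink, middle, source, and then comes back down as an `'error'` status, so every component sees it once in each direction.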

Consumer buffer allocation

To reduce large memory copies as much as possible, bob allows consumers (sinks) to send a buffer with a pull request, which can be written into if the source supports such a thing. This is a significant part of the design, intended to increase performance compared to Streams3.

As far as I am aware, this isn't possible in pull-stream.
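A small sketch of the idea, under stated assumptions: the sink allocates one `Buffer` up front and hands it to the source on every pull, and the source copies into it and reports how many bytes it wrote. The argument order `pull(error, buffer)` / `next(status, error, buffer, bytes)`, the `bindSink`/`bindSource` methods, the status strings, and the class names are assumptions for the example, not a statement of bob's exact API.

```javascript
// Hypothetical source that fills a buffer supplied by the sink.
class StringSource {
  constructor(text) {
    this.sink = null
    this.data = Buffer.from(text, 'utf8')
    this.offset = 0
  }
  bindSink(sink) {
    this.sink = sink
  }
  pull(error, buffer) {
    if (error) return this.sink.next('error', error, buffer, 0)
    if (this.offset >= this.data.length) {
      return this.sink.next('end', null, buffer, 0)
    }
    // Write straight into the sink's buffer; nothing new is allocated here.
    const end = Math.min(this.offset + buffer.length, this.data.length)
    const bytes = this.data.copy(buffer, 0, this.offset, end)
    this.offset += bytes
    this.sink.next('continue', null, buffer, bytes)
  }
}

// Hypothetical sink that reuses a single buffer for every pull.
class ReusingSink {
  constructor(size) {
    this.source = null
    this.buffer = Buffer.alloc(size) // allocated once
    this.chunks = []
    this.done = false
  }
  bindSource(source) {
    this.source = source
    source.bindSink(this)
  }
  start() {
    this.source.pull(null, this.buffer)
  }
  next(status, error, buffer, bytes) {
    if (status !== 'continue') {
      this.done = true
      return
    }
    // Only the first `bytes` bytes of the buffer are valid for this chunk.
    this.chunks.push(buffer.toString('utf8', 0, bytes))
    this.source.pull(null, this.buffer)
  }
}

const bufSink = new ReusingSink(4)
bufSink.bindSource(new StringSource('hello bob'))
bufSink.start()
```

The example uses ASCII text so that fixed-size chunks never split a multi-byte character; a real consumer decoding text would need to handle that case.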

C/++ portability and adaptability

The API for bob is also intended to be easily portable to C/++.

C is a bit of an issue for both, but I think it would still be easier to model bob in C (although I may be incorrect there).


There are probably other things, but these stand out the most to me! Hopefully y'all still see this. 😄

dominictarr commented 4 years ago

Yes, this is a lot like pull-streams, except with OO instead of closures. That's really a small detail. Before, I said this was more like push-streams, but looking again I realize that was incorrect, because next is called exactly once for each call of pull.