bustle / bluestream

A collection of streams that work well with promises (through, map, reduce). Think Through2 with promises
MIT License
103 stars 5 forks source link

Node 16 Behavior Changes #44

Closed claytongentry closed 2 years ago

claytongentry commented 2 years ago

Hey @reconbot (nice to virtually meet!),

Thanks for all the great work on bluestream. I've been working to bring it up to full compat with Node 16 (https://github.com/bustle/bluestream/pull/43), and I was wondering if you could do us a solid and offer some insight into a couple of issues I've seen running on 16 specifically.

1. "Callback called multiple times" errors

The error is reproducible by running any of the transform tests on master using Node 16.

It appears that async transform functions that call a callback in Node 16 will hit a "Callback called multiple times" error due to some since-deprecated streams internals behavior specific to Node 16 (i.e. no issue in 14 nor 18). I found a helpful diagnosis of the problem here: https://github.com/kylefarris/clamscan/pull/88, where the recommended resolution was to simply return a promise chain.

Given there are no awaits in the bluestream transform handler, I simply removed the async declaration to resolve: https://github.com/bustle/bluestream/pull/43/commits/b49aee06874e23d0a66283c63549d16809d1f11a

Do you forecast any issues with that change?

2. stream.read() no longer increments the _eventsCount property

There are three tests in the readAsync suite failing on Node 16, including this one: https://github.com/bustle/bluestream/blob/7c2fe7b0cce3ba9f98c991c3ff70d57bcccce5fa/test/readAsync-test.ts#L53

The failure is on the countEvents assertion. On 16, the _eventsCount property at the conclusion of the test is 0, but the test asserts 1.

I narrowed this down to the stream.read() call here: https://github.com/bustle/bluestream/blob/master/lib/readAsync.ts#L5 On 14, I see the eventsCount property incremented, and on 16 I don't.

Do you have any insight as to that behavioral variance or any thoughts on how to resolve?

Thanks again!

reconbot commented 2 years ago

Hey @claytongentry, Happy to tell you what I remember.

The difference between a sync function returning a promise and an async function returning a promise is that the sync one can throw an exception whereas the async one can't. What Node does internally with them.. you got me. I'm not really sure when/where you call user code but you'll need to wrap it with a try/catch and return Promise.reject() in those cases so you don't get unhandled exceptions. Other than that it should be fine. 🤷

I honestly don't remember anything about counting read events.

Since node 10 all nodejs streams are async iterators and their performance has gotten on par with raw streams for most workloads. (Including workloads that I wrote bluestream for) and so as soon as I could I've moved most of my code to use for await and built out an async iterator toolchain called streaming-iterables. It's got a similar api, but it's a lot more simple under the hood and with it the problems bluestream were created to solve aren't relevant. (Async functions having harsh error cases with nodejs streams, and doing concurrent async work for ETLs.)

It's gotten some traction and I haven't looked back.

Hope that helps!

claytongentry commented 2 years ago

Extremely helpful. Thanks man. Maybe we'll just move toward streaming-iterables in place of our current bluestream usage. Appreciate the insight in either case!