KoteiIto / node-athena

a nodejs simple aws athena client
MIT License

Backpressure when streaming athena results #60

Open juicetin opened 3 years ago

juicetin commented 3 years ago

Hi,

Thanks for putting this library together, especially as the library recommended in AWS's own docs (athena-express) has no built-in support for streaming results.

One issue I'm running into is that, when streaming results, the operations I perform on each result are significantly slower than the rate at which data is streamed in. The data then buffers until I exceed the maximum number of memory-mapped locations allowed by the operating system. Is there currently a way, using this library, to restrict how quickly data is streamed in, based on the number of data events that have been successfully processed?

juicetin commented 3 years ago

The highWaterMark setting in Node.js streams would solve this issue, but the underlying AWS SDK does not appear to provide a way to set this property.

In the absence of that, I've implemented my own throttling in the handler for the data event. I track the number of data items I've 'queued' downstream (the total passed in minus the number successfully processed). When that queued count passes a certain threshold, I call stream.pause(); once it drops back below the threshold, I call stream.resume(). This seems to have solved the problem for me.
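For anyone hitting the same problem, the approach described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `throttleStream`, `processRecord`, and `maxQueued` are hypothetical names that are not part of node-athena or the AWS SDK, and `stream` is assumed to be a Node.js Readable-style stream that emits `data` events and supports `pause()`, `resume()`, and `isPaused()`. The per-record work is assumed to signal completion by calling a `done` callback.

```javascript
// Hypothetical helper, not part of node-athena or the AWS SDK.
// Assumes `stream` is a Readable-style stream in flowing mode.
function throttleStream(stream, processRecord, maxQueued = 100) {
  let received = 0;  // total records passed in by 'data' events
  let processed = 0; // records whose processing has finished

  // Records handed downstream that have not completed yet.
  const queued = () => received - processed;

  stream.on('data', (record) => {
    received += 1;

    // Backlog has reached the threshold: ask the stream to stop
    // emitting 'data' events.
    if (queued() >= maxQueued) stream.pause();

    // `processRecord` must call `done` exactly once when it has finished.
    processRecord(record, () => {
      processed += 1;

      // Backlog is back below the threshold: let data flow again.
      if (queued() < maxQueued && stream.isPaused()) stream.resume();
    });
  });

  // Exposed so callers can inspect the current backlog.
  return { queued };
}
```

A call such as `throttleStream(stream, (record, done) => save(record).then(done), 500)` would then cap the backlog at roughly 500 records, where `save` stands in for whatever slow operation is applied to each record. A single threshold is used here to match the description above; separate pause and resume thresholds would reduce how often the stream toggles.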

Because this setting is missing from AWS's underlying library, feel free to close the issue if this library is the wrong level of abstraction for dealing with this problem.