electric-sql / electric

Sync little subsets of your Postgres data into local apps and services.
https://electric-sql.com
Apache License 2.0
6.46k stars · 156 forks

feat: `ShapeStream` backpressure from subscriber processing callbacks #1731

Closed · msfstef closed this 1 month ago

msfstef commented 2 months ago

As discussed with @kevin-dp, the ShapeStream currently runs ahead to the end of the stream regardless of what its subscribers do with the messages.

This means that a stream will always accumulate all the messages from the network in memory, regardless of how fast its subscribers process them, with no way for developers to create backpressure and limit the rate at which the stream is read. It also has the misleading side effect that the ShapeStream reaches an isUpToDate state even though there is no guarantee that its subscribers are also up to date.

The proposed solution is to get rid of all queuing mechanisms (less code, yay!) and make the ShapeStream wait for all subscribers to process the message batch before moving forward.

This gives developers the option to regulate backpressure: they can still let the stream run ahead as it does now by providing a synchronous callback that merely schedules the message processing, e.g. in a queue that they maintain. But they can also add a subscriber that asynchronously persists data to disk, so that the stream runs a little slower and avoids memory buildup.
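To make the proposed semantics concrete, here is a minimal sketch (not the real ShapeStream API; `MiniStream`, `Message`, and `Subscriber` are illustrative stand-ins). The stream awaits every subscriber callback before pulling the next batch, so a slow async subscriber naturally throttles how fast data is fetched, while a synchronous callback keeps the current run-ahead behaviour:

```typescript
// Illustrative sketch of "wait for all subscribers before moving forward".
// None of these names are the real Electric client API.
type Message = { key: string; value: unknown }

type Subscriber = (batch: Message[]) => void | Promise<void>

class MiniStream {
  private subscribers: Subscriber[] = []

  subscribe(callback: Subscriber): void {
    this.subscribers.push(callback)
  }

  // Pull batches from `source` one at a time; a batch is only "done"
  // once every subscriber has finished with it, which is what creates
  // the backpressure on the network reads.
  async run(source: AsyncIterable<Message[]>): Promise<void> {
    for await (const batch of source) {
      await Promise.all(this.subscribers.map((cb) => cb(batch)))
    }
  }
}
```

A subscriber that just pushes the batch onto its own queue returns synchronously and the stream runs ahead as before; one that awaits a disk write slows the loop down to its own pace.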

Basically, this API lets developers flexibly decide how fast the stream tries to retrieve data from the network, in a relatively intuitive way.

Of course, this should be documented more clearly in both the code and the docs, but I want to get people's opinions on whether this is something we want or not before doing so.

netlify[bot] commented 2 months ago

Deploy Preview for electric-next ready!

Latest commit: a2b1fe61aff905b8cd0053d81e53c31a5674c3ef
Latest deploy log: https://app.netlify.com/sites/electric-next/deploys/66e97387eb94b7000863cd91
Deploy Preview: https://deploy-preview-1731--electric-next.netlify.app

msfstef commented 2 months ago

@KyleAMathews the idea is that you can moderate the backpressure any way you want. For example, developers can maintain their own processing queue and provide a processing callback that just adds the messages to it. The stream then continues to load from the network without actually waiting for the messages to be processed (since the callback merely schedules them for processing in another system).

For example, the developer might provide a processing callback that schedules the messages for processing and monitors memory usage; when a threshold is passed, they can do something like waitForProcessing within the callback to "pause" the stream.
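One way that threshold-based "pause" could look, sketched under the same assumptions as above (nothing here is the real client API; `ThrottlingQueue` and its `onBatch` callback are hypothetical). The callback normally enqueues and returns immediately, letting the stream run ahead; once the backlog passes the threshold, it awaits a drain, which pauses the stream under the proposed await-the-subscribers semantics:

```typescript
// Hypothetical subscriber-side queue: enqueue fast, but block the
// stream's callback once the backlog exceeds `threshold`.
type Message = { key: string; value: unknown }

class ThrottlingQueue {
  private queue: Message[] = []
  private drainWaiters: Array<() => void> = []
  private pumping = false

  constructor(
    private readonly threshold: number,
    private readonly process: (msg: Message) => Promise<void>
  ) {}

  // The processing callback handed to the stream.
  async onBatch(batch: Message[]): Promise<void> {
    this.queue.push(...batch)
    this.pump()
    if (this.queue.length > this.threshold) {
      // Over the threshold: "pause" the stream until the backlog drains.
      await new Promise<void>((resolve) => this.drainWaiters.push(resolve))
    }
  }

  // Drain the backlog one message at a time, waking any paused
  // callbacks once the queue is back under the threshold.
  private async pump(): Promise<void> {
    if (this.pumping) return
    this.pumping = true
    while (this.queue.length > 0) {
      const msg = this.queue.shift()!
      await this.process(msg)
      if (this.queue.length <= this.threshold) {
        this.drainWaiters.forEach((resolve) => resolve())
        this.drainWaiters = []
      }
    }
    this.pumping = false
  }
}
```

The same shape works for any resource signal: swap the queue-length check for a memory-usage check and the callback pauses the stream whenever the process is under pressure.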

KyleAMathews commented 2 months ago

Sure, they can maintain their own buffer. But my point is that it's also a good idea for our client to maintain a buffer of one fetched batch of messages.

msfstef commented 2 months ago

Aaah yes indeed - essentially what we can do is have a very thin network precache mechanism, such that the stream operates exactly as it does now but the fetch will have loaded the next batch already.

In fact, I think this might fit nicely as another fetch wrapper, like the one we have for backoff retries, that is custom to Electric, so the rest of the code is completely oblivious to this buffer.
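As a rough sketch of that idea (purely illustrative; the real Electric fetch wrappers and the way the next request URL is derived are different, and `nextUrl` here is a hypothetical caller-supplied helper): alongside each request, the wrapper eagerly issues the follow-up request and keeps the in-flight promise in a single-slot cache, so the next call finds its batch already loading:

```typescript
// Hypothetical one-batch-ahead prefetch wrapper. Generic over the
// response type so the sketch stays self-contained.
type FetchLike<T> = (url: string) => Promise<T>

function createPrefetchFetch<T>(
  baseFetch: FetchLike<T>,
  nextUrl: (current: string) => string // assumed helper deriving the next request
): FetchLike<T> {
  let prefetched: { url: string; promise: Promise<T> } | undefined

  return (url: string) => {
    // Serve from the single-slot prefetch cache when the URL matches,
    // otherwise fall through to a normal fetch.
    const hit =
      prefetched && prefetched.url === url ? prefetched.promise : undefined
    const result = hit ?? baseFetch(url)

    // Eagerly start loading the next batch while this one is consumed.
    const next = nextUrl(url)
    prefetched = { url: next, promise: baseFetch(next) }

    return result
  }
}
```

The appeal of doing it at the fetch layer is exactly what's described above: the stream's loop is unchanged and completely oblivious to the buffer, it simply finds that its next fetch resolves quickly.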

KyleAMathews commented 2 months ago

sounds good πŸ‘

I'm not suggesting by any means we do another refactor right now but something to look at and ponder is if we should pull in https://effect.website/ at some point β€” managing all these flows and error states, etc. might get easier.