Open KyleAMathews opened 3 months ago
I would argue that this functionality should be a layer above ShapeStream. ShapeStream is a nice thin abstraction of the protocol, and perfect for feeding into other stores. If that store has persistence, ShapeStream doesn't need any itself.
Maybe a PersistedShapeStream? It should be composable with a store implementation you could pass it (IndexedDB, OPFS, Node FS)?
Ooo yes! I've been uneasy about throwing this into ShapeStream and persistence is a natural layer to compose above it.
const streamWithPersistence = new ShapeStreamOPFS()
const shape = new Shape(streamWithPersistence)
We've also talked about not actually storing the log, but just the last value for each key. The developer shouldn't build on historical events anyway, because you can never be guaranteed to get the full history.
I can pick this up - my suggestion is a composable `ShapeStreamPersister` that accepts a `ShapeStream` and another instance with a specified `Storage` interface (perhaps `set`, `get`, `delete`? something sensible), so either we or anyone else can add any storage option (we can start with indexeddb or local storage)
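To make the suggestion concrete, here's a rough sketch of what that composition could look like. Everything in it is an assumption for illustration: `PersisterStorage` (named that way to avoid clashing with the DOM `Storage` type), `StreamLike`, `LogMessage`, `MemoryStorage` and the `"shape-log"` key are made up, and the interface is synchronous (local-storage style) for brevity. An IndexedDB or OPFS backend would need Promise-returning methods.

```typescript
// Illustrative only: none of these names come from the Electric client.
type LogMessage = { offset: string; key?: string; value?: unknown };

// Hypothetical storage contract with the set/get/delete trio floated above.
interface PersisterStorage {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  delete(key: string): void;
}

// Stand-in for the only part of ShapeStream this sketch relies on.
interface StreamLike {
  subscribe(callback: (messages: LogMessage[]) => void): void;
}

// In-memory backend, handy for tests; a real one would wrap IndexedDB, OPFS, etc.
class MemoryStorage implements PersisterStorage {
  private data = new Map<string, string>();
  get(key: string): string | undefined { return this.data.get(key); }
  set(key: string, value: string): void { this.data.set(key, value); }
  delete(key: string): void { this.data.delete(key); }
}

class ShapeStreamPersister implements StreamLike {
  private storage: PersisterStorage;
  private storageKey: string;
  private subscribers: Array<(messages: LogMessage[]) => void> = [];

  constructor(stream: StreamLike, storage: PersisterStorage, storageKey = "shape-log") {
    this.storage = storage;
    this.storageKey = storageKey;
    // Emit to subscribers first, then append to the persisted log.
    stream.subscribe((messages) => {
      this.subscribers.forEach((callback) => callback(messages));
      this.append(messages);
    });
  }

  // New subscribers get the persisted log replayed before any live messages.
  subscribe(callback: (messages: LogMessage[]) => void): void {
    const persisted = this.load();
    if (persisted.length > 0) callback(persisted);
    this.subscribers.push(callback);
  }

  // Read the whole persisted log (empty if nothing has been stored yet).
  load(): LogMessage[] {
    const raw = this.storage.get(this.storageKey);
    return raw === undefined ? [] : (JSON.parse(raw) as LogMessage[]);
  }

  private append(messages: LogMessage[]): void {
    const next = this.load().concat(messages);
    this.storage.set(this.storageKey, JSON.stringify(next));
  }
}
```

Because the persister exposes the same `subscribe` shape as the stream it wraps, something like `Shape` could consume it without knowing persistence is involved. Rewriting the whole log on every append is only acceptable for a sketch.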
I also agree with Valter, which touches on the compaction mentioned by Kyle above - I think what we should do is materialize the log as we ingest it while also keeping track of the last offset seen, and when it's time to restore the stream from DB the materialized data is converted into a series of inserts (like we do for snapshots in the backend).
Issues regarding FK checks, check constraints etc apply for the compacted log coming from the backend anyway so doing this in the local store should not be a separate issue.
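For illustration, the materialize-as-you-ingest idea could look roughly like the sketch below. The message shape (`key`, `value`, `offset`, `headers.operation`) and the `MaterializedLog` name are assumptions made for the example, as is the choice to merge update values into the existing row and to stamp every restored insert with the last seen offset.

```typescript
// Assumed message shape for this sketch; the real protocol may differ.
type Operation = "insert" | "update" | "delete";
type Row = Record<string, unknown>;

interface ChangeMessage {
  key: string;
  value: Row;
  offset: string;
  headers: { operation: Operation };
}

class MaterializedLog {
  private rows = new Map<string, Row>();
  private lastOffset: string | undefined;

  // Apply one change to the materialized rows and remember its offset.
  ingest(message: ChangeMessage): void {
    const { key, value, offset } = message;
    switch (message.headers.operation) {
      case "insert":
        this.rows.set(key, { ...value });
        break;
      case "update":
        // Assumption: updates may carry only the changed columns.
        this.rows.set(key, { ...(this.rows.get(key) ?? {}), ...value });
        break;
      case "delete":
        this.rows.delete(key);
        break;
    }
    this.lastOffset = offset;
  }

  // Offset to resume from when reconnecting to the server.
  get offset(): string | undefined {
    return this.lastOffset;
  }

  // Restore path: one insert per surviving row, like a snapshot.
  toInserts(): ChangeMessage[] {
    if (this.lastOffset === undefined) return [];
    const offset = this.lastOffset;
    const inserts: ChangeMessage[] = [];
    this.rows.forEach((value, key) => {
      inserts.push({ key, value, offset, headers: { operation: "insert" } });
    });
    return inserts;
  }
}
```

The store never holds more than one entry per key, so the restored stream is already compacted and there's no separate compaction pass to schedule.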
Sounds like a great plan @msfstef!
The TypeScript `ShapeStream` class is responsible for reading the Shape Log from the Electric server and feeding it to other code that'd like to do something with the stream, e.g. the `Shape` class, which creates an in-memory materialized view of the Shape as a JS Map.

Shape Logs are cached in Electric (and in an HTTP cache proxy, e.g. nginx or a CDN, if you're using one) so they're generally cheap to load. But still, if shapes are larger than a few MB, persisting shape logs to disk would speed up loading, especially for people on limited mobile networks.
The implementation for this would be:

On initializing: `ShapeStream` checks if there's a persisted log. If so, it loads that, along with the `offset` and `shapeId` for the log.

On receiving new shape log messages: `ShapeStream` would emit these to subscribers and then append them to the persisted shape log.

Compaction: Shape Logs eventually fill up with a lot of redundant information (e.g. 100s of updates to a single row), so periodically compacting the log is necessary to keep reading it fast. Compaction is basically reading through the log from the start and merging together repeated operations on a single row. When a row is deleted, we can remove all of its log messages. When compaction is finished, the log is left with one `insert` operation per row in the shape.

Considerations

Consumers of `ShapeStream` wait for an `up-to-date` control message to let them know that they're caught up with the server and can now inform their subscribers (e.g. a UI component). When reading the persisted log offline, the `ShapeStream` probably needs to be told it's offline so that it can emit `up-to-date` when it gets to the end of the persisted log.
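A minimal sketch of that offline consideration, assuming the caller decides what counts as offline and passes it in as a flag. The `replayPersistedLog` helper and the message types are hypothetical, not part of the existing client; the only thing taken from the description above is that an `up-to-date` control message follows the last persisted entry.

```typescript
// Assumed message shapes for this sketch.
type DataMessage = { key: string; value: unknown; offset: string };
type ControlMessage = { headers: { control: "up-to-date" } };
type StreamMessage = DataMessage | ControlMessage;

// Replays the persisted log to a subscriber. When offline, a synthetic
// up-to-date message is appended so consumers know they have everything
// that is available locally. Returns the last persisted offset (if any)
// so the stream can resume from it once the server is reachable.
function replayPersistedLog(
  persisted: DataMessage[],
  isOffline: boolean,
  emit: (messages: StreamMessage[]) => void
): string | undefined {
  const batch: StreamMessage[] = [...persisted];
  if (isOffline) {
    batch.push({ headers: { control: "up-to-date" } });
  }
  if (batch.length > 0) emit(batch);
  return persisted.length > 0
    ? persisted[persisted.length - 1].offset
    : undefined;
}
```

In a browser the flag could come from something like `navigator.onLine === false`, though a failed request to the server is probably a more reliable signal. When online, the helper leaves `up-to-date` to the server, so subscribers aren't told they're caught up before the live tail has been fetched.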