TurtleAI / derive

An Event Sourcing and CQRS solution.

Batch processing (sort of) #21

Open rwillians opened 1 year ago

rwillians commented 1 year ago

Most services (and perhaps some reducers, when they process the same event "instance") need to wait for an event to be reduced before they can do their work. In some cases, we need to wait for all the events generated by a command, not just the one event we're handling.

Therefore, we need a simple API that we can call from within a service's or reducer's event handler to await the entire "batch" of events related to the one at hand, that is, to wait until they have all been handled and their changes persisted.

At first, it could be something "simpler" (or less difficult, at least), like awaiting all events that have the key command_id with value X. But while planning for a future infra/arch need, I realized that in some cases we want to persist events directly to the event store, skipping commands, for example when we persist events from within migrations.

I really don't want to clutter the command_id key with junk data that can't be correlated with a command log or a request log, so I suggest we add a key to events named something like batch_id. Commands can populate batch_id with the same value as command_id, no problem. Or we could generate a new id and set it as the batch_id of all events produced by that command execution. Either way would do.
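A minimal sketch of the batch_id idea, assuming events are plain maps. `BatchSketch` and `stamp/2` are hypothetical names for illustration and not part of Derive; how the id itself is generated (reusing command_id, a UUID, and so on) is left to the caller.

```elixir
defmodule BatchSketch do
  # Hypothetical helper, not part of Derive: puts the same :batch_id on
  # every event in the list. Events are assumed to be maps (or structs
  # that already define a :batch_id field).
  def stamp(events, batch_id) when is_list(events) do
    Enum.map(events, &Map.put(&1, :batch_id, batch_id))
  end
end
```

A command could call `BatchSketch.stamp(events, command_id)`, while a migration that writes straight to the event store could pass any id of its own, leaving command_id untouched.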

Since Derive seems to be unopinionated about the structure of events, with the exception of the id field, the new await_processed/2 signature might need to accept a tuple giving both the key and the value the user (dev) wants to match.

Current API:

await_processed(list_of_events, :reducers)

Suggestion:

await_processed(list_of_events, :reducers)
await_processed({:batch_id, "123"}, :reducers)
# clear distinction between signatures -- pattern matching by list or tuple.
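To illustrate how the two signatures could be told apart, here is a rough sketch that pattern matches on a list versus a `{key, value}` tuple. `AwaitSketch`, `await_target/1` and `select/2` are hypothetical names; the code only describes and filters what would be awaited, it does not block on anything or touch Derive's internals.

```elixir
defmodule AwaitSketch do
  # Hypothetical illustration only: builds a description of what should be
  # awaited instead of actually waiting on Derive's processes.

  # A list selects specific events.
  def await_target(events) when is_list(events), do: {:events, events}

  # A {key, value} tuple selects every event whose `key` equals `value`.
  def await_target({key, value}) when is_atom(key), do: {:match, key, value}

  # Picks, from a pool of known events (assumed to be maps), the ones
  # covered by the target.
  def select({:events, events}, _pool), do: events

  def select({:match, key, value}, pool) do
    Enum.filter(pool, &(Map.get(&1, key) == value))
  end
end
```

The tuple form is where the extra book-keeping would come in: something has to know which in-flight events carry a given key and value.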

Derive will likely need some sort of registry/index of all events currently being processed and of which pids/supervisors (there might be many) are processing them. If that's the case, it should also be possible, and I'd say expected, to support the following signature as well:

await_processed(list_of_event_ids, :reducers)
# You'd have to peek at the type of the first element of the list in order to make a distinction
# between this signature and the one that takes a list of event structs.
# I won't suggest removing the signature that takes the list of events' structs in favor of this
# one because I guess we get performance benefits from awaiting on a struct -- I imagine it's
# easier for you to locate the processes you need to await based on the workers that handle
# that event struct. But, if the performance gain of that approach is negligible, then perhaps
# it'd be better to just replace it in favor of list of ids.
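Peeking at the first element could look roughly like the sketch below. It assumes ids are binaries and events are maps or structs with an :id field; `IdOrEventSketch` and `event_ids/1` are hypothetical names, not Derive's API.

```elixir
defmodule IdOrEventSketch do
  # Hypothetical: normalises either input shape to a list of event ids.
  # Only the head of the list is inspected, so mixed lists are not detected.
  def event_ids([]), do: []

  # First element is a binary: assume the whole list is already ids.
  def event_ids([first | _] = ids) when is_binary(first), do: ids

  # First element is a map/struct with :id: assume a list of events.
  def event_ids([%{id: _} | _] = events), do: Enum.map(events, & &1.id)
end
```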

venkatd commented 1 year ago

Maybe we can discuss next week what options might help your use case. My main concern is that this would add quite a bit of complexity to Derive (at least with the designs that come to my mind). The main complexity IMO is the book-keeping of the properties to be awaited.

I wonder if there are ways to accomplish this that don't affect the core design of Derive. If I understand, there's a single place in the code where you need this related to time tracking? You want a