bruth opened this issue 7 years ago
Those are some interesting points.
For the first one, `Repository.Save` is intended to only take events from a single aggregate in increasing version order. I'll clarify the docs and the verification so the intent is clear. With that guarantee, it's possible for DynamoDB, MySQL, and Postgres to offer the same atomicity guarantees.
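To make that contract concrete, here is a minimal sketch of the kind of verification described, assuming a hypothetical `Event` shape with `AggregateID` and `Version` fields; the library's actual types and signatures may differ.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for the library's event/record type.
type Event struct {
	AggregateID string
	Version     int
	Data        []byte
}

// checkSaveContract enforces the intent described above: every event in a
// single Save call belongs to one aggregate and versions strictly increase.
func checkSaveContract(events []Event) error {
	if len(events) == 0 {
		return nil
	}
	id := events[0].AggregateID
	for i := 1; i < len(events); i++ {
		if events[i].AggregateID != id {
			return fmt.Errorf("event %d belongs to aggregate %q, want %q", i, events[i].AggregateID, id)
		}
		if events[i].Version <= events[i-1].Version {
			return fmt.Errorf("event %d version %d is not greater than previous %d", i, events[i].Version, events[i-1].Version)
		}
	}
	return nil
}

func main() {
	err := checkSaveContract([]Event{
		{AggregateID: "order-1", Version: 1},
		{AggregateID: "order-1", Version: 2},
	})
	fmt.Println(err) // <nil>
}
```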
Hmm, quite honestly we had never considered loading all the events in the repo up to a point in time. Our thinking has been to treat each aggregate independently.
Could you describe some use cases where you'd want to rebuild the entire repo up to a certain point in time rather than just a specific aggregate?
I do like the idea of rebuilding the aggregate as of a given time.
The use case I had in mind was actually to support alternate event timelines, i.e. branches that (in my case) would get merged/committed back into some main repo I declare. So a very basic version of the git model, but for structured data. The only caveat is that the events in my case would likely need to be the commands or intents in order for conflicts to be detected, but that is a separate issue.
One strategy could be to maintain separate repos, say, for each branch and manage the merging process. If the repo supported querying based on time or a transaction id (i.e. each set of events that were committed), then the branches would simply act as extensions to the root repo. State could be rebuilt by reading the main repo up to some time T and then whatever events are in the branched repo would finalize building the current state.
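To illustrate that layering, here is a hedged sketch assuming a flat slice of events per repo and made-up helper and field names; the actual repository API is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical stand-in for whatever the store persists.
type Event struct {
	AggregateID string
	Time        time.Time
	Data        []byte
}

// rebuildBranch layers a branch's events on top of the root repo replayed up
// to time t, grouping events per aggregate so they can then be applied in order.
func rebuildBranch(root, branch []Event, t time.Time) map[string][]Event {
	streams := map[string][]Event{}
	for _, e := range root {
		if e.Time.After(t) {
			continue // ignore root events recorded after the branch point
		}
		streams[e.AggregateID] = append(streams[e.AggregateID], e)
	}
	for _, e := range branch {
		streams[e.AggregateID] = append(streams[e.AggregateID], e)
	}
	return streams
}

func main() {
	cut := time.Now()
	root := []Event{{AggregateID: "wf-1", Time: cut.Add(-time.Hour)}}
	branch := []Event{{AggregateID: "wf-1", Time: cut.Add(time.Minute)}}
	fmt.Println(len(rebuildBranch(root, branch, cut)["wf-1"])) // 2
}
```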
@bruth it may sound crazy, but why not 'just' use git? If you are doing branches and merges it seems to fit the bill better.
@delaneyj I have thought about that and agree it would be ideal if it could work. I guess my concern is that I want to handle the semantics of merging and dealing with conflicts if they arise. I presume if there is a conflict I could just read the bytes and present the conflict in a structured way.
To get more concrete... I am modeling computational research workflows (think machine learning or statistical pipelines) which may involve a team of people. A workflow is often planned at a high level up front, but evolves over time. During that process team members may work on and change certain parts of the workflow as they learn more about it. Ideally the changes made are captured and can be discretely viewed by others working on the project.
Since this is research, it is quite common to try various things (branches) only to discover one that may be good or applicable to the research goal. In practice there is a low chance that conflicts will actually emerge since most research teams are small and work on separate parts; however, the ability to have separate, temporary lineages of a project is ideal for this type of work.
There is a screenshot here showing a clip of a workflow: https://rdm.academy/
Hm. Well I believe I was thinking about aggregates incorrectly. The workflow should be the aggregate that ultimately references other entities internally (but that is an implementation detail).
I opened up PR #10 just to get the time idea across. It adds `LoadVersion` and `LoadTime` methods to `Repository`. The one problem with the `LoadTime` implementation is that it requires loading all events for the aggregate and checking if the max event time has been reached. This could be optimized by having a lower level method that limits events by time.
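For reference, a rough sketch of the approach described above, with assumed `Event` and field names rather than the PR's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// Event is a placeholder for the aggregate's stored event records.
type Event struct {
	Version int
	Time    time.Time
}

// eventsUpTo mirrors the current LoadTime limitation: it walks the full,
// already-loaded stream and keeps events until the cutoff time is passed.
func eventsUpTo(all []Event, cutoff time.Time) []Event {
	var kept []Event
	for _, e := range all {
		if e.Time.After(cutoff) {
			break // events are ordered, so everything after is too new
		}
		kept = append(kept, e)
	}
	return kept
}

func main() {
	now := time.Now()
	all := []Event{
		{Version: 1, Time: now.Add(-2 * time.Hour)},
		{Version: 2, Time: now.Add(-1 * time.Hour)},
		{Version: 3, Time: now},
	}
	fmt.Println(len(eventsUpTo(all, now.Add(-30*time.Minute)))) // 2
}
```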
To make time first class for querying, the `Record` type would have to have a time field so the querying could be pushed down to the underlying store.
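A minimal sketch of what that could look like, with guessed field names rather than the library's actual `Record` definition:

```go
package records

import "time"

// Record sketches a store-level record with an explicit time field, so a
// backend could filter server-side (e.g. WHERE created_at <= $1) instead of
// loading every event and filtering in memory. Field names are guesses.
type Record struct {
	AggregateID string
	Version     int
	At          time.Time
	Data        []byte
}
```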
This is mostly a question of whether it is in scope and/or whether you have thought about this use case.
The API supports loading an aggregate at a particular version of itself, but if I wanted to get the state of two aggregates at some point in time, my understanding is that there is no way of doing this since the version is local to the aggregate. Since `Repository.Save` takes multiple events which may span multiple aggregates, they are being transactionally saved together (at least conceptually) and thus could/should represent an atomic change in state in the repo.

If there was a repo-level transaction id generated (monotonically increasing) on each call and added to each record, then an aggregate can be loaded relative to the transaction id, which means the state of all aggregates in the repo could be loaded with respect to some point in time. At that point you could get a copy of the repo "as of" some transaction (or time).
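A hedged sketch of that idea, with hypothetical types and field names rather than anything the library currently defines:

```go
package txsketch

// Record sketches a stored event stamped with a repo-wide transaction id.
// Every call to Save would stamp its records with the next TxID, so one
// atomic Save becomes one identifiable step in the repo's history.
type Record struct {
	TxID        uint64 // monotonically increasing per Save call
	AggregateID string
	Version     int
	Data        []byte
}

// asOfTx returns the records visible "as of" a given transaction id, which
// could be applied per aggregate or across the whole repo.
func asOfTx(all []Record, txid uint64) []Record {
	var out []Record
	for _, r := range all {
		if r.TxID <= txid {
			out = append(out, r)
		}
	}
	return out
}
```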
A timestamp could be associated with each transaction id so `Asof` could take a real `time.Time` rather than the transaction id, or there could be `AsofT` for txid and `Asof` for time.
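Hypothetical signatures implied by that naming, just to make the shape concrete; none of these exist in the library today:

```go
package asofsketch

import "time"

// Aggregate stands in for whatever the repository reconstructs.
type Aggregate interface{}

// TimeTravel sketches the two method shapes discussed above: Asof loads an
// aggregate by wall-clock time, AsofT by repo-level transaction id.
type TimeTravel interface {
	Asof(aggregateID string, t time.Time) (Aggregate, error)
	AsofT(aggregateID string, txid uint64) (Aggregate, error)
}
```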