ASOS / SimpleEventStore

SimpleEventStore
MIT License
81 stars 24 forks source link

Added capability for consumers to create unit tests for TTL scenarios #27

Closed DaveAurionix closed 6 years ago

DaveAurionix commented 6 years ago

Added capability for consumers to create unit tests for TTL scenarios.

To achieve that goal, this PR adds support in the InMemoryStorageEngine for setting up an event stream where a per-event TTL concept may have mutated the stream in an underlying storage engine.

Once a "corrupted" (mutated) stream has been set up in storage then further calls to append events normally will always cause the concurrency check to fail due to the check relying on the count of events in the stream rather than the event number of the last event in the stream. This has deliberately not been altered in this PR as the only identified use case is to set up a stream with missing initial events to allow consuming code to be testable when an incomplete stream is returned from the store. For safety, attempting to write back to that mutated stream fails by design.

asosMikeGore commented 6 years ago

While this change is just around the InMemoryStorageEngine, are you getting problems with the DocumentDbStorageEngine too?

A TTL shouldn't impact the concurrency check for the DocumentDbStorageEngine. If I've understood things correctly you are just trying to setup a stream where the initial revision != 0 for testing.

We have to be careful with the event number, even with turning off optimistic concurrency checking, the sequence still needs to hold true, even with an in-memory implementation.

If the library is ensuring correct behaviour, what types of tests are you trying to implement on partial streams? Your code should be able to be pretty agnostic to the fact that some events have been dropped off the stream due to the TTL kicking in.

DaveAurionix commented 6 years ago

We're only testing TTL scenarios at unit-test level where the InMemoryStorageEngine is used. We're not testing TTL scenarios at service-test level where the DocumentDbStorageEngine is used. So we only have this need against the InMemoryStorageEngine (others' needs will differ I guess?).

Your understanding is correct - we're setting up a stream where the initial revision != 0. I agree with the need to be careful with the event number, hence a long "don't use this" implication in the enumeration value that disables the concurrency check in this PR, and hence why the IStorageEngine interface is left alone - it's not behaviour that we would currently want anywhere other than when setting up test scenarios, so in no storage engine apart from the InMemory one.

It's true that the application code should be able to handle events missing from the beginning of the stream and "know" state due to the presence of later events. However, it increases the complexity of analysis, understanding and unit test cases if we have to create unit tests to cover all of the partial stream scenarios in application code. To reduce the learning curve for teams as we write processors for the event streams, our interpretation of the TTL is simply that if the first event in the stream is missing, we treat the entire stream as having expired. We implement that in one shared event sourcing library, and it is a simple test case. Application code can "opt-out" of that default behaviour later if it needs to, but the default behaviour of the library is simple and doesn't create a requirement for more unit tests.

It's the creation of that one unit test in the shared library to treat a missing first event as if the entire stream was missing that revealed this need. Ideally from our perspective this would be behaviour in the storage engine (i.e. stream TTL not event TTL) but that discussion has been held previously and we settled on event TTL for SimpleEventStore.

Thinking about that last paragraph, I've just realised that the only reason we care about the event number is to detect a truncated stream. That's something that the shared event sourcing library needs to understand to provide a TTL on the stream (not each event). Application code doesn't care about event number (which is where I guess you're coming from). As such, it might be better if we bite the bullet and use a mock IStorageEngine for this one test case and abandon this PR?

asosMikeGore commented 6 years ago

If we did stream specific TTLs, it would be more metadata we associate when the events are written rather than based on the event number. It's more we just need to ensure that TTLs never drop events in the middle of a stream, it should be from beginning to end.

What I'm wondering here is that if what is more relevant for your use case is supporting stream deletion, and/or a stream centric TTL so all events delete at the same time. The only downside to the latter is that you'd need to write code post the initial write to delete streams.

Benefit of either approach is that an empty stream is an empty stream rather than having the code make a decision based on at least one event having been removed from the stream.

Finally, there is nothing wrong with having different concurrency levels. So if you need that let me know and we can get this added in to both the InMemory & DocumentDB engines.

DaveAurionix commented 6 years ago

Given that we have event-based TTL at the moment, a stream-specific TTL could just be implemented in the CosmosDB storage engine by not returning any requested stream where the first event's number was > 1. Under the hood, the actual stream would slowly get deleted from storage by CosmosDB as the individual events expire. You seem reluctant to like that idea, what am I missing, is it just that it's not neat if we look at the raw data inside CosmosDB? Seems minimal-code to me? We don't need to track the concept of streams, to write more metadata, or to have code somewhere be responsible for realising it's time to delete a stream? A DeleteStream method might one day be needed, but possibly not here.

That would remove the need for us to do this somewhere between SES and our event sourcing library.

I don't think a TTL will ever delete events from the middle of a stream because the first event will always be in the first commit so will always be the earliest document written.

Other storage engines might implement this differently, but it would be their internal concern.

What would you prefer? a) We merge this PR b) We abandon this PR and we will support our unit test case using a mock of IStorageEngine? c) I can work on a new PR adding support for per-stream TTL in the Cosmos engine, based on presence/absence of event number 1 - so all events appear to delete at the same time to consumers of the engine? d) You/I could work on a PR to add support for a stream-based TTL using one of the two options you outline above?

I think my order of preference is: c, b, a, d; but that's because I think d will leave us with a need in (application?) code to decide when to delete a stream and to explicitly call a method to do so, and will increase the complexity of SES code when a one-line if(firstEventNumber>1) might suffice (subject to whatever concern with that you raise).

Your preference?

DaveAurionix commented 6 years ago

Abandoning this PR as we're pushing forward with (b) above for now. Will revisit after discussion around stream TTL has concluded.