Dzoukr / CosmoStore

F# Event store for Azure Cosmos DB, Table Storage, Postgres, LiteDB & ServiceStack
MIT License
168 stars 21 forks source link

Version 3 discussion #27

Closed Dzoukr closed 4 years ago

Dzoukr commented 5 years ago

This issue is created to track discussion about upcoming version 3 of CosmoStore and what should be its main goal. For me it is mainly about making EventStore "JSON payload representation agnostic" (in other words, do not force to use JToken) since Microsoft itself annonced new JSON API and moving away from Newtonsoft.Json coupling it its libraries. It will also allow to add other stores like LiteDB without unnecessary transformation to JToken and back.

Technically it should be quite easy to make EventWrite, EventRead and EventStore generic like:

type EventWrite<'a> = {
    Id : Guid
    CorrelationId : Guid option
    CausationId : Guid option
    Name : string
    Data : 'a
    Metadata : 'a option
}

Also upgrading existing libraries should be straightforward, so version 3 could be done pretty quickly.

Any other ideas? cc: @kunjee17 @ctaggart @TheAngryByrd @viktorvan

Odonno commented 5 years ago

@Dzoukr Should it be bound to .NET Core 3? Not sure if .NET Standard 3 is a thing.

kunjee17 commented 5 years ago

Need and interface to implement json serialization. With default implementation provided by us.

Also, we need some common mechanism in which all other implementation follows. Common functions etc. So, things are stay in line.

Dzoukr commented 5 years ago

Should it be bound to .NET Core 3? Not sure if .NET Standard 3 is a thing.

Not sure neither :) Probably staying with Standard 2.0 is good idea.

Need and interface to implement json serialization. With default implementation provided by us.

I don't think we need that on CosmoStore level (just API). Isn't it something that should be covered on storage implementation level? Of course each library needs to provide info what payload type 'a it supports.

kunjee17 commented 5 years ago

@dzoukr it is basically question of choice. If it is at implementation level then it will tie that implementation to json serialization. Nothing wrong in it. As in most case it is OK. But if someone like to play n plug thing then it would be difficult.

That's why I told can have interface and default implementation for each store. Now, lite db don't allow replacing to that option will stay as it is. But Marten allows it. So, that can be updated.

Simple dumb way would be 'a is string always. Because events will be serializable always.

Again , these are suggestions. I can't foresee all the options with possible solutions as of now. But whatever flexibility we can put it would be great.

bartelink commented 5 years ago

The interfaces and design the TypeShape UnionContractEncoder uses works well IMO as a way of doing serializer agnostic work. (See Equinox.Codec)

On the Cosmos side, it's also worth considering using the Azure.Cosmos [v3] SDK at some point given its pretty close to becoming the main/only properly maintained client lib.

Dzoukr commented 5 years ago

Thank you all for ideas and suggestions. I'll create new branch for version 3 and see where it goes. 😁

TheAngryByrd commented 5 years ago

For me it is mainly about making EventStore "JSON payload representation agnostic" (in other words, do not force to use JToken) since Microsoft itself annonced new JSON API and moving away from Newtonsoft.Json coupling it its libraries

Overall I do like the idea of making it generic. Allows the implementor to use whatever JSON library fits best.

Simple dumb way would be 'a is string always. Because events will be serializable always.

The only issue with this is if people save giant documents you could end up putting lots in the Large Object Heap quite easily, cause fragmentation, and users would need to set LargeObjectHeapCompactionMode in some regular timer.

    let setupLOHCollectionTimer () =
      let timeDelay = TimeSpan.FromMinutes 20.
      let timer = new Timers.Timer(timeDelay.TotalMilliseconds, AutoReset = true, Enabled = true)
      timer.Elapsed.Add(fun _ ->
        Runtime.GCSettings.LargeObjectHeapCompactionMode <- Runtime.GCLargeObjectHeapCompactionMode.CompactOnce
        GC.Collect()
      )
      timer
TheAngryByrd commented 5 years ago

While we're discussing breaking changes, I've been playing with Redis Streams and was thinking of also making an implementation for CosmoStore. The one issue I ran into was Position is tied to an int64. Redis Streams uses a composite key <millisecondsTime>-<sequenceNumber> which has it's own semantics when it comes to incrementing. So in a similar fashion we might want to generic the Position like:

type ExpectedPosition<'Position> =
    | Any
    | NoStream
    | Exact of 'Position

type EventsReadRange<'Position> =
    | AllEvents
    | FromPosition of 'Position
    | ToPosition of 'Position
    | PositionRange of fromPosition:'Position * toPosition:'Position

type Stream<'Position> = {
    Id : string
    LastPosition : 'Position
    LastUpdatedUtc: DateTime
}

This is the only store I know that acts like this for its event positions.

deyanp commented 5 years ago

What about implementation for MongoDB? I might need to move from Cosmos DB to MongoDB due to Cosmos' expensive storage (no compression), so I could write something ...

Dzoukr commented 5 years ago

@TheAngryByrd I like the idea of Position being generic type! Thanks for that!

kunjee17 commented 5 years ago

@Dzoukr as we are discussing v3. One more minor change. There are too many string as id. So, it would be great if we will type alias so it would be great in intellisense when calling functions.

Dzoukr commented 5 years ago

Are you referring to Stream type?

The type alias is fine, maybe we could also think about single case union type StreamId = StreamId of string?

kunjee17 commented 5 years ago

@Dzoukr yup. Similar stuff.

kunjee17 commented 5 years ago

@Dzoukr any status for others data store. It should technically work for everyone.

bartelink commented 5 years ago

FYI https://github.com/jet/FsCodec has been split out of Equinox - might make sense to support it as a way of generating messages? edit: not actually suggesting to depend on it, but it might make sense to have an example of mapping a DU to an Event record using it

Also regarding a Stream type, IME something like type Target = AggregateId of category: string * id: string | StreamName of streamName: string can work well like this - lets you keep a strongly typed sub-id with FSharp.UMX and then encapsulate adding the categoryId (which you then use as a constant when parsing on the projection side)

bartelink commented 5 years ago

You'll also never get a better time to rename all Cosmos Collection terms to Container - the portal and all docs now exclusively refer to Containers.

Dzoukr commented 5 years ago

Hi @kunjee17, the Cosmos and TableStorage rewrite is done. I'll do the InMemory soon and then I was thinking to hand it over to you to resolve Postgres 😄

Hi @bartelink , yes, Collection is no more, now Container is used (new version is based on SDK v3, which has btw totally changed). Regarding the Stream type... I may not fully understand what is the meaning of AggregateId of category: string * id: string - is it another way how to generate string identifier of Stream?

bartelink commented 5 years ago

V2 -> V3 porting guide ;) https://github.com/jet/equinox/pull/144 (having a fight with them in the repo about the fact that they want to neuter the CFP APIs a lot in V3 though :( ) I see some straggler usages of collection in https://github.com/Dzoukr/CosmoStore/blob/version3/src/CosmoStore.CosmosDb/EventStore.fs etc?

Regarding the AggregateId idea, I'm trying to convey that in systems like EventStore, SqlStreamStore and others, one typically uses the convention of naming streams {category}-{id} - an ideal opportunity to levelage the power of string concatenation, one might think ;)

In the stuff I've linked to, I demonstrate how one can cleanly: a) tag the id for a stream so you can't mix them up using https://github.com/fsprojects/FSharp.UMX b) compose the streamName: string from the category and the streamId If your StreamId type has the 2 cases, the impl would just be let streamName = function StreamName s -> s | AggregateId (cat,id)sprintf "%s-%s" cat id)` but that then lets people put the category name bit as a constant Literal in one place versus concatting it everywhere. Whether this makes sense for CosmoStore is debatable of course given it doesn't get involved in app territory in the way Equinox does, but I recommend considering it as you're looking at making those sorts of things generic in this stretch of work.

In general, using AggregateId then also works well with the Category active pattern to which I linked - they're two sides of the same coin

Dzoukr commented 5 years ago

Yeah, branch is not complete clean up - there are some renames left. :)

Regarding the AggregateId, frankly, I am not convinced. I see what you mean, but I don't see the benefit over simple string as Stream name. From my previous experience, we had two different approaches to generate Stream identifier and simply cannot always say it is %s-%s. To have it really robust, I would have to have separator part of union case params, which would be messy. I would prefer simplicity here - name it as you like, and if there is any strict logic, write your own function from CustomLogic -> Stream string name.

bartelink commented 5 years ago

That's fair - as I say, it only happens to be beneficial for Equinox as part of a larger programming model. Just suggesting this at the point where you actually go beyond using the simplest thing:- string

The other thing to point out is that you'll run into lots of mess using EventStore if your stream names do not conform to a %s-%s pattern - any pattern not following that starts at -100 points for me

Dzoukr commented 5 years ago

@kunjee17 Cosmos, TableStorage & InMemory are now fully converted. Should I do also the Marten one, or leave it up to you?

kunjee17 commented 5 years ago

@Dzoukr as you wish. It should not be issue but if you like I can do it by next week. It depends how much you are in Zen to convert and how much delay is OK for you.

Dzoukr commented 5 years ago

This issue is 2 months old 😄 I am not in hurry. If I will find time earlier, I can do it, but I am pretty sure it won't happen due my work duties.

kunjee17 commented 5 years ago

@Dzoukr I will have a look. BTW as position is now not int64. So, now we need to abstract things there too. I will have a look. Mostly next week.

Dzoukr commented 5 years ago

Perfect. I'll leave it up to you. I will abstract some tests because of position is not fixed type - it is up to each implementation what to use (even in tests). Thanks again @TheAngryByrd for good idea about that.

kunjee17 commented 5 years ago

@Dzoukr why we are having this? https://github.com/Dzoukr/CosmoStore/blob/version3/src/CosmoStore/Validation.fs . I actually forgot about it. This doesn't mean just add at end of stream. Because specific position is not end of stream we give error and no stream is also not end of stream (for new stream so it is start of stream) we give error.

Dzoukr commented 5 years ago

@kunjee17 This is for optimistic concurrency. You start indexing each stream from number one. This function validates that index (position) you are going to write is same as expected.

Check this line for usage: https://github.com/Dzoukr/CosmoStore/blob/e1b689b97572bfe8058d87290bafdb8282168487/src/CosmoStore.TableStorage/EventStore.fs#L38

CosmosDb implementation does the same but in context of stored procedure.

bartelink commented 5 years ago

I guess its too late to start it from 0 ? EventStore, arrays and all that's good starts from there doesn't it ? ;)

Dzoukr commented 5 years ago

I am not against it, but that would be real breaking change (now version 3 seems pretty smooth update from v2). I initally thought to use 0 as start, but realized having it from 1 I can use the same number as information about number of existing events on stream level without any additional (even simplest) calculation. 🤔 Don't know, really. We can do it, we can leave it as it is...

bartelink commented 5 years ago

I agree its too late to do a breaking change and would put this sidenote-comment on gitter if you had one ;)

My point is that this doesnt align with EventStore and SqlStreamStore - I personally had an abstraction one-based at one point but realised in time: one-based in practice is actually agonising to work with because it doesnt align with [typically zero based]array indexing.

On the plus side, you're in alignment with VB (the one before VB.NET) ;)

bartelink commented 5 years ago

Taking a different angle - the 1 based value makes a lot of sense if you term it a Version - i.e. Version 1 being when there is one event at Position 0. So all you need to do is rename. Such a naming scheme also makes expectedVersion finally make sense; the representation in EventStore etc is such that ExpectedVersion when writing the first event is 'version' 0. You can then say that a Position is an index into the virtual 0-based array at which one is about to write, and a Version is the number of events you have. Then you use the [1 based] Version as the [0 based] Position for the next write and you're adding or subtracting in less cases.

Dzoukr commented 4 years ago

@bartelink dammit, you really made me think :))) I kind of like it to rename Position to Version and keep indexing from 1. The only concern here is backward compatibility. I need to think about fact, that for older events the term Position is used, but luckily CosmosDB and TableStorage are both schemaless, so it should be possible to handle on query side when reading events back. What do you think @kunjee17? Any blocker on your side with that? I really feel it's true as Ruben said: "It will together finally make sense." Since we are touching more things (like making it more generic), this could be good change.

kunjee17 commented 4 years ago

@Dzoukr We are still in schema less land in every implementation. I can't foresee any blocker.

Dzoukr commented 4 years ago

Latest update: I went through all the code for CosmosDB and TableStorage and renaming Position to Version is true breaking change. Events container is created with unique keys combination (streamId & position), which would make it difficult to change "on the fly" (not sure if even possible). Reading existing events (written in old manner) is ok, but this one is hard.

So really thinking if it is worth it. Any existing store from v1 and v2 cannot be used for working with v3. 🤔 Not without some kind of migration tool...

??? any thoughts?

TheAngryByrd commented 4 years ago

This is always the downside to writing something that persists data, the data migration strategy 😿

I personally wouldn't want to make a breaking change without having a data migration strategy for each persistence method, unless we know 100% for sure no one is using this is a production environment.

Another method is to inspect the underlying data at read time, figure out if we can deserialize to the v1/v2 format, then try to "upgrade" to the v3 format.

bartelink commented 4 years ago

One loses a lot by introducing the complexity of up/downconversion paths with associated tests. Can you do all the renames but leave a "using old label for backcompat" comment? If that halfway house is too much of a mess, I'd say the choice is obvious - leave it as it is; and there existing name is not without its positives

kunjee17 commented 4 years ago

If there are too many breaking changes then we can keep the v2 alive with v3 migration set up. Normally every library has to pass through this. I guess this library reach that point very early.

Dzoukr commented 4 years ago

Thanks all for comments, I now think I know how to handle backward compat... Need to test it first, but it looks promising now.

Dzoukr commented 4 years ago

Thanks all for feedback, just released v3 with all the changes. Table Storage & CosmosDB are both backward compatible 🎉 Please, @kunjee17, if you would find time to adjust Marten library to the latest version, let me know. Thanks again everyone involved!

kunjee17 commented 4 years ago

@Dzoukr was out for a week. Just came back. I ll have a look at other implementation of the same.

kunjee17 commented 4 years ago

@Dzoukr code is in master or should I branch out from Version3 branch ?

Dzoukr commented 4 years ago

@kunjee17 Version3 is deleted - please use master