Dzoukr / CosmoStore

F# Event store for Azure Cosmos DB, Table Storage, Postgres, LiteDB & ServiceStack

Azure Table Storage Change Feed? #22

Open deyanp opened 5 years ago

deyanp commented 5 years ago

Hi,

I saw the comment that the author moved from Cosmos DB to Azure Table Storage due to high costs with the former. How do you push the data to the Read Model, though? I couldn't find a Change Feed or similar for Table Storage ... and polling doesn't sound workable ...

Best regards, Deyan

Dzoukr commented 5 years ago

Hi @deyanp,

there is an IObservable of appended events as part of the CosmoStore instance - see https://github.com/Dzoukr/CosmoStore/blob/master/src/CosmoStore/CosmoStore.fs#L59
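For illustration, wiring a read model off that observable might look like the following minimal sketch (assuming the observable is the `EventAppended` member at the linked line; `eventStore`, `saveToReadModel`, and `startProjecting` are hypothetical names, not CosmoStore API):

```fsharp
open System

// Minimal sketch: feed a read model by reacting to every appended event.
// `updateReadModel` is a hypothetical projection function you supply.
let startProjecting (eventAppended: IObservable<'event>) (updateReadModel: 'event -> unit) : IDisposable =
    eventAppended
    |> Observable.subscribe updateReadModel

// Usage (names illustrative):
// let subscription = startProjecting eventStore.EventAppended saveToReadModel
// subscription.Dispose()  // stop projecting on shutdown
```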

Another approach (also widely used by us) is to compose a function Cmd -> Event list with a function Event list -> Event list (which does the side effect of writing to the projection database). It is a matter of taste - some people don't like the reactive approach, some do.
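A minimal sketch of that composition, with illustrative `Cmd`/`Event` types and a `saveProjection` writer that are not part of CosmoStore:

```fsharp
// Illustrative domain types (not CosmoStore types).
type Cmd = Deposit of decimal
type Event = Deposited of decimal

// Pure decision logic: Cmd -> Event list
let decide (cmd: Cmd) : Event list =
    match cmd with
    | Deposit amount -> [ Deposited amount ]

// Side-effecting step: Event list -> Event list
// Writes each event to the projection database, then passes them through.
let project (saveProjection: Event -> unit) (events: Event list) : Event list =
    events |> List.iter saveProjection
    events

// Composed pipeline: Cmd -> Event list
let handle saveProjection : Cmd -> Event list =
    decide >> project saveProjection
```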

deyanp commented 5 years ago

Hi,

Does this mean that the projections get built "in-process", without any guarantee in case the process crashes after writing to the event stream?

Best regards, Deyan

Dzoukr commented 5 years ago

Yes, the crash would have to happen exactly between writing to the event store and writing to the projection database, but it can theoretically happen, and in that case you would have to replay the missing events. Or you can plug a queue in between and write projections in a separate process/application (a sketch of this follows below). AFAIK there is no Change Feed for Table Storage, so it is up to you how to lower the risk of inconsistency.
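A minimal sketch of the queue-in-between variant, assuming Azure.Storage.Queues (`appendToStore` and `serialize` are illustrative stand-ins, not CosmoStore API):

```fsharp
open System.Threading.Tasks
open Azure.Storage.Queues

// Append to the event store, then hand the events to an out-of-process
// projection worker via a queue. Note the unguarded window between the
// two writes discussed above: the worker must tolerate gaps/replays.
let appendAndEnqueue
        (queue: QueueClient)
        (appendToStore: 'event list -> Task<unit>)
        (serialize: 'event -> string)
        (events: 'event list) =
    task {
        // 1) The event store is the source of truth; write it first.
        do! appendToStore events
        // 2) A crash right here leaves the projections behind until a replay.
        for e in events do
            let! _ = queue.SendMessageAsync(serialize e)
            ()
    }
```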

deyanp commented 5 years ago

Yep, this is the problem I am facing ... writing to a queue is not a solution, as I cannot (and don't want to) open a distributed transaction between Azure Table Storage and Azure Event Hubs, for example ..

What issues with the costs of Cosmos DB did you face exactly (if I may ask), and do you think there is a solution to them?

Dzoukr commented 5 years ago

Well, the pricing of Cosmos DB scales differently. If you want to start "low" (imagine a weekend project) with a few events stored and a few aggregates, you still need to provision at least 400 RU/s, the current minimum - and that minimum (on the order of $25/month at the time, whether you use it or not) is still expensive as hell compared to Azure Table Storage, where you pay mostly for storage space, which is negligible.

To make it clear, I still love Cosmos DB - an amazing product - but until MS changes the pricing to be friendlier to low-cost/weekend projects, it will remain a product chosen mainly by bigger companies.

deyanp commented 5 years ago

Thank you for sharing your concerns - now I understand better. I am thinking of using:

1) Cosmos DB for the write side (taking advantage of the Change Feed)
2) Azure Table Storage for:
   a) the read side (duplicate denormalized projections)
   b) duplicating all events from Cosmos DB to Azure Table Storage for replay purposes, assuming that reading all events directly from Cosmos DB would incur a lot of RUs/costs (see the sketch after this list)
   c) aggregate snapshots (the last state of each aggregate, so that all old events don't have to be read and replayed)
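A rough sketch of what 2b) could look like with the change feed processor from the Microsoft.Azure.Cosmos v3 SDK and Azure.Data.Tables (the `EventDoc` shape, container/processor names, and helpers are all assumptions, not CosmoStore API):

```fsharp
open System.Threading.Tasks
open Microsoft.Azure.Cosmos
open Azure.Data.Tables

// Assumed document shape for stored events (illustrative only).
type EventDoc = { id: string; streamId: string; position: int64; data: string }

let buildDuplicator (events: Container) (leases: Container) (table: TableClient) =
    let handler =
        Container.ChangesHandler<EventDoc>(fun changes ct ->
            task {
                for e in changes do
                    // Key by stream + zero-padded position so the write is
                    // idempotent: reprocessing the same change is a no-op upsert.
                    let entity = TableEntity(e.streamId, e.position.ToString("D20"))
                    entity.["Data"] <- box e.data
                    let! _ = table.UpsertEntityAsync(entity, cancellationToken = ct)
                    ()
            } :> Task)
    events
        .GetChangeFeedProcessorBuilder<EventDoc>("duplicate-to-tables", handler)
        .WithInstanceName("worker-1")
        .WithLeaseContainer(leases)
        .Build()

// let processor = buildDuplicator eventsContainer leaseContainer tableClient
// processor.StartAsync() |> ignore  // begin pumping changes
```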

Alternatively I was thinking about Azure PostgreSQL for 2a), as Azure SQL Database seems to be much more expensive ...

What do you think about the above approach?

Dzoukr commented 5 years ago

> reading all events directly from Cosmos DB would incur a lot of RUs/costs

That is the funny part. If your Cosmos DB collection is provisioned at 400 RU/s, you just pay for it. Constantly. No matter if you use it or not.

Otherwise it looks ok - let me know how it works.

dharmaturtle commented 3 years ago

@deyanp I independently arrived at the same architecture you described (namely CosmosDB for writes and Azure Table Storage for denormalized views, changefeed duplication, and snapshots). I arrived here after googling "Azure Table Storage change feed" :) I haven't implemented anything yet, just theorycrafting my own pet project.

How did your project turn out?

bartelink commented 3 years ago

Slight tangent, but... I'd be interested to see how you represent the events and/or manage efficient idempotent writing to Azure Tables (the thing termed 'changefeed duplication' above).

I suspect that forking Propulsion.Cosmos.Sink might be a good way to scale the archival process. In the proArchiver template (complete, but unmerged in https://github.com/jet/dotnet-templates/pull/79), I duplicate events from the primary out to CosmosDB (see in-depth discussion of my rationale).

deyanp commented 3 years ago

> @deyanp I independently arrived at the same architecture you described (namely CosmosDB for writes and Azure Table Storage for denormalized views, changefeed duplication, and snapshots). I arrived here after googling "Azure Table Storage change feed" :) I haven't implemented anything yet, just theorycrafting my own pet project.
>
> How did your project turn out?

@dharmaturtle, as with many things in life, this one also went in a different direction: MongoDB for writes and some reads, and Azure Data Explorer (ADX) for DWH/Reporting/more complicated reads.

Cosmos DB surprised me a bit negatively - everything must be partitioned, storage is bloated (200 bytes somehow turn into 900 bytes, and you pay for uncompressed storage), and it is missing what I need very much - atomic updates ...

ADX is something I recommend a lot; MongoDB has its quirks ..

bartelink commented 3 years ago

> and what I need very much - missing atomic updates ...

What about the batch APIs? Can stored procs do the job? (In general you should be able to get it done with the bulk APIs, unless you have specific things that really benefit from the efficiency of reduced roundtrips.)

Re that per-doc overhead, I can definitely concur (which is why Equinox packs events into docs; ~30k seems to be the sweet spot, though there are lots of factors to consider).

deyanp commented 3 years ago

@bartelink, neither sprocs nor anything else helps, I am afraid. I need to update a shared account balance multiple times per second in parallel (e.g. 20x), and I cannot afford any optimistic concurrency exceptions at all. I have looked at stored procedures, and under the hood they also do optimistic locking .. so no way that I have found, unfortunately :(

They say they support MongoDB's API (even though only 3.2/3.6, which is outdated) and findAndModify/findOneAndUpdate in particular (which is atomic, with $set, $inc etc. commands), but even though I asked (see https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/38110195-support-for-atomic-updates-in-sql-api) they did not confirm it, and I am afraid that there, too, some optimistic concurrency is going on under the hood ...
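For reference, the kind of server-side atomic update being referred to, sketched against native MongoDB with MongoDB.Driver (the `Account` type and collection are illustrative, not from the discussion above):

```fsharp
open MongoDB.Driver

// Illustrative document type.
[<CLIMutable>]
type Account = { Id: string; Balance: decimal }

// $inc is applied server-side as a single atomic operation, so no
// read-modify-write loop (and no optimistic-concurrency retry) is needed.
let atomicIncrement (accounts: IMongoCollection<Account>) (accountId: string) (delta: decimal) =
    let filter = Builders<Account>.Filter.Eq((fun a -> a.Id), accountId)
    let update = Builders<Account>.Update.Inc((fun a -> a.Balance), delta)
    accounts.FindOneAndUpdateAsync(filter, update)
```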

bartelink commented 3 years ago

@deyanp I'd be surprised if the Cosmos MongoDB interface offers any increment on the native functionality. I agree the bulk facility covers a very different use case.

Not sure if it's remotely useful but in Equinox.Cosmos we solved a similar problem via:

In some cases, you can stack the requests up in some form of queue or bus (which also)

If you're literally only looking to do an inc operation, the bottom line is that at the CosmosDB level there simply has to be a read followed by an etag-contingent update - you can rig it such that in the failure case you recurse within the stored proc.
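A minimal sketch of that read-then-etag-contingent-update loop, done client-side with the Microsoft.Azure.Cosmos v3 SDK rather than inside a stored proc (the `Balance` type, container, and partition key are illustrative):

```fsharp
open System
open System.Net
open Microsoft.Azure.Cosmos

// Illustrative document shape.
type Balance = { id: string; amount: decimal }

let rec increment (container: Container) (pk: string) (id: string) (delta: decimal) =
    task {
        // Read the current document and capture its etag.
        let! current = container.ReadItemAsync<Balance>(id, PartitionKey(pk))
        let updated = { current.Resource with amount = current.Resource.amount + delta }
        try
            // Replace only if the etag still matches, i.e. no concurrent writer won.
            let options = ItemRequestOptions(IfMatchEtag = current.ETag)
            let! _ = container.ReplaceItemAsync(updated, id, Nullable(PartitionKey(pk)), options)
            return ()
        with :? CosmosException as ex when ex.StatusCode = HttpStatusCode.PreconditionFailed ->
            // Lost the race: re-read and retry. This is the recursion the
            // stored-proc variant would do server-side, saving roundtrips.
            return! increment container pk id delta
    }
```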