[Feature] Support a CosmosDb Outbox

iancooper commented 2 years ago

Problem We don't support the Outbox pattern for CosmosDb. increasingly, Azure customers may be using CosmosDb for entities and we should support working with the Outbox and CosmosDb

Describe the solution you'd like

Support for using it with a CosmosDb TransactionalBatch

Describe alternatives you've considered

Consider support for using the CosmosDb Change Feed instead of a TransactionalBatch as the source of writes to the Outbox. This would be a separate process and whilst valuable represents the story after this one
It's possible that a 'single table' approach works with CosmosDb as well.

preardon commented 2 years ago

@iancooper were you thinking about just doing this with the SQL API or did you want this compatible with a few API's?

iancooper commented 2 years ago

@preardon TBH I know lots about DynamoDb, but little about CosmosDb, and I mainly added this because as I was updating the backlog for Dynamo it occurred that we should do CosmosDb. DynamoDb has an issue with multiple APIs too, and I was probably going to support any API that offers transactions, but feel free to update this ticket with more knowledge if you have it.

preardon commented 2 years ago

I'de be happy to have a discussion with you about this and give it a whack after we have decided which way to go

iancooper commented 2 years ago

@preardon Please do. It's under a v9.next label.

iancooper commented 2 years ago

I've put it under your name for now

preardon commented 2 years ago

Just to give a bit more information

Cosmos Db has a few APIs that you can use for Compatibility ( SQL (Core), mongoDb, Gremlin, Casandra, and table) these can not be changed after the instance of Cosmos has been created. Realistically if we are going to write an Outbox for Cosmos I would recommend that we target the SQL API as the others are more for the ability to use Cosmos for apps that were written for something else.

I would also like to do this in a way that we can use Managed Identity to connect (However I feel the currently accepted method to do this (Use MSI to Get a Key that you could just otherwise Supply))

In Cosmos Db there is the Idea of a Transaction and a TransacitonalBatch. A Transaction or TransactionalBatch can only happen inside a Logical Partition (sharing a Partition Key).

Some time after v9 I will look to make a PoC to see what we can achieve here

Jonny-Freemarket commented 1 year ago

Any more thoughts on this @preardon? I know you've been busy with other things. Some thoughts after reading the MS example of a Cosmos DB backed Transactional Outbox.

How much do we get involved in client code?

Unlike EF Core we can't add something to the DI container that can be shared between client code and Brighter code, because the Container.CreateTransactionalBatch(string partitionKey) method (as the signature suggests) requires the partition key, which would probably be the ID of the entity you're raising events about.

A way around (like in the MS example) is to have object that contains all entities and events to committed to Cosmos DB. This would probably have to be provided by us, along with an interface that specified the PartitionKey type that would be need to be implemented on both the client Entities and Events types.

So the client code would give us the objects that they want to persist, and we get the Events via the _commandProcessor.DepositPost(..) method, create the Transaction ourselves (based off the client Entity partition key), and save everything to Cosmos. This seems reasonable, but is it too opinionated be useful?

I'm keen to write something for Cosmos DB as it would fit a requirement we have with some of our microservices, however for those services they will only be using the Inbox/Outbox, they have no other need for storing entities. So we don't have the issue of sharing a transaction between the two sides. Depending on how this go we may end up rolling our own internal Cosmos DB Inbox/Outbox using the Brighter interfaces so we can still make use of .DepositPost(..).

preardon commented 1 year ago

I will have to have a look a little deeper

Realistically to have brighter work the current way we need a way of getting existing transactions, which I suppose we need a way of identifying what partition/transaction the message belongs to. I'll see if I can get some time to look closer after I'm back from Zurich and then let's have a call

iancooper commented 1 year ago

Let me ponder again.

For Dapper I created a UnitOfWork that you need to use if you want a transaction. I wonder if we have to ask something similar.

preardon commented 1 year ago

I think a UnitOfWork will be involved the interesting bit for me is how the Transaction providing aware connection provider will know how to return the correct transaction for the message given transactions are now partition scoped not Operation scoped

DevJonny commented 1 year ago

I was thinking along the lines of that we create a the transactions rather than the client code. So the calling code actually has no interaction with the transaction, we just take the objects that need to be presided.

iancooper commented 1 year ago

A traditional unit of work, not the one in the MS article, tracks changes so you can commit them in a single transaction.

So really it has both the container context and the uow from that article. A repository is not required.

What it does is let you add in the single transaction.

The problem though is that won’t be easy to find i.e. how do we scan for non-dispatched.

I still suspect you take the activities that you write to that partition and then use the event stream from the Db to put them into an Outbox, then sweep that.

I thought about that for DynamoDb but the .NET libs for reading the stream is a nasty Java wrapper, so I avoided it

On 29 Jul 2022, at 21:48, Jonny Olliff-Lee @.***> wrote:

I was thinking along the lines of that we create a the transactions rather than the client code. So the calling code actually has no interaction with the transaction, we just take the objects that need to be presided.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

DevJonny commented 1 year ago

That makes sense Ian. The dispatching from the Outbox in the article looks to be along those lines, use the Cosmos change log to dispatch events. Either using a background service or Azure function. We could write the background service like the timed sweeper.

Jonny-Freemarket commented 1 year ago

The problem though is that won’t be easy to find i.e. how do we scan for non-dispatched.

I still suspect you take the activities that you write to that partition and then use the event stream from the Db to put them into an Outbox, then sweep that.

So doing some further reading of the Docs (can you tell I'm quite excited about this 😄) you can do a cross-partition query (it's actually just doing individual queries across each partition on your behalf) so we can use that to sweet for no-dispatched items. Due to it's nature, I would say that this probably best reserved for on-demand/explicit sweeping.

As you suspect, we can also use the Cosmos DB change feed to dispatch to your chosen transport. What I was thinking (which is along the lines of what is in the article) is that we write the Outbox in the transaction, this triggers changes in the change feed, we then run a service that uses the Change Feed Processor library (from MS) to do the dispatching, and updating of the Outbox to say dispatched. The only snag here is what happens when we can't process a message, for example if there is a buggy MessageMapper, we would have to carry on so the feed doesn't get stuck, but what do we do with the faulty message, and any messages that should proceed it in the order. As we don't normally use DLQs (although may be we should in this instance) we need to make it observable that a message failed to be sent and requires manual intervention. Also we would need to make it clear in the logs that although the change feed is ordered correctly, dispatching to the transport may not preserve ordering.

Just spitballing for the most part based on what I've read so far, there may be nuances that I've missed or things that don't quite fit how we do things with other outboxes and transports. Is it okay that we differ? I feel like we may not be able to make all Inboxes/Outboxes work in the same way.

iancooper commented 1 year ago

@Jonny-Freemarket In my head you built one of these??

Jonny-Freemarket commented 1 year ago

Yeah I've built one for Freemarket, but it doesn't fully address some of the issues above. We're not persisting entities to our Cosmos DB along with our events, we're purely persisting the events (because the services that use it are largely stateless), so the implementation doesn't carry across to what we'd expect from a Brighter Outbox with a shared transaction.

iancooper commented 1 year ago

@Jonny-Freemarket Dang. It would be nice to get one of these, but if I wrote one it would be purely speculative because I don't work with Cosmos.

Jonny-Freemarket commented 1 year ago

The main issue I got stuck on was:

The only snag here is what happens when we can't process a message, for example if there is a buggy MessageMapper, we would have to carry on so the feed doesn't get stuck, but what do we do with the faulty message, and any messages that should proceed it in the order. As we don't normally use DLQs (although may be we should in this instance) we need to make it observable that a message failed to be sent and requires manual intervention. Also we would need to make it clear in the logs that although the change feed is ordered correctly, dispatching to the transport may not preserve ordering.

If yourself or @preardon have any suggestions I might be able to get something together for review.

BrighterCommand / Brighter

[Feature] Support a CosmosDb Outbox #1777