Squidex / squidex

Headless CMS and Content Management Hub
https://squidex.io
MIT License
2.27k stars 455 forks

Does it work with Azure Doc/CosmosDb? #44

Closed ctolkien closed 6 years ago

ctolkien commented 7 years ago

It has a MongoDb compatible API, so it might work? Anyone given it a crack?

SebastianStehle commented 7 years ago

I have not tried it yet. I run MongoDB in a Kubernetes cluster at the moment. But it should work, provided that DocumentDB/CosmosDB supports all the driver features.

pushrbx commented 7 years ago

I tested it with Cosmos DB today, but it doesn't work, because aggregate queries aren't supported yet. Support for them is still in private preview. With Cosmos DB, Squidex fails in the ExternalCallback method of the AccountController when you try to log in with your Google account. That method contains the line var isFirst = userManager.Users.LongCount() == 0; which is where the driver tries to run an aggregation query on the db. https://stackoverflow.com/questions/44844678/azure-cosmosdb-using-mongo-drivers-get-count-with-out-getting-all-documents-bas I'll try to tweak it a bit, hoping I can make it work with Cosmos.

SebastianStehle commented 7 years ago

Awesome. It would be nice to get it working.

There are three "databases" at the moment:

  1. EventStore
  2. Identity (Users)
  3. Read Model

Some solutions:

  1. EventStore The event store is not easy to implement. You have to use the features of the database to get the correct behavior. E.g. if you have an auto-increment index, it is much easier. MongoDB does not have an auto-incremented index, so I had to use MongoDB timestamps, which are a little bit special because they combine a timestamp and an index to create unique values for each document. So we probably have to do a CosmosDB implementation for that.

  2. Identity I guess there are already providers for the most important database systems, you just have to use them, if there is a problem.

  3. Read Models It would be very cool to use CosmosDB for them. I think CosmosDB might be even better than MongoDB because afaik it can manage indices automatically (or just indexes everything)

pushrbx commented 7 years ago

There is no need to change the query system to something different; it can stay as is. It is only necessary to replace the queries that run an aggregation with ones where the system loads the data into memory and does the aggregation there, because Cosmos will eventually support aggregation anyway. I'm still trying to wrap my head around the code, but I think it's going to be a different implementation of one of the MongoDB-related interfaces and a config item to say "we are using cosmos".
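A minimal sketch of that workaround, written in Python for illustration rather than in the project's C#. The `fetch_all_users` callable is a hypothetical stand-in for a plain find query; it is not part of Squidex or of any driver.

```python
from typing import Callable, Iterable


def is_first_user(fetch_all_users: Callable[[], Iterable[dict]]) -> bool:
    """Return True when no user exists yet, without a server-side count.

    `fetch_all_users` is a hypothetical stand-in for a plain find query
    that streams user documents back to the client.
    """
    # Stop at the first document: an emptiness check never needs a full count.
    for _ in fetch_all_users():
        return False
    return True


def count_in_memory(fetch_all_users: Callable[[], Iterable[dict]]) -> int:
    """Count documents on the client; this reads every document once."""
    return sum(1 for _ in fetch_all_users())
```

For the isFirst check only the emptiness test is needed, so the full client-side count can usually be avoided.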

SebastianStehle commented 7 years ago

Would be awesome :)

pushrbx commented 7 years ago

I made some fixes; now I can create users, list the users, and modify my profile (pull request soon). But I can't create an app, because while publishing the event it throws a MongoDB.Driver.MongoCommandException: Command failed. exception. How does the app creation work? What are the steps until the app gets created? Screenshot of the exception in one of the documents in CosmosDB: https://snag.gy/AQwDCp.jpg

SebastianStehle commented 7 years ago

It is a CQRS architecture. Everything is written to the EventStore and then the events are handled to populate read models. I think the EventStore could be the critical part.

SebastianStehle commented 7 years ago

I think the problem was the BsonTimestamp, which is not compatible with CosmosDB. I am working on an implementation that is not dependent on this data type.

pushrbx commented 7 years ago

I could catch the exception finally. For this query:

    {{ "find" : "Events", "filter" : { "Timestamp" : { "$gte" : Timestamp(0, 0) }, "EventStream" : /^asset-/ }, "sort" : { "Timestamp" : 1 } }}

CosmosDB responds through the mongodb driver with this:

    {{ "_t" : "OKMongoResponse", "ok" : 0, "code" : 9, "errmsg" : "Syntax error, incorrect syntax near '27'.", "$err" : "Syntax error, incorrect syntax near '27'." }}

And when I directly query from Azure Portal I get this error:

    {"code":500,"body":"{\"message\":\"There was an error processing your request. Please try again in a few moments.\",\"httpStatusCode\":\"InternalServerError\",\"xMsServerRequestId\":null,\"stackTrace\":null}"}

SebastianStehle commented 7 years ago

Yeah, found the same. Are you familiar with CQRS? You write events to the store and then event handlers ask for "All events since X".

In SQL you can just create an auto-increment index for the position, but in NoSQL you need another solution. I used BsonTimestamp https://docs.mongodb.com/manual/reference/bson-types/#timestamps but it is a special solution. My other idea uses https://docs.mongodb.com/manual/reference/method/db.collection.findAndModify/ to generate a sequence number. Or just a local counter for a single server environment.
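A rough sketch of the counter idea, in Python for illustration. The `SequenceCounter` class is hypothetical and purely in-memory: with a real MongoDB deployment the increment-and-read would presumably be a single findAndModify call using $inc on a counter document, and the lock below only imitates that atomicity.

```python
import threading


class SequenceCounter:
    """In-memory stand-in for a counter document that is updated atomically.

    Assumption: a real implementation would do the increment-and-read as
    one atomic database operation; the lock only imitates that here.
    """

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        # Increment and read in one critical section so that no two
        # callers can ever receive the same number.
        with self._lock:
            self._value += 1
            return self._value
```

This also covers the "local counter for a single server" variant. It guarantees unique numbers, but says nothing about the order in which the numbered events are later written.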

Btw: How did you solve the problem with the indices? It throws exceptions very often.

pushrbx commented 7 years ago

I have only changed the queries to make it work. I haven't touched the indices yet. I only came across CQRS two weeks ago, so I'm a bit of a newbie with it.

SebastianStehle commented 7 years ago

It is very tricky to implement an Event Store for Cosmos DB. I have an idea but it might cost a lot. I will have to check it out. But it is not an easy task.

pushrbx commented 7 years ago

Let me get this straight in my head: the events being saved in the db need a key, which in this case has to be a number, and if the system runs in a distributed environment there has to be logic that makes sure the generated number is unique. Something also has to guarantee the order of the events. You tried to do this with precise timestamps, so there was no need to check the uniqueness of the value. With CosmosDB there is a way to do it, but it requires extra read operations. Is that right?

SebastianStehle commented 7 years ago

Almost:

Event handlers read events from the event store. They use an operation "Give me all Events since X" and X is something like a position. Of course they also store the last position they have received. You could use an auto-incremented ID for the Position or create a token with multiple values. Does not matter.

For MongoDB I use BsonTimestamp, which is generated server side and almost like an auto-incremented index. For CosmosDB there is no equivalent solution.

Generating a unique number in general is not so difficult. On a single server it is really easy. In a distributed environment you can just use Redis or so to create this number. Even if you write thousands of events per second it should be fine.

But the problem is that the following operation is not atomic:

    var sequenceNumber = GetSequenceNumber();
    WriteEvent(event, sequenceNumber);

This means that you could write Event2 before Event1. If the event handler asks the EventStore for all events after Event2 has been written but before Event1 has, it only gets Event2. The next time it asks, it gets all events after Event2 and never reads Event1.
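The missed event can be reproduced with a small simulation. This is an illustrative Python sketch, not Squidex code; `StoredEvent` and `Handler` are hypothetical names, and the store is just a list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredEvent:
    sequence: int
    name: str


class Handler:
    """Polls the store for everything after its last seen position."""

    def __init__(self) -> None:
        self.position = 0
        self.seen: List[str] = []

    def poll(self, store: List[StoredEvent]) -> None:
        newer = sorted(
            (e for e in store if e.sequence > self.position),
            key=lambda e: e.sequence,
        )
        for event in newer:
            self.seen.append(event.name)
            self.position = event.sequence


store: List[StoredEvent] = []
handler = Handler()

# Writer A reserved number 1 and writer B reserved number 2,
# but B's write reaches the store first.
store.append(StoredEvent(2, "Event2"))
handler.poll(store)  # the handler advances its position to 2
store.append(StoredEvent(1, "Event1"))
handler.poll(store)  # Event1 has sequence 1, which is not > 2, so it is skipped

print(handler.seen)  # prints ['Event2']
```

Unique numbers alone are therefore not enough; the number has to be assigned in the same atomic step as the write, or assigned afterwards by a single component.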

One way to solve it for CosmosDB: You could write the events without this number and then create a background task that reads all the events that have no sequence number yet (ordered by timestamp) and assigns a unique sequence number. Of course you have a single point of failure now, but you can just restart the sequencer automatically. What I am not sure about: The sequencer has to run as often as possible, e.g. every 50ms or so, to keep the delay until an event is handled low. And that costs money. I am not sure how much. That is the open question.
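A minimal sketch of one sequencer pass, in Python for illustration. The event shape (dicts with `timestamp` and an optional `sequence` key) and the function name are assumptions made for this example; a real version would query and update the database instead of a list.

```python
from typing import Dict, List


def assign_sequence_numbers(events: List[Dict], last_sequence: int) -> int:
    """Number every event that has no sequence yet, oldest timestamp first.

    Returns the highest sequence number handed out so far. Only one
    instance of this pass may run at a time, otherwise numbers collide.
    """
    pending = [e for e in events if e.get("sequence") is None]
    # sorted() is stable, so events with equal timestamps keep insertion order.
    for event in sorted(pending, key=lambda e: e["timestamp"]):
        last_sequence += 1
        event["sequence"] = last_sequence
    return last_sequence
```

Because handlers would read by sequence number only, an event that arrives late with an old timestamp simply gets the next free number and is not skipped. The cost concern remains: each pass is at least one extra query, repeated at whatever interval is chosen.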

dsbegnoche commented 7 years ago

@SebastianStehle @pushrbx Any update on working with CosmosDB? We are looking at switching to Cosmos, but are unsure of the effort involved in updating to use it.

SebastianStehle commented 7 years ago

I could not find a way to generate an auto-incremented value in CosmosDB.

The only option would be to implement a custom EventStore on top of CosmosDB and other SQL solutions. But there are specialized solutions for that, e.g. GetEventStore.

pushrbx commented 7 years ago

Also I'd like to add that CosmosDB is a very pricey choice, and it's overkill for an application which uses Squidex, unless you have a much bigger application in mind (e.g. inventory management at DHL) or your traffic is on the scale of Facebook.

I can see another reason why people would prefer the CosmosDB route: the convenience of the Azure management portal. We could create a Squidex template for the Azure Marketplace, ofc this should go in a different issue. :D

SebastianStehle commented 7 years ago

Oh yes, I forgot to delete my Cosmos DB instance after testing and it was 150€ for a month or so :(

I can publish my Kubernetes templates. It is a much easier solution and not coupled to any cloud provider. With some changes you can publish it to Azure, GCloud and Amazon.

DavidRouyer commented 7 years ago

I'm trying to use CosmosDB for the Event Store and I have encountered two problems:

This looks cool.

SebastianStehle commented 7 years ago

I am not sure if change feeds solve the problem. I think the best idea would be to make a wrapper service that linearizes writes in some way. But I guess you could even store the events in Table Storage, which should be much cheaper.

pushrbx commented 6 years ago

I guess Cosmos should work as an event store now: https://azure.microsoft.com/en-us/roadmap/aggregation-pipeline-preview-and-unique-indexes-for-azure-cosmos-db-mongodb-api/ It is already in public preview. https://azure.microsoft.com/en-gb/blog/azure-cosmosdb-extends-support-for-mongodb-aggregation-pipeline-unique-indexes-and-more/ And here is the list of supported syntax: https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb-feature-support I'm broke now so I can't test it. 😄

SebastianStehle commented 6 years ago

Probably, but the ordering would still be a problem. We have to write our own event store :D

DaveA-W commented 6 years ago

Would a trigger to set the timestamp suffice? e.g. https://stackoverflow.com/q/44656905 - although note comments on the suggested answer saying you'll need to explicitly decorate your calls with the name of the trigger to force it to run.

SebastianStehle commented 6 years ago

You would also need an offset if multiple values have the same timestamp.
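To illustrate the point, a position can be modelled as a (timestamp, offset) pair that is compared lexicographically. This is a Python sketch under assumed names (`Position`, `events_since`) and an assumed event shape; how the offset would be assigned in CosmosDB is left open here.

```python
from typing import List, NamedTuple


class Position(NamedTuple):
    timestamp: int  # e.g. milliseconds since epoch
    offset: int     # distinguishes events that share a timestamp


def events_since(events: List[dict], last: Position) -> List[dict]:
    """Return events strictly after `last`, ordered by (timestamp, offset)."""
    newer = [e for e in events if Position(e["timestamp"], e["offset"]) > last]
    return sorted(newer, key=lambda e: (e["timestamp"], e["offset"]))
```

With a timestamp alone, a handler whose last position is 100 could not tell whether it has already seen every event stamped 100; the offset removes that ambiguity.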

vhendriks81 commented 6 years ago

Not sure what the actual problem is, but I have written an event store implementation on Cosmos DB. I treat each document as a group of events. Each document has a GUID id, the aggregate id, a version number and a list of events. By adding a unique constraint on aggregate id and version, concurrency issues are eliminated. Of course the partition key is the aggregate id.
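A minimal in-memory sketch of that document model, in Python for illustration. The class and method names are hypothetical; the dictionary key plays the role of the unique constraint on (aggregate id, version), and nothing here talks to Cosmos DB.

```python
from typing import Dict, List, Tuple


class VersionConflict(Exception):
    """Raised when another writer already stored this aggregate version."""


class EventDocumentStore:
    """In-memory stand-in for a collection with a unique (aggregate_id, version) key."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, int], List[str]] = {}

    def append(self, aggregate_id: str, version: int, events: List[str]) -> None:
        key = (aggregate_id, version)
        # The unique constraint: a second write of the same version is rejected.
        if key in self._docs:
            raise VersionConflict(f"{aggregate_id} already has version {version}")
        self._docs[key] = list(events)

    def load(self, aggregate_id: str) -> List[str]:
        """Return all events of one aggregate in version order."""
        versions = sorted(v for (a, v) in self._docs if a == aggregate_id)
        return [e for v in versions for e in self._docs[(aggregate_id, v)]]
```

This gives optimistic concurrency and a well-defined order within one aggregate. It does not by itself define an order across different aggregates.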

Not sure if it helps..

SebastianStehle commented 6 years ago

How do you make something like "GetAllEventsSinceVersion()"?