crate / crate

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
https://cratedb.com/product
Apache License 2.0
4.04k stars 554 forks source link

Will CrateDB ever support transactions? #6863

Closed CheyenneForbes closed 6 years ago

CheyenneForbes commented 6 years ago

I love everything about CrateDB except the fact that transactions arent supported, could this feature be added and can be enabled by users who need it? Maybe looking at Apache Tephra could help, it is used by Apache Phoenix to provide transactions in SQL on data stored in HBase

mxm commented 6 years ago

Thank you for your feedback. As you know transactional support is not the scope of Crate. Instead we focus on providing scalable read/write performance, ease of use, and a great SQL experience. Transactions would impact the SQL performance quite significantly. Adding transaction support is not trivial because you can't add it just for just some queries, all queries are effected by this.

We realize there are use cases which could benefit from transactions. We have added some functionality in the past to enforce more constraints when reading data. For example, we read the value from the primary only when a primary key has been specified in the where clause. This ensures that we don't read data from a replica which hasn't received the data from the primary yet.

I think moving towards providing more guarantees is the way to go. We could, for example, provide an operation which allows to refresh the Lucene reader immediately before reading data, ensuring that the latest version of data is read before running an UPDATE query. While this would not be a complete transaction support, it would help to realize certain use cases, e.g. where users want to atomically increase a value.

May I ask, what is your particular use case for transactions?

CheyenneForbes commented 6 years ago

Heres my use case:

I want to ensure when user A purchases something from user B all this happen or none does

update wallet set balance = balance - 100 where user_id = userA and balance > 100;
update wallet set balance = balance + 100 where user_id = userB;

how can I achieve this using crate?

mxm commented 6 years ago

Good news! You can do that with CrateDB.

cr> create table wallet (user_id int PRIMARY KEY, balance long);
cr> insert into wallet (user_id, balance) values (1234, 100);
INSERT OK, 1 row affected  (0.004 sec)

Let's figure out the current version before we update:

cr> select _version, user_id, balance from wallet where user_id = 1234;
+----------+---------+---------+
| _version | user_id | balance |
+----------+---------+---------+
|        1 |    1234 |     100 |
+----------+---------+---------+
SELECT 1 row in set (0.003 sec)

Now let's update but ensure that we don't cause any inconsistencies:

update wallet set balance = balance + 100 where user_id = 1234 and _version = 1;
cr> UPDATE OK, 1 row affected  (0.016 sec)

Sweet, update went through (affected rows = 1).

Now let's do a concurrent update from another process:

update wallet set balance = balance + 100 where user_id = 1234 and _version = 1;
UPDATE OK, 0 rows affected  (0.003 sec)

Update did not get through (affected rows = 0).

What is the outcome?

cr> select _version, user_id, balance from wallet;
+----------+---------+---------+
| _version | user_id | balance |
+----------+---------+---------+
|        2 |    1234 |     200 |
+----------+---------+---------+
SELECT 1 row in set (0.003 sec)

That's exactly what we want!

So you just need to check the _version column and see how many rows were affected. You need to retry for as long as you don't get back a row count of 1 or you can of course also check if the version increased.

For more information see also https://crate.io/docs/crate/reference/en/latest/general/occ.html

edit: I overlooked, in your example, you want to transfer from userA to userB. This is not possible unless you a) ensure that the transfer is only performed by a single process or b) you use a column as a lock object, exactly as above with the _version to ensure that only one process at a time acquires the lock and is allowed to modify the table. It might be easier to use a transactional database for your wallet and use CrateDB for everything else.

CheyenneForbes commented 6 years ago

Thanks a lot, I was considering the column lock method. I'll see how it goes

gm2211 commented 5 years ago

Bumped into this while looking into cratedb - there is no way around transactions for the provided example, even if you ensure only a single process is performing changes to the table and you use columns locks. This is because there seems not to be any atomicity guarantee for a group of operations, so if you subtract from the balance of the first user within one process and that process crashes before it has added it to the other user's balance, you're out of luck.

I think the suggestion of using a relational db for this use case and cratedb for the rest is the way to go.

Also, fyi, mogodb 4 supports transactions and there are libraries that can add transactional layers on top of any key/value store.