cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.95k stars 3.79k forks source link

sql: support rolling back nested transactions containing schema changes #10735

Open knz opened 7 years ago

knz commented 7 years ago

CockroachDB 20.1 is introducing support for nested transactions via postgres' (and SQL standard) savepoint protocol. SAVEPOINT, RELEASE SAVEPOINT, COMMIT SAVEPOINT.

However as of 20.1, a (nested) transaction containing schema changes—or any type of DDL statement, really—cannot be rewound. This is because the logic needed to invalidate cached schema elements in the transaction has not been implemented yet.

To advertise compatibility with PostgreSQL to a level that satisfies most uses in client apps, this gap remains to be closed.


Note: the discussion starting in Nov 2016 up to and including February 2020 was about the initial design of nested transactions in CockroachDB. Discussion from March 2020 onward is about rolling back over DDL statements.

Jira issue: CRDB-6134

bdarnell commented 7 years ago

It's not just a followup txn at the same timestamp: when you ROLLBACK TO SAVEPOINT, everything before the SAVEPOINT is still visible, but everything after the SAVEPOINT has been rolled back. That's not something we can feasibly do.

We could easily fake SAVEPOINT in a way that technically has the correct semantics, although it wouldn't actually make savepoints useful: whenever ROLLBACK TO SAVEPOINT is executed, return a TransactionRetryError to restart the whole transaction from zero (and SAVEPOINT and RELEASE SAVEPOINT commands become no-ops). This would make us compatible with drivers/frameworks that unconditionally use savepoints.

knz commented 7 years ago

Thanks Ben indeed what you say is the basic behavior for cases where a client just wants the syntax to work.

We discussed this with Andrei yesterday and I started to understand what's needed. Conceptually what we need here is "sub-transactions" in kv, which are able to see both their own intents and that of their parent(s). A lot of plumbing will be needed but I believe there's not much more to it than that (conceptually, that is).

tlvenn commented 7 years ago

Any chance to see some progress on this front ? I am surprised that only ecto/elixir is leveraging savepoints in the ORM world to assist with testing.

Thanks a lot in advance !

knz commented 7 years ago

Hi Christian, thanks for the heads up.

I do not know how far savepoints are exploited by different frameworks, but as a data point we've been working on compatibility with various other ORMs last quarter and savepoints didn't come up as a requirement yet.

Nevertheless, there is a large chance to see some progress on this front eventually, since we can already foresee a technical approach that amounts to an incremental change. However it's not on our short-term roadmap for now -- I predict our focus will be on stability and performance for the first half of 2017. As usual, our roadmap can be influenced, feel free to contact @dianasaur323 for details.

tlvenn commented 7 years ago

Thanks for the feedback @knz, much appreciated. It's true that Ecto is doing something pretty fancy/advanced in its use of savepoints for tests so no wonder other ORMs dont leverage it much... I will investigate on Ecto side for a workaround for the time being.

tbg commented 7 years ago

diesel-rs also uses savepoints for everything with the vanilla PG-backend, and while it's not an extremely widespread ORM (or so I suspect), I also had the impression that going with @bdarnell's suggestion above would make the most sense as I can't picture how we would "properly" implement savepoints on top of the current architecture, even in the longer term. Making it so that a failed subtransaction busts savepoints all the way to the top level transaction (which would then be retried) seems straightforward enough and useful. This might mean that some uses of savepoints which make sense for PG don't make a whole lot of sense in Cockroach, but that will be hard to avoid either way.

tlvenn commented 7 years ago

Could you maintain some kind of log per transaction where savepoints are acting as markers and when a rollback to a given savepoint is requested, you would rollback the txn and then start another followup txn at the same timestamp replaying the log until the marker is reached ?

tbg commented 7 years ago

You could, but that seems dangerous (begin; /* 1000mb of inserts */; savepoint; something_that_errors) and the only difference is that the server replays, not the client. The server would unconditionally have to buffer everything as it can never be certain that there won't be a future savepoint.

tlvenn commented 7 years ago

Isn't it how Postgres achieve some kind of HA / Replication, by streaming and replaying their XLOG ? At the same time I am not so clear on @bdarnell suggestion. Would emitting a TransactionRetryError to the client allow for the client driver to transparently restart the txn ? Or is it impacting the end user code somehow ?

petermattis commented 7 years ago

@tschottdorf To be fair, 1000mb of inserts in a single transaction will likely cause problems even without savepoints.

I'm curious, would the proposal on the table (to interpret rollback to savepoint as restarting the entire transaction) be compatible with the usage by ORMs? Seems likely to me that it wouldn't if savepoints are being used for simplifying testing.

I'm not sure I understand all of the semantics of savepoints. Seems like we get part or all of the way there if creating a savepoint incremented the logical part of the transaction timestamp and we had a mechanism to revert all intents that were >=savepoint_timestamp. We'd also need to maintain multiple intent versions at each key, but that doesn't seem impossible. I'm also not sure of the ramifications of having writes within a transaction that are at different timestamps (though only the logical part would differ). Or perhaps instead of incrementing the logical part of the transaction timestamp, there is an additional optional savepoint sequence number associated with a transaction. There would be a lot of work to plumb this through the KV layer. Not something I'd want to tackle right now, but it seems feasible on the surface (which likely means there is a large under water mass hidden from view).

bdarnell commented 7 years ago

I'm curious, would the proposal on the table (to interpret rollback to savepoint as restarting the entire transaction) be compatible with the usage by ORMs?

It better be, since in Cockroach it's always possible for the entire transaction to get restarted at any time. This negates the benefits of savepoints, but it shouldn't break things for client applications/ORMs any more than our optimistic transaction model already does.

We'd also need to maintain multiple intent versions at each key, but that doesn't seem impossible

Not impossible, but hard. (it would have more benefits than savepoints, though).

Or perhaps instead of incrementing the logical part of the transaction timestamp, there is an additional optional savepoint sequence number associated with a transaction.

Yeah. Intents already have epochs; the best way to do this is probably to add new a new field to TxnMeta. Savepoints can (theoretically) be nested, which means that determining which savepoints are currently valid can get a bit complicated (unless we make some simplifying restrictions).

But I'm not convinced that "real" savepoints are a good fit for our transaction model - since full-transaction restarts are already relatively common, are you really going to be able to get enough value out of partial savepoint restarts to make them worth the complexity?

knz commented 7 years ago

I was thinking perhaps there is another angle here. Suppose a savepoint would also cause a sub-txn record to appear (perhaps this should rather be a new generation of the main txn, but what matters for my argument here is that it is a separate record), and all intents created after that to link to the sub-txn.

I think this buys us partial retries. Then conflicts with other txns can push only that new record if we can detect the conflict is only overlapping with its intents. This way the client does not need to retry everything. Of course a conflict with an earlier intent needs to push both its txn records and all the linked txn records created since then. That's complicated perhaps.

But I think in long running txn we could reduce the amount of work on both client and server this way, if the client issues save points at regular intervals.

tbg commented 7 years ago

But I'm not convinced that "real" savepoints are a good fit for our transaction model - since full-transaction restarts are already relatively common, are you really going to be able to get enough value out of partial savepoint restarts to make them worth the complexity?

That's basically my opinion, savepoints really just don't fit in well. I haven't investigated this thoroughly, but my impression was that savepoints are mostly used because "they're there". For example, they're convenient during tests (see ecto).

With any amount of complexity added, the behavior of what @bdarnell suggested earlier (the "easy solution") would still have to be handled by clients, just in less cases.

Short-running operations (which is hopefully what an ORM does most of the time) don't profit from being broken up into multiple stages of savepoints, so there we don't need it, unless they're doing something really fancy like using a nested savepoint scope to "test" whether they can get away with certain operations that would otherwise bust the parent scope if they failed (if those are around, would be instructive to see them).

Long-running operations which write large amounts of data are intrinsically problematic already (I wonder how Postgres handles them), and the subset of those that can profit from multiple stages might be rather small. Likely more could be gained by reducing the frequency of those transactions in the first place (for example, an import rarely needs to be in a transaction) or making them less likely to restart.

ISTM that there would have to be specific use cases to inform anything more complicated than the "easy way out", and something more complicated could be added later without any changes in public interfaces.

petermattis commented 7 years ago

ISTM that there would have to be specific use cases to inform anything more complicated than the "easy way out", and something more complicated could be added later without any changes in public interfaces.

Definitely. I too doubt that ORMs would use savepoints in normal usage. If they do, I'd love to see some concrete examples so we could understand why. And I'd like to understand how they are using savepoints to simplify testing. I can imagine the testing usage doesn't have proper retry loops which would make the simplified approach a non-starter.

petermattis commented 7 years ago

Savepoints can (theoretically) be nested, which means that determining which savepoints are currently valid can get a bit complicated (unless we make some simplifying restrictions).

Wouldn't keeping a sequence number for the savepoints by sufficient. That is, if you create savepoints A, B and C with sequence numbers 1, 2, 3 and then rollback to A, it would be straightforward to determine than only savepoint A is still valid. A ROLLBACK would reset the active savepoint sequence number (and rollback the associated intents). A RELEASE would remove the savepoint from the session state prohibiting a future rollback to that sequence number. I guess I don't see a particular complication with multiple savepoints beyond the complication for a single savepoint (which is significant enough that we almost certainly don't want to do this right now).

tlvenn commented 7 years ago

Definitely. I too doubt that ORMs would use savepoints in normal usage. If they do, I'd love to see some concrete examples so we could understand why. And I'd like to understand how they are using savepoints to simplify testing.

I gonna have to ask @fishcakez if he can help us understand better why Ecto is leveraging savepoints for testing and in which situation it helps further than beginning and rollbacking a transaction for each test. Something that I am not so clear on myself...

bdarnell commented 7 years ago

The complexity with multiple savepoints is that when you roll back to A and invalidate savepoint sequence numbers 2 and 3, what do you do when your partially-retried transaction advances to savepoint B again? If you give it ID 4, you have to remember now that 1 and 4 are valid but 2 and 3 are not. If you reuse ID 2, you have to erase all trace of the old savepoint (by synchronously resolving all its intents at the time of the rollback).

And I'd like to understand how they are using savepoints to simplify testing.

I think the basic idea is to speed the tests up by running each test in a transaction and rolling it back when the test finishes. This means you don't have to do any work to restore the database to its original state between tests. But you want this to be transparent to the tests (so they can use their own transactions without being aware that they're in a larger transaction), which requires nested transactions (i.e. savepoints).

fishcakez commented 7 years ago

I think the basic idea is to speed the tests up by running each test in a transaction and rolling it back when the test finishes. This means you don't have to do any work to restore the database to its original state between tests. But you want this to be transparent to the tests (so they can use their own transactions without being aware that they're in a larger transaction), which requires nested transactions (i.e. savepoints).

It is exactly this, and it can allow tests to run concurrently if there is some care is taken. We start a sandbox that a test case interacts with, it holds a single connection that calls BEGIN at the start and ROLLBACK at the end. In this testing mode BEGIN, COMMIT and ROLLBACK are replaced with the savepoint equivalents when using the transaction functionality of Ecto. As single queries, without an explicit transaction, are normally executed inside an implicit transaction, these are also wrapped in savepoints. This is required because an error in an individual query would cause the sandbox's real transaction to enter an error state and future queries would error until the real transaction was rolled back.

The later functionality is executed efficiently over the wire using a batch of queries with a single sync protocol message on the happy path. This feature is exposed generally for use in transactions when either inside and outside tests if a query may fail but the transaction should continue as if it had not occurred. In a similar way to ON_ERROR_ROLLBACK in psql. However it is off by default and requires an explicit opt-in per query using it.

This means that Ecto always uses savepoints to create nested transactions, and does not have interleaving savepoint logic. Ecto also uses the sandbox for integration testing each SQL database. This means it is not possible to run the tests against CockroachDB.

tbg commented 7 years ago

Definitely. I too doubt that ORMs would use savepoints in normal usage. If they do, I'd love to see some concrete examples so we could understand why. And I'd like to understand how they are using savepoints to simplify testing. I can imagine the testing usage doesn't have proper retry loops which would make the simplified approach a non-starter.

This is just the general problem that ORMs for PG likely don't do proper retries anywhere. No matter what implementation of savepoints we choose, everything will have to expect retries at any stage of the transaction. So doesn't really seem that it informs this issue one way or another.

In the example of Ecto, if the test cases were written in a restart-aware manner (i.e. as a closure with retry handler) they should run just fine even with all savepoints collapsed internally, though with a lot of odd restarts (as ecto would release savepoints frequently, which would force a restart of the transaction). That is, of course, ignoring issues such as #12123.

tlvenn commented 7 years ago

Just to follow up on the Ecto use case for testing, with some help from @fishcakez, I was able to implement an Ecto Sandbox for CockroachDB that emulates the implicit savepoints created during tests using a log replay strategy.

https://github.com/jumpn/cockroachdb_sandbox

lgo commented 6 years ago

Another place where SQL savepoints are used is in ActiveRecord. It uses them for making explicitly nested transactions. I'm not sure how common they are in usage but they are probably more common in tests. (This can be worked around by monkey-patching ActiveRecord to raise an error if the user creates a new nested transaction while using the CockroachDB adapter.)

For example, the Rails test suite for ActiveRecord uses them specifically while populating the database with fixtures. It doesn't seem like they are used elsewhere in the tests though.

This snippet gives a short explanation for how ActiveRecord transactions work, and when savepoints will be used.

# Assuming there was no transaction previously, a new one will be
# created at the beginning of the block.
ActiveRecord::Base.transaction do
  # This implicitly calls transaction(require_new: false). This does not
  # make a new transaction.
  ActiveRecord::Base.transaction do
  end

  # This will attempt to create another transaction using CREATE SAVEPOINT.
  ActiveRecord::Base.transaction(require_new: true) do
    # If an error is encountered here, an automatic rollback will occur.
    # This will cause a ROLLBACK SAVEPOINT, and we just exit one level
    # of the code block.
    raise(Exception)
  end
  # Once we exit the next code block, this will send a COMMIT
end

# A similar transaction interface is provided on any user-defined models. For
# example:
class Customer < ActiveRecord::Base
end

# The primary use case of this is when using two models that are stored in
# different databases. In this case, if we nest multiple transactions on these two
#  models, we are actually creating two transactions on different databases.
# Again, these don't involve any savepoints until we pass transaction(require_new: true)
Customer.transaction do
  ADifferentDatabaseCustomer.transaction do
    # ...
  end
end
Freeaqingme commented 6 years ago

I'm not sure how common they are in usage but they are probably more common in tests.

I'm currently not in pursue of this functionality myself, but ran across this topic by chance, and realized it pertains to another use case I had a long time ago with Zend_Db (Zend Framework's database component) and mysql.

In my case, I had a bunch of application code inside of a model that started a transaction on the main database connection to do a bunch of queries. After those queries, it invoked a third-party library that also (attempted to) initiate(d) a transaction. After invocation of this library, the model was supposed to do a few more queries and finally commit.

This didn't work out. The 3rdparty lib, not aware of any pre-existing transaction, did the right thing by itself, but did happen to clash with the application code and this resulted in a 'transaction already started' error. For Zend_Db this was fixed by implementing nested transactions (not supported by Mysql) using savepoints (which are supported by mysql). Related ticket & implementation: https://zendframework.com/issues/browse/ZF-10736

I realize that this pertains to mysql, but I just wanted to demonstrate there's more scenario's than just tests where such functionality could come in handy.

tim-o commented 5 years ago

Zendesk ticket #3299 has been linked to this issue.

rkruze commented 5 years ago

Django also makes heavy use of nested transactions, especially in its test suite.

Bessonov commented 5 years ago

Is there any way to simulate error 40001?

knz commented 5 years ago

crdb_internal.force_retry()

(Use for testing only, interface may not be stable) -- Sent from my Android device with K-9 Mail. Please excuse my brevity.

Bessonov commented 5 years ago

@knz thank you very much!

Who found this topic and wonder how to execute it:

Basically, you issue

SELECT crdb_internal.force_retry('1000000ms':::INTERVAL);

query (found there: https://github.com/cockroachdb/cockroach/issues/16827). To do that, you must be the root. Related discussion: https://github.com/cockroachdb/cockroach/issues/29396

jordanlewis commented 4 years ago

@knz if this isn't completely fixed by your recent work, I would suggest closing it and replacing with a tighter issue explaining the remaining missing bits.

knz commented 4 years ago

Thanks for the reminder. I have reworded the issue title and the top-level issue description to reflect the last remaining link to this issue number in the source code. This is not any more an issue of SQL execution but rather one specific to the conn executor and the schema change logic. We can have a look at it and aim at closing this in the coming weeks/months.

knz commented 4 years ago

@lucy-zhang I understand that partial DDL rollback is not on the criticial path for now (@awoods187 can confirm perhaps) so this is low priority, but I think it needs to be roadmapped/prioritized for 21.1 or not too long after.

lessless commented 2 years ago

Hi guys! How's this coming along?

knz commented 2 years ago

cc @ajwerner for re-triage

ajwerner commented 2 years ago

We'll revisit this after we finish adopting the declarative schema changer. I don't expect progress on this in the year 2022.