Open rafiss opened 4 years ago
I'm sort of sold on the argument about Postgres compatibility in and of itself. But I'm still skeptical about the argument that this will provide with users with more flexibility (especially with tooling) in a way that's beneficial to them.
Schema migration tools sometimes start a transaction behind the scenes, before running the user's migration commands. Changing to a warning allows a user to ensure that the migration they are writing is in a transaction, without having to understand what their migration tool is doing behind the scenes.
I guess I just don't see how this works if extra COMMIT
s aren't also no-ops. It seems like if users end up with multiple BEGIN
s, then somewhere they'll also probably end up with multiple COMMIT
s, where the first one will commit the open transaction and the subsequent ones will return errors. So we'd still fail somewhere, but later instead of earlier, with a possibly unintended transaction already committed. Is this really considered a best practice for Postgres/Flyway users? I think seeing an example would help me.
Users won't have to rewrite their migration files as we change various tools to use or to not use transactions with CRDB.
The backward compatibility problem seems real, but also kind of separate, and I'm just not convinced that being less strict about BEGIN
is the way to fix that problem.
Lucy's point about COMMITs is the snag for me. The problem we'd create with the proposal here is that statements after the 1st COMMIT that would otherwise run atomically, will not be atomic any more, and succeed silently, and possibly introduce logical errors in the database without the app noticing until the final (erroneous) COMMIT.
@rafiss did you see this compatibility mismatch actually hurt a 3rd party app in practice? If so, can you provide more details about the scenario? Thanks
I guess I just don't see how this works if extra COMMITs aren't also no-ops.
Or the extra BEGIN becomes a SAVEPOINT and the next COMMIT becomes a RELEASE SAVEPOINT. That's what MS SQL server does in this case. But it wouldn't be compatible with postgres; there may not be a COMMIT statement to match each BEGIN.
I think the best approach is for BEGIN within an existing transaction to be an error; allowing mismatched BEGIN/COMMIT/ROLLBACK is asking for trouble. But in partial defense of the postgres behavior, transaction management in the SQL standard is a mess. Transactions aren't required to be surrounded by matching BEGIN/COMMIT statements; the START TRANSACTION statement is optional and any statement implicitly begins a transaction. Many databases have an "autocommit" mode, and IIRC in very old versions of mysql there wasn't a BEGIN statement - you began a transaction with SET autocommit=0
(and this was often repeated instead of being paired 1:1 with COMMITs). In general there seemed to be more concern about the risk of data not getting committed when you thought it did than ensuring that the atomic units of commit are what you expect (for example, mysql will implicitly commit the pending transaction in certain statements). So the postgres behavior makes sense in a world in which BEGIN is shorthand for SET autocommit=0
, but I'm not sure that was ever true for postgres.
I've seen real-world bugs in applications using mysql (one memorable example involved the difference between python's Exception
and BaseException
, and is why the common advice to never use a bare except:
is wrong) in which dangling data from a transaction that should have been rolled back got erroneously committed as a part of the next transaction. That would have been prevented if BEGIN
failed while there was an open transaction.
did you see this compatibility mismatch actually hurt a 3rd party app in practice?
This issue was prompted by some testing @vy-ton has been doing with different schema migration tools. So I don't think anyone encountered this "in the wild" but she wanted us to explore if we could match Postgres semantics here, with the goal of minimizing overhead as we add support for different migration tools.
Kind of related, here is an internal example where we rely on COMMIT/BEGIN inside of a migration: https://github.com/cockroachlabs/managed-service/blob/master/console/migrations/README.md#migration-file-contents If CRDB were to ignore BEGIN inside of a transaction, then the person writing the migration wouldn't need to know how the migration tool works behind the scenes, and wouldn't need that initial COMMIT. (I do see the argument for why it is in fact better to make people understand what's going on explicitly, but I'm just presenting the case for the issue.)
then the person writing the migration wouldn't need to know how the migration tool works behind the scenes
But you'd still need to know how the tool works to know that it will COMMIT a transaction that you leave open. And this gets into the transactional schema change question - we want the user-supplied DDL statement to be run transactionally with updates to the migration status table, which implies that the migration must not commit itself.
Hi , I was looking into Cockroach's compatibility with DBT. ( www.getdbt.com ) DBT Supports Postgres and is essentially a really decent Python ELT engine.
When i try to run a dbt model i get this error.
Is this something that is likely to be resolved or would we need to create a new DBT adapter for Cockraochdb that does things differently ?
We don't have plans to change this any time soon. But even if we do change it, our experience with other tools has shown that there are likely to be other issues that come up. So the best approach is to create a cockroachdb adapter.
If you do that, please feel free to add DBT support to our community-supported tools page: https://github.com/cockroachdb/docs/blob/master/v21.1/community-tooling.md
Thanks rafiss.
I looked into the other issues that may come up with the postgres adapter, and some of them i'm comfortable in resolving myself by modifying the existing postgres adapter, some issues like this one, im not sure where to start looking. I have a developer coming onboard shortly so i'll discuss it with them. I think the logic we need to make an adapter work may exists in one of the other adaptors that has modified the way it uses transactions.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
Currently sending a
BEGIN
statement when inside of a transaction results in an error:This has been the case from very early on. It was discussed and added in this PR: #2411
In Postgres, this just results in a warning.
I'm opening this to track discussion on changing this to a warning in CockroachDB.
Reasons I am in favor:
cc @lucy-zhang @vy-ton
Jira issue: CRDB-3702