cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.14k stars 3.81k forks source link

sql: Transaction status API to resolve ambiguous errors #35170

Open bdarnell opened 5 years ago

bdarnell commented 5 years ago

Ambiguous errors (which could be generated internally to CRDB or a result of network failure between the gateway and client) are difficult to handle. Either the application takes care to make its transactions idempotent so they can be retried in ambiguous situations, or builds its own status mechanism that can be queried, or it just bubbles the error up to the end user (in practice, nearly always the last).

Ambiguous errors are possible even in non-distributed databases (due to network failures between the client and server), and PostgreSQL 10 introduced a transaction status API (sometimes called "transaction traceability" in discussions of the feature) that can be used to resolve the ambiguity. A function txid_current() can be called during the transaction before committing, and if the commit fails ambiguously the txn id can be passed to txid_status() in another session (as long as it is "recent enough"; the details of this retention period are not documented as far as I've seen). txid_status() returns a string, one of "committed", "aborted", "in progress" or null if the transaction record is too old.

One obstacle to us adopting a postgres-compatible API is that the txid apis are defined in terms of 64-bit integer IDs, while our IDs are 128 bits (and transactions are not addressable by ID alone; an anchor key is necessary as well). More fundamentally, we try to clean up any residue left behind by completed transactions as quickly as possible, so the transaction record wouldn't be there even if we introduced a postgres-incompatible API that included full address info.

I don't think ambiguous errors are currently causing a lot of problems so this isn't a priority, but I'm filing this as a placeholder to gauge interest. It doesn't look like postgres drivers have built in automatic disambiguation support using this feature yet; if this happens it would increase the value of being postgres-compatible instead of introducing our own interface with larger IDs (I wonder if there's a chance to influence the postgres API before this gets too entrenched).

Jira issue: CRDB-4599

tbg commented 5 years ago

There's a long-term plan for parallel commits (I think mostly committed in @nvanbenschoten's brain right now) that requires txnids to be embedded in committed values. If we had that, the GC problems would be less immediate. Querying the transaction record would be replaced by querying any write in the transaction (which, if found, proves the txn committed and if absent and above gc threshold, can be prevented to prove not committed).

andreimatei commented 3 years ago

A notable case of ambiguous error we need to keep in mind here are the QueryCanceledError and QueryTimedOutError, which get returned to the client with the QueryCanceled pgcode.

andreimatei commented 3 years ago

This issue should be linked to #26915, which talks about embedding txn ids in committed values (as @tbg says above).

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!