cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.96k stars 3.79k forks source link

sql: support causality tokens #7945

Open tbg opened 8 years ago

tbg commented 8 years ago

My apologies if this exists somewhere - wasn't able to find it.

We already support reading the causality token of a transaction via cluster_logical_timestamp, but we provide no way for a causally downstream transaction to ingest that token (i.e. making sure that its assigned timestamp is at least that represented by the token).

When implementing this, we must validate the token. This is a tricky problem when clients are not trusted - the token will make it into the hybrid logical clocks, so supplying a bogus high value could easily trigger our clock offset mechanisms (or even worse, not trigger them and lead to stale reads). The baseline is making sure we're not MaxOffset ahead of the current HLC timestamp, but that's not enough. Perhaps causality tokens which can be passed in that way ought to be certified by the node certs and be totally opaque to the client.

Jira issue: CRDB-6165

knz commented 6 years ago

Discussed today extensively with @bdarnell and @andreimatei

@andreimatei can you flesh out your thoughts here?

knz commented 6 years ago

(For context: we're having this discussion again because there were multiple community questions around this area recently)

ajwerner commented 4 years ago

When implementing this, we must validate the token. This is a tricky problem when clients are not trusted - the token will make it into the hybrid logical clocks, so supplying a bogus high value could easily trigger our clock offset mechanisms (or even worse, not trigger them and lead to stale reads).

One way we could mitigate the risk of the client causality token pushing the clock too far into the future would be to force the client to wait until the wall time of timestamp to which it wants to push the HLC is within max offset of the wall clock on the gateway machine. If a client tries to push the clock an hour ahead, it's just going to sleep for an hour. We don't expect the causality token to be far in the future.


As for different interfaces, there's been some offline discussions. One way would be to provide a sql interface. Without driver support anything we do here is likely to be clunky.

Imagine a function commit_with_timestamp() which implicitly commits the transaction and returns the commit timestamp. The only valid statement after issuing a select from this function would be to call commit. The connExecutor would be moved to a committed state. These are exactly the semantics of RELEASE SAVEPOINT today (see docs).

In this proposal a client could obtain a causality token as follows:

BEGIN;
...
SELECT commit_timestamp(); -- gives back a DECIMAL hlc timestamp
COMMIT

As for the client using it, perhaps we add something like:

SELECT crdb_wait($1);

Which will sleep until the timestamp is at least close to the wall clock and then bump the gateway HLC.

Drivers could help make the use of the causality token seamless by injecting queries on behalf of the user. I suspect that the reading of the timestamp and properly passing it to the client driver would be the biggest concern. Perhaps the driver can translate Commit() directly to this sequence. The driver may also inject the waiting command prior to every statement while not in an open transaction.

The causality token will need to be ratcheted across all connections in the connection pool. Commit in the driver shouldn't return to the caller until the causality token has been propagated.

Another way the token could be transferred to a client would be through a NoticeResponse in the pg wire protocol indicating the commit timestamp. We could use a connection setting to determine whether the data is sent. I haven't been able to come up with a better protocol extension to provide the causality token but I suspect there is one. Perhaps we could have a pre-defined bound statement so that each connection doesn't need to prepare it? On the whole a protocol-layer extension would likely lead to less confusing documentation and user experiences when it ultimately is adopted. The downside is that it'll be impossible to adopt it above an existing driver.

Another thing which seems like it could be a problem re: causality tracking is https://github.com/cockroachdb/cockroach/issues/7526 where causality may not propagate properly across distsl during distributed execution.

knz commented 4 years ago

One way we could mitigate the risk of the client causality token pushing the clock too far into the future would be to force the client to wait until the wall time of timestamp

This has also been discussed before. In fact we even have recommended users to do so using existing SQL features.

Say your previous sampled value is X, with the same unit (nanos+logical) as cluster_logical_timestamp, let's say value X, then you can wait for your causality-sufficient delay using:

SELECT pg_sleep(X::INT::FLOAT / 1e9 - clock_timestamp()::FLOAT + <maxoffset in secs>)

(the timestamp -> float/decimal conversion gives fractional seconds; X::INT gets rid of the logical part, and Y::FLOAT/1e9 moves the decimal point to the right place; finally, pg_sleep can wait for fractional seconds up to microsecond precision)

ISTM we should provide syntactic sugar for this formula but maybe we don't need much more!

As for the client using it, perhaps we add something like: SELECT crdb_wait($1);

Good idea, I like that! Good syntactic sugar for the above.

As for different interfaces, [...] Without driver support anything we do here is likely to be clunky.

So here there's some data points missing:

1) it's likely that causality tokens are super important very soon, much earlier than we are able to provide drivers (and/or earlier than our users are able to adopt them). So we have a large incentive to find a solution that does not require custom drivers.

2) I disagree that it's going to be clunky.

We already provide "interstitial statements" that can be issued by clients outside of the SQL transaction state machine: SHOW SYNTAX. We can envision a SHOW TRANSACTION CAUSALITY TOKEN that also can be issued as such a freebie, just after a COMMIT, and displays the token of the last COMMIT. Or it could be used just after the RELEASE SAVEPOINT cockroach_restart and before the final COMMIT, in case the client framework code does not support issuing statement ouside of a BEGIN...COMMIT block.

Alternatively, we can store the token of the last txn in a session var so that the client can use SHOW last_txn_token to display the token of the last fully committed txn.

However Ben pointed out that we should not rely on syntax that must be used outside of a begin..commit block because clients often use a connection pool and may not control which connection is used for the statement just after a COMMIT. Hence a big focus on something in-between the RELEASE SAVEPOINT cockroach_restart and before the final COMMIT.

3) It's actually supper important to have a separate statement to sample the token so as not to use cluster_logical_statement() itself, at least as long as that function blocks txn refreshes.

bdarnell commented 4 years ago

it's likely that causality tokens are super important very soon, much earlier than we are able to provide drivers (and/or earlier than our users are able to adopt them). So we have a large incentive to find a solution that does not require custom drivers.

Why do you think this? I feel strongly in the opposite direction: I don't believe we can make causality tokens usable enough to be relevant without custom drivers.

However Ben pointed out that we should not rely on syntax that must be used outside of a begin..commit block because clients often use a connection pool and may not control which connection is used for the statement just after a COMMIT.

This is one way being in the driver can be important. The driver can do a COMMIT followed by a SHOW last_txn_token even if the application can't (this pattern is in fact common for mysql which lacks INSERT RETURNING). I'd rather design the syntax with the assumption that it will be implemented by drivers than introduce more RELEASE SAVEPOINT cockroach_restart style hacks that appear to get the job done without driver support but are very tricky to use in practice.

github-actions[bot] commented 3 years ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

knz commented 3 years ago

Still relevant, and we even discussed it on slack in the past few weeks. (internal link)

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

knz commented 11 months ago

still relevant

brianbraunstein commented 7 months ago

Any progress or update here?