cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.13k stars 3.81k forks source link

sql,catalog/lease: long-running statements hold leases after their expiration #64297

Open ajwerner opened 3 years ago

ajwerner commented 3 years ago

Is your feature request related to a problem? Please describe.

Transactions acquire leases in order to used cached version of schema elements. The lease.Manager keeps track of reference counts on lease. When the lease is acquired, its present expiration is used to bound the commit timestamp for the transaction. In #63725, we update the deadline before each statement execution. It's fine and good for the statement to hold onto the lease it's using for as long as it is using it. However, if the expiration the statement is using is passed the present time or the statement cannot get pushed then holding on to a reference to the descriptor seems silly.

Let's look at an example. Say we're running stats collection. That ends up being a big query statement. It acquires leases and then executes. It's special because it occurs at a fixed timestamp but we'll come back to that. So, a statement, while running, will never see its deadline extended. Let's imagine

t1: Statement execution begins, deadline set to t10, lease refCount incremented to 1
t5: lease renewed, new expiration at t15, refCount still 1
...
t50: lease renewed, new expiration t60, refCount still 1
t51: schema change occurs, lease manager notices but does not drop lease because refCount is 1
t60: schema change finishes.

The above scenario is painful because even though the refCount is 1, that user of the lease cannot take advantage of the renewals which happened during its execution. It's even more painful because a fixed timestamp transaction like that used when creating statistics cannot be pushed; it will never use a timestamp later than t1. It's crazy that such statements thus hold off the completion of the schema change between t51 and t60.

Describe the solution you'd like

We should nuance our reference tracking model to refer to lease instances with their given timestamps rather than to just the lease itself. Alternatively, instead of a single reference, we could keep some sort of association between the lease and a timestamp at which that lease is being used as part of a deadline.

I think in the fixed timestamp case, the transaction can immediately drop its reference to the lease. I say this because the timestamp at which the lease ends up getting dropped will have to be after the fixed timestamp in question. A transaction modifying that descriptor might well commit before the timestamp of this read-only transaction, but that's fine.

Additional context

Perhaps this should be two issues, one to immediately drop references to leases in fixed timestamp queries and one to deal with long-running statements.

This issue was motivated by a long-running create stats job. Another thing we could do there is inject a descs collection which has read the descriptors transactionally into the internal executor or we could find a way to just tell the internal executor to avoid caches.

This issue is related to a soon to be penned issue on extending the lease tracking state to understand whether a leaseholder is executing or not.

Jira issue: CRDB-6993

postamar commented 2 years ago

Dumb question but in a time-travelling query, why do we even need to acquire leases in the first place? I'm probably missing something obvious.

ajwerner commented 2 years ago

I think there's two different issues here (see Additional Context). For read-only, fixed timestamp queries, we just shouldn't acquire leases. Fixing that will give us the biggest bang-for-the buck.

postamar commented 2 years ago

Right. Let's do that.