cockroachdb / docs

CockroachDB user documentation
https://cockroachlabs.com/docs
Creative Commons Attribution 4.0 International
188 stars 458 forks source link

Clarify "preserving DELETE performance over time" docs #12230

Open rmloveland opened 2 years ago

rmloveland commented 2 years ago

Richard Loveland (rmloveland) commented:

In the DELETE docs we have a section on preserving DELETE performance over time which needs to be updated. It provides the user with the option to take

one of the following approaches:

  • At each iteration, update the WHERE clause to filter only the rows that have not yet been marked for deletion. For an example, see Batch-delete on an indexed filter.
  • At each iteration, first use a SELECT statement to return primary key values on rows that are not yet deleted. Rows marked for deletion will not be returned. Then, use a nested DELETE loop over a smaller batch size, filtering on the primary key values. For an example, see Batch delete on a non-indexed column.
  • To iteratively delete rows in constant time, using a simple DELETE loop, you can alter your zone configuration and change gc.ttlseconds to a low value like 5 minutes (i.e., 300), and then run your DELETE statement once per GC interval.

However, we have users running into this problem via a support issue who have apparently tried (one of? all of?) these approaches and not found satisfaction.

Therefore we need to update this page to make it clearer. In particular, given that this is still a known behavior of CockroachDB when scanning over tombstones according to https://github.com/cockroachdb/cockroach/issues/17229, perhaps none of these actions will actually help? Also it's possible that providing the "choose your own adventure" of three choices is less helpful than it could be, since users will end up having to try all three.

Estimated scope of work:

See also:

Jira Issue: DOC-1128

rmloveland commented 2 years ago

@mwang1026 @awoods187 FYI - the behavior is storage, but the docs are SQL. Lemme know which product area this should be in (I added both labels for now)

mwang1026 commented 2 years ago

@mwang1026 @awoods187 FYI - the behavior is storage, but the docs are SQL. Lemme know which product area this should be in (I added both labels for now)

Is this question more about IA or about who writes the doc?

rmloveland commented 2 years ago

Not IA, less about who writes it and more about who (of you all) will be prioritizing it.

If it's Storage, it might (?) get done sooner.

If it goes on the SQL docs mega-backlog, maybe less soon. (But that's just guessing/speculation)

awoods187 commented 2 years ago

This looks like @vy-ton to me (consistent with her TTL/Bulk Deletes research). Vy do you agree?

vy-ton commented 2 years ago

+1, let's put this on SQL docs backlog

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB docs!