cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.9k stars 3.78k forks source link

Support Parallel "Batch" UPDATE / DELETE jobs #86002

Open glennfawcett opened 2 years ago

glennfawcett commented 2 years ago

Is your feature request related to a problem? Please describe.

I have scenario come up frequently on various clusters. You have a table that has billions of rows and you need to update say 100 million rows. There is no way to easily do this with Cockroach DB. You can't really do this as one statement and need to create some code to batch up the delete.

It would be real useful if this sort of activity was managed internally by Cockroach DB using a syntax such as:

UPDATE activity_log 
SET is_active=false
WHERE country_id = <$1>
USING BATCHES OF 1000;

... or ...

DELETE FROM activity_log 
WHERE is_active=false
USING BATCHES OF 1000;

This would just run in batches as a job and not have to be transactionally consistent across the whole operation. This would allow developers and database operations to not have to manually script these operations.

Describe the solution you'd like Have the database handle these operations as a batch job.

Describe alternatives you've considered I have created a description of this for people troubleshooting this behavior. Additionally, I have created a sample trimmer tool to do parallel bulk deletion of data. This tool parallelizes the DELETE into threads defined by the histogram statistics for the table.

Jira issue: CRDB-18517

rytaft commented 2 years ago

cc @vy-ton for prioritization