cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.11k stars 3.81k forks source link

sql: Improve observability of paused TTL deletion jobs #80122

Open vy-ton opened 2 years ago

vy-ton commented 2 years ago

To pause TTL deletion of expired rows, a use can execute alter table promo_codes set (ttl_pause = true); or SET CLUSTER SETTING sql.ttl.job.enabled = false

When TTL deletion is paused, this is the output of SHOW SCHEDULES

demo@127.0.0.1:26257/movr> show schedules;
          id         |        label         | schedule_status |        next_run        |   state   | recurrence | jobsrunning | owner |            created            |     command
---------------------+----------------------+-----------------+------------------------+-----------+------------+-------------+-------+-------------------------------+-------------------
  752985581533265921 | sql-stats-compaction | ACTIVE          | 2022-04-18 16:00:00+00 | succeeded | @hourly    |           0 | node  | 2022-04-13 15:22:43.110947+00 | {}
  752990315300323329 | row-level-ttl-110    | ACTIVE          | 2022-04-18 16:00:00+00 | NULL      | @hourly    |           0 | demo  | 2022-04-13 15:46:47.744078+00 | {"tableId": 110}
  753034040297488385 | row-level-ttl-112    | ACTIVE          | 2022-04-18 16:00:00+00 | NULL      | @hourly    |           0 | demo  | 2022-04-13 19:29:11.554418+00 | {"tableId": 112}
(3 rows)

Desired behavior

Epic CRDB-18322 Jira issue: CRDB-15839

otan commented 2 years ago

When TTL deletion is paused, schedule_status should be PAUSED similar to backup scheduled.

Should be "easy" - we should call the code to PAUSE/UNPAUSE the schedule here:

https://github.com/cockroachdb/cockroach/blob/ecb24ef19f49517ab9a0a5c3fdb8171af6f9b459/pkg/sql/paramparse/paramobserver.go#L397

The challenge is if ttl_pause and a ttl is set at the same time, but perhaps we could block that interaction.

The command column should show the select/delete statements executed by the deletion job

i think this is a misunderstanding of "command" column - it's meta arguments for the scheduled job to start up a job and doesn't show anything to do with functionality of it. We shouldn't do this.

We should consider supporting the PAUSE SCHEDULES syntax as a proxy for setting ALTER TABLE...SET (TTL_PAUSE = TRUE)

I would argue we should only support the latter, but not too fussed.


i'm thinking this is lower pri, but lmk if you think otherwise.

vy-ton commented 2 years ago

The command column should show the select/delete statements executed by the deletion job i think this is a misunderstanding of "command" column - it's meta arguments for the scheduled job to start up a job and doesn't show anything to do with functionality of it. We shouldn't do this.

docs for scheduled backups seem to show the user functionality.

I'm ok with this being lower priority and bundled with the full support Epic

rafiss commented 2 years ago

re: the command column, in the case of scheduled backups, it shows the command used to start the backup job.

but the request here is to show the selects/deletes that the TTL job is running. i'd agree with oliver that this isn't appropriate to show in SHOW SCHEDULES output. also, not even sure what we would show, since the TTL job can run dozens/hundreds of queries depending on how much data it has to delete, and we won't know what those are until after the job is already running. (those queries aren't known at the time of setting up the schedule)

rafiss commented 2 years ago

When TTL deletion is paused, schedule_status should be PAUSED similar to backup scheduled.

Another challenge with this is if RESUME SCHEDULE ... is later called on the TTL job then we'd have to also do something to set ttl_pause = 'false' for that table.

rafiss commented 2 years ago

We should consider supporting the PAUSE SCHEDULES syntax as a proxy for setting ALTER TABLE...SET (TTL_PAUSE = TRUE)

A user already can use PAUSE SCHEDULE to pause any TTL schedule.

So that means that we currently have two different ways of pausing a TTL job: ALTER TABLE ... SET (ttl_pause = 'true') and PAUSE SCHEDULE .... Currently one does not affect the other. If you use PAUSE SCHEDULE, that does cause it to show up as paused in SHOW SCHEDULES as you'd expect.

If we did make them start affecting each other, I think it might even be more confusing...

rafiss commented 2 years ago

Another approach could be to set the ttl_cron to "never" (if that's allowed), and get rid of the ttl_pause option.