
TTL processing on big table fails with lexical error #129858

Open ugavnholt opened 3 months ago

ugavnholt commented 3 months ago

Describe the problem

A big table with billions of rows, to which thousands of records are added every second, makes the TTL job fail with "lexical error: placeholder index must be between 1 and 65536", so TTL never manages to process the table.

To Reproduce

CREATE TABLE public.accesslog (
  login STRING(255) NOT NULL,
  ip INET NOT NULL,
  "timestamp" TIMESTAMPTZ(6) NOT NULL,
  CONSTRAINT "primary" PRIMARY KEY (ip ASC, "timestamp" DESC),
  INDEX authpolicy_accesslog_login_idx (login ASC, "timestamp" DESC),
  INDEX authpolicy_accesslog_time_idx ("timestamp" ASC)
) WITH (
  ttl = 'on',
  ttl_expiration_expression = e'(("timestamp" AT TIME ZONE \'UTC\') + INTERVAL \'30 days\') AT TIME ZONE \'UTC\'',
  ttl_job_cron = '13 */4 * * *',
  ttl_select_batch_size = 5000000,
  ttl_delete_batch_size = 50000
);

Fill the table with more than a billion rows, keep adding ~2k rows every second, and watch the TTL cleanup log; a seeding sketch follows below.
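
A minimal sketch for seeding synthetic rows and watching the job. The login/ip values are made up for illustration, the row count is scaled down, and the SHOW JOBS column names and the 'ROW LEVEL TTL' job type string are assumptions about the current output format:

-- Seed synthetic rows (repeat or enlarge as needed); "timestamp" values are
-- spread over ~30 days so some rows fall past the TTL expiration.
INSERT INTO public.accesslog (login, ip, "timestamp")
SELECT
  'user-' || (i % 10000)::STRING,
  ('10.' || (i % 256)::STRING || '.' || ((i // 256) % 256)::STRING || '.1')::INET,
  now() - (i % 2592000) * INTERVAL '1 second'
FROM generate_series(1, 1000000) AS g(i);

-- Watch the TTL job and capture the error it fails with.
SELECT job_id, status, error
FROM [SHOW JOBS]
WHERE job_type = 'ROW LEVEL TTL';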

Expected behavior

The TTL job is expected to run, with the TTL select loading the oldest 5,000,000 rows at each interval and deleting the selected records in batches of 50,000.

Environment:

Additional context

Lowering the intervals/batch sizes makes the select statement consume a disproportionate share of the cluster's resources, while raising them triggers this error; hence, using TTL to trim a big table seems broken.

Jira issue: CRDB-41739

blathers-crl[bot] commented 3 months ago

Hi @ugavnholt, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] commented 3 months ago

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I was unable to automatically find someone to ping.

If we have not gotten back to your issue within a few business days, you can try the following:

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

fqazi commented 2 months ago

This is a known limitation that the documentation needs to be updated to cover; it can be worked around with a smaller batch size.
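
Concretely, a minimal sketch of that workaround against the table from the report (the value 30000 is illustrative, not an official threshold):

-- Illustrative: pick a delete batch small enough that the generated
-- statement's placeholder count stays under 65,536; the relation to
-- primary key width is spelled out in the follow-up comment below.
ALTER TABLE public.accesslog SET (ttl_delete_batch_size = 30000);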

ugavnholt commented 2 months ago

@fqazi thank you for the answer; my own testing did reveal that smaller batch sizes work. It seems that, behind the scenes, the TTL batch delete uses one huge parameterized statement, and the number of parameters for that statement is limited. As feedback on the TTL mechanics: there seems to be an opportunity for optimization here to handle big tables.
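
To make that concrete, a sketch of the statement shape the delete phase appears to build, assuming one placeholder per primary key column per row (an inference from the error message, not the actual implementation):

-- Presumed shape of the batched delete. With the two-column primary key
-- (ip, "timestamp"), a 50,000-row batch would need 50,000 * 2 = 100,000
-- placeholders, past the 65,536 limit in the error; under this presumption,
-- at most 65536 // 2 = 32768 rows per batch would fit.
DELETE FROM public.accesslog
WHERE (ip, "timestamp") IN (($1, $2), ($3, $4) /* , ... */);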