cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: copyfrom: command is too large #121413

Closed: cockroach-teamcity closed this issue 6 months ago

cockroach-teamcity commented 7 months ago

roachtest.copyfrom/crdb-nonatomic/sf=1/nodes=1 failed with artifacts on release-24.1 @ 5d952f80b3e1efe2e9aaed73f1fd68433880fcb7:

(copyfrom.go:101).runTest: COMMAND_PROBLEM: exit status 1
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/copyfrom/crdb-nonatomic/sf=1/nodes=1/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/sql-queries

Jira issue: CRDB-37235

yuzefovich commented 7 months ago
ERROR:  command is too large: 67901396 bytes (max: 67108864)

We might need to adjust the test a bit.
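For scale, the numbers in the error message work out as follows. The limit is exactly 64 MiB, and the rejected command overshot it by roughly 1.2%:

```latex
\begin{aligned}
\text{max} &= 64\ \text{MiB} = 2^{26} = 67\,108\,864\ \text{bytes} \\
\text{overshoot} &= 67\,901\,396 - 67\,108\,864 = 792\,532\ \text{bytes} \\
\frac{792\,532}{67\,108\,864} &\approx 1.18\%
\end{aligned}
```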

DrewKimball commented 7 months ago

Related to https://github.com/cockroachdb/cockroach/issues/117070

DrewKimball commented 7 months ago

Marking this as p-3, since it's an issue with our testing.

yuzefovich commented 6 months ago

It's interesting to note that in the last 3 failures (earlier ones no longer have artifacts) we have this right before the error:

W240504 06:29:19.370716 3372 kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:731 ⋮ [T1,Vsystem,n1,client=10.142.0.18:51820,hostssl,user=‹importer›] 266  a transaction has hit the intent tracking limit (kv.transaction.max_intents_bytes); is it a bulk operation? Intent cleanup will be slower. txn: "unnamed" meta={id=99bc6814 key=/Table/104/1/‹96192›/‹2›/‹0› iso=Serializable pri=0.00689756 epo=0 ts=1714804155.827999065,0 min=1714804155.827999065,0 seq=0} lock=true stat=PENDING rts=1714804155.827999065,0 wto=false gul=1714804156.327999065,0 ba: ‹31924 CPut, 1 EndTxn, 255392 InitPut›
E240504 06:29:19.380100 3372 9@sql/conn_executor.go:3097 ⋮ [T1,Vsystem,n1,client=10.142.0.18:51820,hostssl,user=‹importer›] 267  error executing ‹CopyIn: COPY lineitem FROM STDIN WITH (FORMAT CSV, DELIMITER '|')›: command is too large: 67901396 bytes (max: 67108864)

I wonder whether hitting this intent tracking limit somehow makes the raft command larger.
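The batch summary in the warning (`31924 CPut, 1 EndTxn, 255392 InitPut`) can be cross-checked with a few lines of arithmetic. The minimal Go sketch below assumes that each COPY row becomes one CPut to the primary index plus one InitPut per secondary index, and that the 67901396-byte command is this same batch (both log lines carry ID 3372 and are about 10 ms apart). The counts are copied by hand from the log; `batchStats` is a hypothetical helper, not CockroachDB code.

```go
package main

import "fmt"

// batchStats holds counts copied by hand from the intent-tracking warning.
// Hypothetical helper for illustration only.
type batchStats struct {
	cputs    int // assumed: one primary-index write per COPY row
	initPuts int // assumed: one write per secondary-index entry
	cmdBytes int // size reported by the "command is too large" error
}

// rows returns the number of input rows, assuming one CPut per row.
func (b batchStats) rows() int { return b.cputs }

// secondaryPerRow returns the average number of secondary-index writes per row.
func (b batchStats) secondaryPerRow() float64 {
	return float64(b.initPuts) / float64(b.cputs)
}

// kvWrites returns the total row-related KV writes (EndTxn excluded).
func (b batchStats) kvWrites() int { return b.cputs + b.initPuts }

// bytesPerKV returns the average command bytes per KV write.
func (b batchStats) bytesPerKV() float64 {
	return float64(b.cmdBytes) / float64(b.kvWrites())
}

func main() {
	b := batchStats{cputs: 31924, initPuts: 255392, cmdBytes: 67901396}
	fmt.Printf("rows=%d secondary/row=%.0f kv writes=%d bytes/kv=%.1f\n",
		b.rows(), b.secondaryPerRow(), b.kvWrites(), b.bytesPerKV())
}
```

Under these assumptions the batch covers 31924 rows with exactly 8 InitPuts per row (9 KV writes per row, 287316 in total), averaging about 236 bytes per write.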

yuzefovich commented 6 months ago

I don't understand why the failure would be non-deterministic, but I think the main problem is that our estimate of using MaxCommandSize / 3 is too inflexible and too aggressive for the TPC-H lineitem table: it has 8 secondary indexes, so each input row produces 9 KV operations (one for the primary index plus one per secondary index). I'll send a patch to make the fraction depend on the number of indexes in the table.
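One way to make the fraction depend on the index count is sketched below in Go. This is a hypothetical illustration, not the code from the linked PR: `copyBatchBudget` and its formula are assumptions. It divides the 64 MiB limit by the number of KV writes per row (1 + number of secondary indexes) and never uses a divisor smaller than the old fixed 3.

```go
package main

import "fmt"

// maxCommandSize mirrors the limit in the error message: 64 MiB.
const maxCommandSize int64 = 64 << 20

// copyBatchBudget is a hypothetical helper, not CockroachDB's actual code.
// It scales the per-batch byte budget by the number of KV writes each input
// row produces (one primary-index write plus one per secondary index),
// falling back to the old fixed divisor of 3 for tables with few indexes.
func copyBatchBudget(maxCmd int64, numSecondaryIndexes int) int64 {
	divisor := int64(1 + numSecondaryIndexes)
	if divisor < 3 {
		divisor = 3
	}
	return maxCmd / divisor
}

func main() {
	for _, n := range []int{0, 2, 8} {
		fmt.Printf("secondary indexes=%d budget=%d bytes\n",
			n, copyBatchBudget(maxCommandSize, n))
	}
}
```

With 0 or 2 secondary indexes this keeps the old budget of 22369621 bytes, while a table shaped like lineitem (8 secondary indexes) gets 7456540 bytes per batch. A real fix would also have to account for index entries storing only a subset of columns, so the actual formula may well differ.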

blathers-crl[bot] commented 1 month ago

Based on the specified backports for linked PR #124637, I applied the following new label(s) to this issue: branch-release-23.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.