cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: schemachange/random-load failed #129961

Closed: cockroach-teamcity closed this issue 1 month ago.

cockroach-teamcity commented 2 months ago

roachtest.schemachange/random-load failed with artifacts on release-23.2 @ 547d7ea96aca72ba7f68616e308b1981d6dac76c:

(test_runner.go:1153).runTest: test timed out (3h0m0s)
test artifacts and logs in: /artifacts/schemachange/random-load/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

- #129857 roachtest: schemachange/random-load failed [relation "[1285]" does not exist] [C-test-failure O-roachtest O-robot T-sql-foundations branch-release-24.2.1-rc]
- #129287 roachtest: schemachange/random-load failed [current transaction is aborted, commands ignored until end of transaction] [C-test-failure O-roachtest O-robot P-2 T-sql-foundations branch-master]
- #127513 roachtest: schemachange/random-load failed [current transaction is aborted, commands ignored until end of transaction block] [C-test-failure O-roachtest O-robot P-3 T-sql-foundations branch-release-24.1]

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-41803

annrpom commented 1 month ago

Here is what I have found so far:

teamcity-16695580-1725083607-64-n3cpu4-geo-0001> I240831 11:49:14.088226 4565 sql/schema_change_plan_node.go:227 ⋮ [T1,Vsystem,n1,client=10.142.0.45:45302,hostssl,user=‹×›] 32631 schema change waiting for 1 concurrent schema change job(s) [999502477104316417] on descriptor 324, waited 2h54m17.090675781s so far

Descriptor 324 is table1757.

fqazi commented 1 month ago

So the job is retrying indefinitely with the error:

999502477104316417  NEW SCHEMA CHANGE   DROP VIEW IF EXISTS schemachange.public.view3573    DROP VIEW IF EXISTS schemachange.public.view3573    roachprod   {324,445}   running PostCommitNonRevertiblePhase stage 1 of 1 with 1 MutationType op pending    2024-08-31 08:52:28.362333+00   2024-08-31 08:52:30.042865+00   NULL    2024-08-31 10:57:37.213573+00   0   NULL        2   5265350267927663133 2024-08-31 10:57:37.564079+00   2024-08-31 13:05:07.564079+00   8   "{""running execution from '2024-08-31 09:22:04.211077' to '2024-08-31 09:22:04.776864' on 3 failed: non-cancelable: failed to read descriptors [324 445] for the declarative schema change state: referenced descriptor ID 445: looking up ID 445: descriptor not found"",""running execution from '2024-08-31 09:54:05.718187' to '2024-08-31 09:54:06.429393' on 2 failed: non-cancelable: failed to read descriptors [324 445] for the declarative schema change state: referenced descriptor ID 445: looking up ID 445: descriptor not found"",""running execution from '2024-08-31 10:57:37.564079' to '2024-08-31 10:57:38.129685' on 3 failed: non-cancelable: failed to read descriptors [324 445] for the declarative schema change state: referenced descriptor ID 445: looking up ID 445: descriptor not found""}"  "[{""executionEndMicros"": ""1725096124776864"", ""executionStartMicros"": ""1725096124211077"", ""instanceId"": 3, ""status"": ""running"", ""truncatedError"": ""non-cancelable: failed to read descriptors [324 445] for the declarative schema change state: referenced descriptor ID 445: looking up ID 445: descriptor not found""}, {""executionEndMicros"": ""1725098046429393"", ""executionStartMicros"": ""1725098045718187"", ""instanceId"": 2, ""status"": ""running"", ""truncatedError"": ""non-cancelable: failed to read descriptors [324 445] for the declarative schema change state: referenced descriptor ID 445: looking up ID 445: descriptor not found""}, {""executionEndMicros"": ""1725101858129685"", ""executionStartMicros"": ""1725101857564079"", ""instanceId"": 3, ""status"": ""running"", ""truncatedError"": ""non-cancelable: failed to read descriptors [324 445] for the declarative schema change state: referenced descriptor ID 445: looking up ID 445: descriptor not found""}]"

@rafiss Will https://github.com/cockroachdb/cockroach/pull/129342 help with this one? Should we backport it to the 23.2 branch, or is it not worth the risk?

rafiss commented 1 month ago

I think a backport should be safe for this: https://github.com/cockroachdb/cockroach/pull/130698