cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: c2c/initialscan/kv0 failed #135793

Open cockroach-teamcity opened 5 days ago

cockroach-teamcity commented 5 days ago

roachtest.c2c/initialscan/kv0 failed with artifacts on master @ 8eeb7f2ae3b2cede564b46ca47e2353fd147c061:

(soon.go:60).SucceedsWithin: condition failed to evaluate within 30m0s: from cluster_to_cluster.go:1851: no replicated time
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/c2c/initialscan/kv0/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-44712

cockroach-teamcity commented 4 days ago

roachtest.c2c/initialscan/kv0 failed with artifacts on master @ eb2d2e19eb29d2747d9e267bd0612a69d066adad:

(soon.go:60).SucceedsWithin: condition failed to evaluate within 30m0s: from cluster_to_cluster.go:1851: no replicated time
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/c2c/initialscan/kv0/run_1

cockroach-teamcity commented 3 days ago

roachtest.c2c/initialscan/kv0 failed with artifacts on master @ 5c5c9d6803d47848aa1960dd6642d5f2c1926814:

(soon.go:60).SucceedsWithin: condition failed to evaluate within 30m0s: from cluster_to_cluster.go:1851: no replicated time
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/c2c/initialscan/kv0/cpu_arch=arm64/run_1

msbutler commented 3 days ago

@dt this began regressing after https://github.com/cockroachdb/cockroach/pull/135637 landed. Perhaps we need to round robin spans after all.

dt commented 3 days ago

> Perhaps we need to round robin spans after all.

This test has so few spans in it that round-robin vs first-k seems irrelevant; the procs all have just two or even only one span each.

I'm guessing the change in timing here is because I now take the dest node count into consideration when picking the number of processors we'll run: I divide the number of spans by the number of nodes using integer division, so we round down. If you have thousands of spans in some production-scale case, the remainder when dividing by the node count is an inconsequential rounding error, but in this case we have so few spans that the rounding error may be a non-trivial fraction (of a trivial number). Note that we now have 5 procs per node, and some have two spans while most have one. I'm guessing that before the rounding change we got 8 procs per node with exactly one span per proc, so in this extreme edge case -- one span per proc -- we doubled the work for some procs when the rounding change gave them a single extra span.
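To make the rounding effect concrete, here is a minimal sketch. It is not the actual planner code: `procsPerNode`, `spansPerProc`, `imbalance`, the cap of 8 procs per node, the 4-node cluster, and the span counts are all hypothetical values chosen only to illustrate the arithmetic.

```go
package main

import "fmt"

// procsPerNode is a hypothetical planner rule, not CockroachDB's actual code:
// integer division rounds spans/nodes down, then clamps to [1, maxPerNode].
func procsPerNode(spans, nodes, maxPerNode int) int {
	p := spans / nodes
	if p > maxPerNode {
		p = maxPerNode
	}
	if p < 1 {
		p = 1
	}
	return p
}

// spansPerProc spreads spans as evenly as possible over procs; the first
// spans%procs procs each receive one extra span.
func spansPerProc(spans, procs int) []int {
	counts := make([]int, procs)
	for i := range counts {
		counts[i] = spans / procs
		if i < spans%procs {
			counts[i]++
		}
	}
	return counts
}

// imbalance returns the ratio of the heaviest to the lightest proc load.
// It assumes counts is non-empty and every proc has at least one span.
func imbalance(counts []int) float64 {
	lo, hi := counts[0], counts[0]
	for _, c := range counts[1:] {
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return float64(hi) / float64(lo)
}

func main() {
	const nodes, maxPerNode = 4, 8 // assumed values for illustration only
	for _, spans := range []int{23, 23000} {
		procs := procsPerNode(spans, nodes, maxPerNode) * nodes
		counts := spansPerProc(spans, procs)
		fmt.Printf("spans=%d procs=%d heaviest/lightest=%.3f\n",
			spans, procs, imbalance(counts))
	}
}
```

Under these assumed numbers, 23 spans over 4 nodes yields 5 procs per node, and the 3 leftover spans leave a few procs with twice the work of the rest. With 23000 spans the same remainder logic changes the heaviest proc's load by well under one percent.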

I'll poke a bit and see if we want to make the division try harder to get all 8 procs even when the span count is tiny.

https://cockroachdb.github.io/distsqlplan/decode.html#eJysWE1v40YMvfdXBHP2wCSH5JA-BwVyaLvo7q0ICidWUrdZO7UUbAsh_72Qs-huPI6jDx8jKU9-T-R75LSh_vshLMKf25sFApEbioFodmASD7Ow2a6qn5efqzosfgsYZoHCLKQwCxyuZ-Fxt72t6nq76263-4evVv-EBczCevP41HSXr2fhdrurwqINzbp5qMIifGx21fLz1ea-qpv1dnO5bJZzCLOwqprl-mH_qg_LXbPubtaLMAsft0-72-qiw7_A2UX9uHy5Pv9UbZabZp7mn5Y3D9UcQec4j9wmNwYhMgdwMAGKhO4klMRJxADInt9GwDYnQs1ImQRAKRFEkaRO5J6UPCMqPYfr51nYPjXfuNbN8r4KC_xOnKvLsIDn2TR9sK8-9L4-0hoYU4IkmUFz5qSRsrEkEkYns5SFTujDbSlnBEYW7P43Zzch1v764FR9qK8-fFqfNnb8nBBNHEQwp-zGzP250FQu6Uxc5jhv0ZDRhCF7BnCFRJGRnNTByNiBjAX6s0tT2XFfdulddqllFXDUnNwMTCE6GmVLmlwFBAzermJsKVsWMsiZUZzYIWqSDCqE2T1ZZyP9peFX0uBwaeSMJtgaC3EipQwuntEpshmZEhml7Inc9ZQ6pRSxLKb-6shUdfSMFtgSmCMBZ0qZ3ESyRixN_2192iNqxnRYj_3l0any5PO5Rmwzg5OoIziDiThR1AxgoFkzG5m45gH08lR6dj56rC2imiKCgoB3nzFSQndwIlPIokMs0aZy8_NZIrelucfUOVwSRgMkFkE-UdfKJF3HOxqzcs5AMXPXLCSI7KCeEAdku7-Sh0bMPr07v48vHsn2aJZQsRvpEODd6bA1JEko4pYgk1NOGo90zIDxRydr1Lv9T7ljSxpZTlC3tlQqlmoMYJ4nM-_tDO_nQnuk0KOJqAJpdpWcWXK2AfxsMr_e7tBjHEzFfBSdMilwUu1Srwu0Aewm9zb1Xvze977YcrmDxCNJf6q3sVzzYjlKDdgG4E2JvinztNnuVtWuWr0S4_r5fRF_3G03zbrazelgQ6xfnrtcXHzb7E3dFFPOb_56Ovz1rxezNKJ8p232bfz6fejoV2tZIsIQv4HJhHqv4u-nUctazB3xSAAP4Df9g_VepXv4aWkuEcut6-1-5LacXaIW09sAfWiyPmdcz2Mr5blL7CIGBRAzolhS1iGBkyYTPOOGHtsjW0OU8vzp5GFTqUcsT_gGSMSTJeq9qR-TqH1N8Mh8EQccN6EMyeBfq_pxu6mrg6A5_io4eFXELpGq1X31El_1nuiH3fZ2_-zLn7_sgfYXVlXdvNxFefnrarO_h7P_8-nr2ej3UDgaCg-haDQUHUKl0VDpEIpHQ_EhlIyGkkMoHQ2lh1B5NFQ-hLLRUHYI5aOhvCjR8eWOhVo4oeALvXB8yWOhGI4veiw1G1_2VHrE6cInegUGp_1mfOVjCTa-9rGwLxxf_VgYGI6vfywsjCY0QGFiNKEBpIuku4ftl9_Xq7AIbH63khVHv72DyPmG482NUWSVO7rVxCvr5re7h-V93eXixz-2X_a4n_597FLtbvlQV7Pw0_Kv6rJqqt3n9WZdN-vbr3een3_4LwAA__-_QvVK

cockroach-teamcity commented 2 days ago

roachtest.c2c/initialscan/kv0 failed with artifacts on master @ cea3ff5562160a3bf2802da052da2aaa40e1ccc1:

(soon.go:60).SucceedsWithin: condition failed to evaluate within 30m0s: from cluster_to_cluster.go:1851: no replicated time
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/c2c/initialscan/kv0/cpu_arch=arm64/run_1

cockroach-teamcity commented 1 day ago

roachtest.c2c/initialscan/kv0 failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:

(soon.go:60).SucceedsWithin: condition failed to evaluate within 30m0s: from cluster_to_cluster.go:1851: no replicated time
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/c2c/initialscan/kv0/cpu_arch=arm64/run_1

cockroach-teamcity commented 17 hours ago

roachtest.c2c/initialscan/kv0 failed with artifacts on master @ f717f6bd218121bb5e3376af658545f6bff30c22:

(soon.go:60).SucceedsWithin: condition failed to evaluate within 30m0s: from cluster_to_cluster.go:1851: no replicated time
(monitor.go:149).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/c2c/initialscan/kv0/cpu_arch=arm64/run_1

Same failure on other branches

- #136091 roachtest: c2c/initialscan/kv0 failed [A-disaster-recovery C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-24.3 release-blocker]
