cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: slow-drain/duration=1m0s failed #131084

Closed cockroach-teamcity closed 1 month ago

cockroach-teamcity commented 1 month ago

roachtest.slow-drain/duration=1m0s failed with artifacts on release-24.1 @ 58f50a3c9399f11bb58d7aa42b1f48c7956134f0:

(cluster.go:2303).Start: ~ COCKROACH_CONNECT_TIMEOUT=1200 ./cockroach sql --url 'postgres://root@localhost:29000?options=-ccluster%3Dsystem&sslcert=.%2Fcerts%2Fclient.root.crt&sslkey=.%2Fcerts%2Fclient.root.key&sslmode=verify-full&sslrootcert=.%2Fcerts%2Fca.crt' -e "CREATE SCHEDULE IF NOT EXISTS test_only_backup FOR BACKUP INTO 'gs://cockroachdb-backup-testing/roachprod-scheduled-backups/teamcity-16965725-1726811220-129-n6cpu4/system/1726831461289776163?AUTH=implicit' RECURRING '*/15 * * * *' FULL BACKUP '@hourly' WITH SCHEDULE OPTIONS first_run = 'now'"
ERROR: unexpected error occurred when checking for existing backups in gs://cockroachdb-backup-testing/roachprod-scheduled-backups/teamcity-16965725-1726811220-129-n6cpu4/system/1726831461289776163?AUTH=implicit: unable to list files in gcs bucket: Get "https://storage.googleapis.com/storage/v1/b/cockroachdb-backup-testing/o?alt=json&delimiter=&endOffset=&includeTrailingDelimiter=false&pageToken=&prefix=roachprod-scheduled-backups%2Fteamcity-16965725-1726811220-129-n6cpu4%2Fsystem%2F1726831461289776163%2Fmetadata%2Flatest&prettyPrint=false&projection=full&startOffset=&versions=false": compute: Received 429 `"Too many requests."`
SQLSTATE: 58030
Failed running "sql": COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/slow-drain/duration=1m0s/cpu_arch=arm64/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage

Jira issue: CRDB-42363

tbg commented 1 month ago

This failed here, while roachprod was creating the scheduled backup during cluster start, and thus has nothing to do with the test:

/pkg/roachprod/install/cockroach.go#L1444-L1457

    url := c.NodeURL("localhost", port, startOpts.VirtualClusterName, serviceMode, AuthRootCert)
    fullCmd := fmt.Sprintf(`COCKROACH_CONNECT_TIMEOUT=%d %s sql --url %s -e %q`,
        startSQLTimeout, binary, url, createScheduleCmd)
    // Instead of using `c.ExecSQL()`, use `c.runCmdOnSingleNode()`, which allows us to
    // 1) prefix the schedule backup cmd with COCKROACH_CONNECT_TIMEOUT.
    // 2) run the command against the first node in the cluster target.
    res, err := c.runCmdOnSingleNode(ctx, l, node, fullCmd, defaultCmdOpts("init-backup-schedule"))
    if err != nil || res.Err != nil {
        out := ""
        if res != nil {
            out = res.CombinedOut
        }
        return errors.Wrapf(errors.CombineErrors(err, res.Err), "~ %s\n%s", fullCmd, out)
    }

I brought up the idea of having errors like these, which come from cluster setup rather than from the test itself, filed separately here.
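To make the "filed separately" idea concrete, here is a minimal sketch of how a failure's output could be routed by matching known infrastructure error text. This is not how roachtest or roachprod actually triages failures. The names `failureKind`, `infraMarkers`, and `classifyFailure` are hypothetical, the marker strings are simply lifted from the error in this issue, and the second sample message in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// failureKind is a hypothetical label used to route a failure report.
type failureKind string

const (
	kindTest  failureKind = "test-failure"
	kindInfra failureKind = "infra-flake"
)

// infraMarkers lists substrings assumed to indicate a problem outside the
// test itself. They are illustrative only, copied from the error text in
// this issue; a real list would need to be curated.
var infraMarkers = []string{
	"Received 429",
	"Too many requests",
	"unable to list files in gcs bucket",
}

// classifyFailure returns kindInfra if the output contains any known
// infrastructure marker, and kindTest otherwise.
func classifyFailure(output string) failureKind {
	for _, m := range infraMarkers {
		if strings.Contains(output, m) {
			return kindInfra
		}
	}
	return kindTest
}

func main() {
	// Abbreviated form of the error reported above.
	setupErr := `unable to list files in gcs bucket: compute: Received 429 "Too many requests."`
	fmt.Println(classifyFailure(setupErr))

	// A made-up message standing in for a genuine test failure.
	fmt.Println(classifyFailure("node did not finish draining in time"))
}
```

Substring matching is crude and would misclassify a test that legitimately exercises rate limiting, so a real implementation would more likely tag the error at the point where it is wrapped (the `errors.Wrapf` call in the snippet above) instead of inspecting text after the fact.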