Closed cockroach-teamcity closed 3 years ago
This was at least in part due to clock uncertainty:
Error: pq: remote wall time is too far ahead (904.294361ms) to be trustworthy
Error: COMMAND_PROBLEM: exit status 1
(1) COMMAND_PROBLEM
Wraps: (2) Node 4. Command with error:
| ```
| ./workload run bank {pgurl:1} --max-rate=10
| ```
Wraps: (3) exit status 1
Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
run_060208.084_n4_workload_run_bank: 06:10:45 cluster.go:2337: > result: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2859839-1617861430-04-n4cpu4:4 -- ./workload run bank {pgurl:1} --max-rate=10 returned: exit status 20
However, I think this revealed some behavior we want to understand. Namely, when node 2 panicked because of clock sync, the schemafeed also died on node 1:
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 CHANGEFEED job ‹×› returning with error: fetching changes for ‹×›: ‹×›
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 +(1) tags: [n1,received-error]
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 +Wraps: (2) tags: [n‹×›,sent-error=‹×›]
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 +Wraps: (3)
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | (opaque error wrapper)
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | type name: github.com/cockroachdb/errors/withstack/*withstack.withStack
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | reportable 0:
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + |
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).fetchDescriptorVersions
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:569
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).updateTableHistory
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:297
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).pollTableHistory
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:280
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).Run
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:242
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | /go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:166
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | golang.org/x/sync/errgroup.(*Group).Go.func1
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | /go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | runtime.goexit
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 + | /usr/local/go/src/runtime/asm_amd64.s:1374
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 +Wraps: (4) fetching changes for ‹×›
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 +Wraps: (5) ‹×›
teamcity-2859839-1617861430-04-n4cpu4-0001> W210408 06:10:45.183551 7949 ccl/changefeedccl/changefeed_stmt.go:609 ⋮ [n1,job=‹×›] 677 +Error types: (1) *contexttags.withContext (2) *contexttags.withContext (3) *errbase.opaqueWrapper (4) *errutil.withPrefix (5) *errors.errorString
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 CHANGEFEED job ‹×›: stepping through state reverting with error: fetching changes for ‹×›: ‹×›
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 +(1) tags: [n1,received-error]
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 +Wraps: (2) tags: [n‹×›,sent-error=‹×›]
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 +Wraps: (3)
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | (opaque error wrapper)
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | type name: github.com/cockroachdb/errors/withstack/*withstack.withStack
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | reportable 0:
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + |
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).fetchDescriptorVersions
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:569
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).updateTableHistory
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:297
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).pollTableHistory
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:280
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed.(*SchemaFeed).Run
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | /go/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/schemafeed/schema_feed.go:242
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | /go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:166
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | golang.org/x/sync/errgroup.(*Group).Go.func1
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | /go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | runtime.goexit
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 + | /usr/local/go/src/runtime/asm_amd64.s:1374
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 +Wraps: (4) fetching changes for ‹×›
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 +Wraps: (5) ‹×›
teamcity-2859839-1617861430-04-n4cpu4-0001> I210408 06:10:45.183767 7949 jobs/registry.go:1175 ⋮ [n1] 678 +Error types: (1) *contexttags.withContext (2) *contexttags.withContext (3) *errbase.opaqueWrapper (4) *errutil.withPrefix (5) *errors.errorString
teamcity-2859839-1617861430-04-n4cpu4-0001> E210408 06:10:45.185192 7949 jobs/adopt.go:260 ⋮ [n1] 679 job ‹×›: adoption completed with error job ‹×›: could not mark as reverting: fetching changes for ‹×›: ‹×›: ‹×›: ‹×›
This appears to be a non-retriable error. Ideally, though, changefeeds, like the rest of the 3-node cluster, should be able to survive a ~two~ one-node failure.
Do you mean one node failure? If two out of three nodes died, I'd expect the job to fail, or at least never finish.
> Do you mean one node failure?
I did! Thanks :D.
I opened https://github.com/cockroachdb/cockroach/issues/63317 to cover the other issue that was found here. Given what I believe the cause is, this is existing behaviour, so I don't think it needs to be a release blocker, but I'll consult with the rest of the team.
roachtest.cdc/bank failed with artifacts on master @ 704a9fa4ffd144a2bc977bc1a6853a853b17e70e:
Reproduce
To reproduce, try:

```bash
#!/usr/bin/env bash
set -euxo pipefail

# NB: invoke this script with "caffeinate" on OSX and/or linux to
# prevent runs failing due to standby.

sha=$(git rev-parse HEAD)
if [ ! -f roachtest.$sha ]; then
  ./build/builder.sh mkrelease amd64-linux-gnu bin/{roach{prod,test},workload}
  mv -f bin.docker_amd64/roachprod roachprod.$sha
  mv -f bin.docker_amd64/workload workload.$sha
  mv -f bin.docker_amd64/roachtest roachtest.$sha
fi

if [ ! -f cockroach.$sha ]; then
  ./build/builder.sh mkrelease amd64-linux-gnu
  mv cockroach-linux-2.6.32-gnu-amd64 cockroach.$sha
fi

# NB: consider adding --debug if it is useful to let the clusters
# for failed tests survive.
./roachtest.$sha run "cdc/bank" \
  --port 8080 --count 10 --cpu-quota 500 \
  --roachprod roachprod.${sha} --workload workload.${sha} \
  --cockroach ./cockroach.$sha \
  --artifacts artifacts.$sha | tee roachtest-stress.${sha}
```
/cc @cockroachdb/cdc
Internal log
``` no author provided ```