cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.1k stars 3.8k forks source link

ccl/backupccl: TestScheduleBackupRecoversFromClusterDown failed #55928

Closed cockroach-teamcity closed 1 year ago

cockroach-teamcity commented 4 years ago

(ccl/backupccl).TestScheduleBackupRecoversFromClusterDown failed on master@75234294d22cc9a6e12724b043f49e78f0ac92d2:

I201024 05:48:40.753047 468502 ccl/storageccl/export.go:97  [n1,s1,r17/1:/Table/2{1-2}] export [/Table/21/1,/Table/21/2)
I201024 05:48:40.753261 468506 ccl/storageccl/export.go:97  [n1,s1,r11/1:/Table/1{5-6}] export [/Table/15/2,/Table/15/3)
I201024 05:48:40.754336 468519 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/15/{3-4} (attempt 1, priority normal)
I201024 05:48:40.756897 468519 ccl/storageccl/export.go:97  [n1,s1,r11/1:/Table/1{5-6}] export [/Table/15/3,/Table/15/4)
I201024 05:48:40.758578 468520 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/21/{1-2} (attempt 1, priority normal)
I201024 05:48:40.760058 468521 ccl/storageccl/export.go:97  [n1,s1,r6/1:/Table/{SystemCon…-11}] export [/Table/6/1,/Table/6/2)
I201024 05:48:40.761650 468501 ccl/storageccl/export.go:97  [n1,s1,r11/1:/Table/1{5-6}] export [/Table/15/3,/Table/15/4)
I201024 05:48:40.763544 468518 ccl/storageccl/export.go:97  [n1,s1,r6/1:/Table/{SystemCon…-11}] export [/Table/5/1,/Table/5/2)
I201024 05:48:40.764650 468504 ccl/storageccl/export.go:97  [n1,s1,r10/1:/Table/1{4-5}] export [/Table/14/1,/Table/14/2)
I201024 05:48:40.766684 468517 ccl/storageccl/export.go:97  [n1,s1,r6/1:/Table/{SystemCon…-11}] export [/Table/4/1,/Table/4/2)
I201024 05:48:40.767911 468520 ccl/storageccl/export.go:97  [n1,s1,r17/1:/Table/2{1-2}] export [/Table/21/1,/Table/21/2)
I201024 05:48:40.768659 468522 ccl/storageccl/export.go:97  [n1,s1,r11/1:/Table/1{5-6}] export [/Table/15/2,/Table/15/3)
I201024 05:48:40.788021 468519 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/23/{1-2} (attempt 1, priority normal)
I201024 05:48:40.788070 468505 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/23/{1-2} (attempt 1, priority normal)
I201024 05:48:40.788271 468518 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/23/{3-4} (attempt 1, priority normal)
I201024 05:48:40.788329 468517 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/24/{1-2} (attempt 1, priority normal)
I201024 05:48:40.788400 468502 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/24/{1-2} (attempt 1, priority normal)
I201024 05:48:40.790682 468519 ccl/storageccl/export.go:97  [n1,s1,r19/1:/Table/2{3-4}] export [/Table/23/1,/Table/23/2)
I201024 05:48:40.788553 468522 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/37/{1-2} (attempt 1, priority normal)
I201024 05:48:40.791169 468505 ccl/storageccl/export.go:97  [n1,s1,r19/1:/Table/2{3-4}] export [/Table/23/1,/Table/23/2)
I201024 05:48:40.788412 468520 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/33/{1-2} (attempt 1, priority normal)
I201024 05:48:40.788142 468503 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/23/{2-3} (attempt 1, priority normal)
I201024 05:48:40.788356 468506 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/23/{3-4} (attempt 1, priority normal)
I201024 05:48:40.794065 468519 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/37/{2-3} (attempt 1, priority normal)
I201024 05:48:40.788149 468521 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/23/{2-3} (attempt 1, priority normal)
I201024 05:48:40.792753 468502 ccl/storageccl/export.go:97  [n1,s1,r20/1:/Table/2{4-5}] export [/Table/24/1,/Table/24/2)
I201024 05:48:40.788780 468501 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/37/{1-2} (attempt 1, priority normal)
I201024 05:48:40.793079 468517 ccl/storageccl/export.go:97  [n1,s1,r20/1:/Table/2{4-5}] export [/Table/24/1,/Table/24/2)
I201024 05:48:40.788723 468504 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/33/{1-2} (attempt 1, priority normal)
I201024 05:48:40.795365 468503 ccl/storageccl/export.go:97  [n1,s1,r19/1:/Table/2{3-4}] export [/Table/23/2,/Table/23/3)
I201024 05:48:40.796905 468521 ccl/storageccl/export.go:97  [n1,s1,r19/1:/Table/2{3-4}] export [/Table/23/2,/Table/23/3)
I201024 05:48:40.798278 468519 ccl/storageccl/export.go:97  [n1,s1,r33/1:/Table/3{7-8}] export [/Table/37/2,/Table/37/3)
I201024 05:48:40.798844 468517 ccl/backupccl/backup_processor.go:220  [n1,job=601122800101883905,backup-distsql] sending ExportRequest for span /Table/53/{1-2} (attempt 1, priority normal)
I201024 05:48:40.799910 468501 ccl/storageccl/export.go:97  [n1,s1,r33/1:/Table/3{7-8}] export [/Table/37/1,/Table/37/2)
I201024 05:48:40.800954 468518 ccl/storageccl/export.go:97  [n1,s1,r19/1:/Table/2{3-4}] export [/Table/23/3,/Table/23/4)
I201024 05:48:40.803411 468517 ccl/storageccl/export.go:97  [n1,s1,r36/1:/{Table/53-Max}] export [/Table/53/1,/Table/53/2)
I201024 05:48:40.805122 468504 ccl/storageccl/export.go:97  [n1,s1,r29/1:/Table/3{3-4}] export [/Table/33/1,/Table/33/2)
I201024 05:48:40.805537 468506 ccl/storageccl/export.go:97  [n1,s1,r19/1:/Table/2{3-4}] export [/Table/23/3,/Table/23/4)
I201024 05:48:40.807554 468505 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/37/{2-3} (attempt 1, priority normal)
I201024 05:48:40.807653 468520 ccl/storageccl/export.go:97  [n1,s1,r29/1:/Table/3{3-4}] export [/Table/33/1,/Table/33/2)
I201024 05:48:40.809052 468505 ccl/storageccl/export.go:97  [n1,s1,r33/1:/Table/3{7-8}] export [/Table/37/2,/Table/37/3)
I201024 05:48:40.807828 468502 ccl/backupccl/backup_processor.go:220  [n1,job=601122800675913729,backup-distsql] sending ExportRequest for span /Table/53/{1-2} (attempt 1, priority normal)
I201024 05:48:40.806047 468522 ccl/storageccl/export.go:97  [n1,s1,r33/1:/Table/3{7-8}] export [/Table/37/1,/Table/37/2)
I201024 05:48:40.811644 468502 ccl/storageccl/export.go:97  [n1,s1,r36/1:/{Table/53-Max}] export [/Table/53/1,/Table/53/2)
I201024 05:48:41.208295 468437 ccl/backupccl/schedule_exec.go:161  [n1,job=601122800675913729] backup job 601122800675913729 scheduled by 601122789652267009 succeeded
I201024 05:48:41.209251 468437 jobs/registry.go:1071  [n1] BACKUP job 601122800675913729: stepping through state succeeded with error: <nil>
I201024 05:48:41.320503 468288 ccl/backupccl/schedule_exec.go:161  [n1,job=601122800101883905] backup job 601122800101883905 scheduled by 601122789581750273 succeeded
I201024 05:48:41.320985 468288 jobs/registry.go:1071  [n1] BACKUP job 601122800101883905: stepping through state succeeded with error: <nil>
    --- FAIL: TestScheduleBackupRecoversFromClusterDown/retry (4.97s)
        testing.go:853: race detected during execution of test
More

Parameters: - TAGS= - GOFLAGS=-race -parallel=2 ``` make stressrace TESTS=TestScheduleBackupRecoversFromClusterDown PKG=./pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1 ``` [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2ATestScheduleBackupRecoversFromClusterDown.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)

Jira issue: CRDB-3613

cockroach-teamcity commented 4 years ago

(ccl/backupccl).TestScheduleBackupRecoversFromClusterDown failed on master@8aceac3c99c3addece3a9ef9af04cc74715cdb37:

  | github.com/cockroachdb/cockroach/pkg/jobs.(*Job).succeeded
  |     /go/src/github.com/cockroachdb/cockroach/pkg/jobs/jobs.go:594
  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine
  |     /go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1139
  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine
  |     /go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1093
  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob
  |     /go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:245
  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1
  |     /go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:182
  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
  |     /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347
  | runtime.goexit
  |     /usr/local/go/src/runtime/asm_amd64.s:1357
Wraps: (4) log-job
Wraps: (5) context canceled
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
W201025 05:39:46.055576 474836 kv/txn.go:618  [n1] failure aborting transaction: node unavailable; try another peer; abort caused by: context canceled
E201025 05:39:46.056424 474836 jobs/adopt.go:256  [n1] job 601402731232919553: adoption completed with error job 601402731232919553: could not mark as reverting: could not mark job 601402731232919553 as succeeded: log-job: context canceled: context canceled
W201025 05:39:46.073884 475145 kv/txn.go:618  failure aborting transaction: node unavailable; try another peer; abort caused by: failed to fetch protectedts metadata: failed to read metadata: protectedts-GetMetadata: context canceled
I201025 05:39:46.089299 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (gc): node unavailable; try another peer
I201025 05:39:46.090071 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (merge): node unavailable; try another peer
I201025 05:39:46.090341 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (split): node unavailable; try another peer
I201025 05:39:46.090657 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I201025 05:39:46.091043 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (replicaGC): node unavailable; try another peer
W201025 05:39:46.091598 475300 kv/txn.go:618  [intExec=select-running/get-claimed-jobs] failure aborting transaction: node unavailable; try another peer; abort caused by: query execution canceled
I201025 05:39:46.092636 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (raftsnapshot): node unavailable; try another peer
I201025 05:39:46.093245 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (consistencyChecker): node unavailable; try another peer
E201025 05:39:46.093358 472813 jobs/registry.go:654  error processing claimed jobs: could query for claimed jobs: select-running/get-claimed-jobs: context canceled
I201025 05:39:46.093647 472691 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (timeSeriesMaintenance): node unavailable; try another peer
W201025 05:39:46.094383 472813 kv/txn.go:618  failure aborting transaction: node unavailable; try another peer; abort caused by: context canceled
E201025 05:39:46.094960 472813 jobs/registry.go:644  error claiming jobs: context canceled
E201025 05:39:46.097088 472813 jobs/registry.go:654  error processing claimed jobs: could query for claimed jobs: select-running/get-claimed-jobs: context canceled
    --- FAIL: TestScheduleBackupRecoversFromClusterDown/reschedule (438.56s)
        create_scheduled_backup_test.go:113: condition failed to evaluate within 45s: sql: no rows in result set
            goroutine 474603 [running]:
            runtime/debug.Stack(0xc06c85bbe0, 0x80cea00, 0xc000438060)
                /usr/local/go/src/runtime/debug/stack.go:24 +0xab
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x82e2d40, 0xc035544900, 0xc06c85bbe0)
                /go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:36 +0x87
            github.com/cockroachdb/cockroach/pkg/ccl/backupccl.(*testHelper).waitForSuccessfulScheduledJob(0xc092907440, 0xc035544900, 0x8589c97e8f90001)
                /go/src/github.com/cockroachdb/cockroach/pkg/ccl/backupccl/create_scheduled_backup_test.go:113 +0x12e
            github.com/cockroachdb/cockroach/pkg/ccl/backupccl.TestScheduleBackupRecoversFromClusterDown.func4(0xc035544900, 0x61de8b3, 0xa, 0x0, 0x0, 0x0)
                /go/src/github.com/cockroachdb/cockroach/pkg/ccl/backupccl/create_scheduled_backup_test.go:703 +0x575
            github.com/cockroachdb/cockroach/pkg/ccl/backupccl.TestScheduleBackupRecoversFromClusterDown.func9(0xc035544900)
                /go/src/github.com/cockroachdb/cockroach/pkg/ccl/backupccl/create_scheduled_backup_test.go:784 +0xba
            testing.tRunner(0xc035544900, 0xc037588d00)
                /usr/local/go/src/testing/testing.go:909 +0x19a
            created by testing.(*T).Run
                /usr/local/go/src/testing/testing.go:960 +0x652
More

Parameters: - TAGS= - GOFLAGS=-race -parallel=2 ``` make stressrace TESTS=TestScheduleBackupRecoversFromClusterDown PKG=./pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1 ``` [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2ATestScheduleBackupRecoversFromClusterDown.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)

cockroach-teamcity commented 4 years ago

(ccl/backupccl).TestScheduleBackupRecoversFromClusterDown failed on master@157b6dc59552b26fb7ef7968ea3b79577c336423:

W201029 05:49:50.828950 478124 kv/kvserver/store_raft.go:492  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W201029 05:49:51.463233 477968 kv/kvserver/node_liveness.go:815  [n1,liveness-hb] slow heartbeat took 2.760854915s; err=<nil>
W201029 05:49:52.936013 478110 kv/kvserver/store_raft.go:492  [n1,s1,r4/1:/System{/tsd-tse}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W201029 05:49:54.478814 478115 kv/kvserver/store_raft.go:492  [n1,s1,r1/1:/{Min-System/NodeL…}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W201029 05:49:54.583169 478089 kv/kvserver/store_raft.go:492  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W201029 05:49:54.926130 477968 kv/kvserver/node_liveness.go:815  [n1,liveness-hb] slow heartbeat took 1.617790754s; err=<nil>
I201029 05:49:55.811300 481196 ccl/backupccl/create_scheduled_backup_test.go:113  SucceedsSoon: sql: no rows in result set
I201029 05:49:55.882118 477966 server/status/runtime.go:522  [n1] runtime stats: 24 GiB RSS, 226 goroutines, 1.8 GiB/1.0 GiB/2.9 GiB GO alloc/idle/total, 29 MiB/73 MiB CGO alloc/total, 0.1 CGO/sec, 13.9/6.8 %(u/s)time, 0.0 %gc (0x), 1.8 KiB/1.8 KiB (r/w)net
I201029 05:49:56.315022 478000 util/stop/stopper.go:562  quiescing
I201029 05:49:56.500458 478179 kv/kvclient/kvcoord/transport_race.go:108  transport race promotion: ran 20 iterations on up to 8060 requests
E201029 05:49:56.506940 478009 jobs/registry.go:628  error expiring job sessions: expire-sessions: context canceled
W201029 05:49:56.475579 478008 sql/sqlliveness/slinstance/slinstance.go:181  [n1] exiting heartbeat loop
W201029 05:49:56.568919 478009 kv/txn.go:618  failure aborting transaction: node unavailable; try another peer; abort caused by: context canceled
E201029 05:49:56.625473 478009 jobs/registry.go:633  failed to serve pause and cancel requests: context canceled
W201029 05:49:56.625639 478009 jobs/registry.go:676  canceling all adopted jobs due to stopper quiescing
W201029 05:49:56.686637 478011 kv/txn.go:618  failure aborting transaction: node unavailable; try another peer; abort caused by: could not query jobs table: claim-jobs: context canceled
E201029 05:49:56.687382 478011 jobs/registry.go:643  error claiming jobs: could not query jobs table: claim-jobs: context canceled
W201029 05:49:56.705983 481553 kv/txn.go:618  [n1,job=602538332238446593] failure aborting transaction: node unavailable; try another peer; abort caused by: job-update: failed to send RPC: sending to all replicas failed; last error: node unavailable; try another peer
E201029 05:49:56.690347 477964 kv/kvserver/reports/reporter.go:141  [n1,replication-reporter] failed to generate replication reports: failed to save locality report: get-previous-timestamp: failed to send RPC: sending to all replicas failed; last error: node unavailable; try another peer
I201029 05:49:56.505039 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (gc): node unavailable; try another peer
I201029 05:49:56.773607 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (merge): node unavailable; try another peer
I201029 05:49:56.774094 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (split): node unavailable; try another peer
I201029 05:49:56.774320 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I201029 05:49:56.774531 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (replicaGC): node unavailable; try another peer
I201029 05:49:56.774740 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (raftlog): node unavailable; try another peer
I201029 05:49:56.775011 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (raftsnapshot): node unavailable; try another peer
I201029 05:49:56.775307 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (consistencyChecker): node unavailable; try another peer
I201029 05:49:56.775559 478202 kv/kvserver/queue.go:582  [n1,s1] rate limited in MaybeAdd (timeSeriesMaintenance): node unavailable; try another peer
E201029 05:49:56.786350 481543 jobs/adopt.go:256  [n1] job 602538332238446593: adoption completed with error job 602538332238446593: node liveness error: restarting in background
W201029 05:49:56.832131 482347 sql/catalog/lease/lease.go:295  error releasing lease &{'5' '\x01' {{%!q(uint64=396727000) %!q(int64=63739547530) %!q(*time.Location=<nil>)}}}: lease-release: failed to send RPC: sending to all replicas failed; last error: node unavailable; try another peer
W201029 05:49:56.822858 482405 sql/catalog/lease/lease.go:295  error releasing lease &{'\x17' '\x01' {{%!q(uint64=73482000) %!q(int64=63739547550) %!q(*time.Location=<nil>)}}}: lease-release: failed to send RPC: sending to all replicas failed; last error: node unavailable; try another peer
W201029 05:49:56.832173 482349 sql/catalog/lease/lease.go:295  error releasing lease &{'4' '\x01' {{%!q(uint64=862217000) %!q(int64=63739547539) %!q(*time.Location=<nil>)}}}: lease-release: failed to send RPC: sending to all replicas failed; last error: node unavailable; try another peer
E201029 05:49:56.927694 478011 jobs/registry.go:653  error processing claimed jobs: could query for claimed jobs: select-running/get-claimed-jobs: context canceled
    --- FAIL: TestScheduleBackupRecoversFromClusterDown/reschedule (100.63s)
        create_scheduled_backup_test.go:113: condition failed to evaluate within 45s: sql: no rows in result set
            goroutine 481196 [running]:
            runtime/debug.Stack(0xc06f8ebbe0, 0x80f16e0, 0xc0000fe940)
                /usr/local/go/src/runtime/debug/stack.go:24 +0xab
            github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x83048c0, 0xc0625d3900, 0xc06f8ebbe0)
                /go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:36 +0x87
            github.com/cockroachdb/cockroach/pkg/ccl/backupccl.(*testHelper).waitForSuccessfulScheduledJob(0xc0be421400, 0xc0625d3900, 0x85ca56729218001)
                /go/src/github.com/cockroachdb/cockroach/pkg/ccl/backupccl/create_scheduled_backup_test.go:113 +0x12e
            github.com/cockroachdb/cockroach/pkg/ccl/backupccl.TestScheduleBackupRecoversFromClusterDown.func4(0xc0625d3900, 0x61e773b, 0xa, 0x0, 0x0, 0x0)
                /go/src/github.com/cockroachdb/cockroach/pkg/ccl/backupccl/create_scheduled_backup_test.go:703 +0x4b2
            github.com/cockroachdb/cockroach/pkg/ccl/backupccl.TestScheduleBackupRecoversFromClusterDown.func9(0xc0625d3900)
                /go/src/github.com/cockroachdb/cockroach/pkg/ccl/backupccl/create_scheduled_backup_test.go:789 +0xba
            testing.tRunner(0xc0625d3900, 0xc078d733c0)
                /usr/local/go/src/testing/testing.go:909 +0x19a
            created by testing.(*T).Run
                /usr/local/go/src/testing/testing.go:960 +0x652
More

Parameters: - TAGS= - GOFLAGS=-race -parallel=2 ``` make stressrace TESTS=TestScheduleBackupRecoversFromClusterDown PKG=./pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1 ``` [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2ATestScheduleBackupRecoversFromClusterDown.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)

miretskiy commented 4 years ago

Working on it.

cockroach-teamcity commented 4 years ago

(ccl/backupccl).TestScheduleBackupRecoversFromClusterDown failed on master@9ec0151e0079c51afeec2f3f75a4747ce80a615b:

Fatal error:

F201102 06:49:57.244259 512468 sqlmigrations/migrations.go:667  [n1] not enough time left on migration lease, terminating for safety

Stack:

goroutine 512468 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xb953001, 0x0, 0x0, 0x61ec2cc)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xc6
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xaf96be0, 0x4, 0x16439e2a67f9ac4e, 0x7d1d4, 0xa9c9627, 0x1b, 0x29b, 0xc0ceb52440, 0x3f, 0xc0d35c2bb8, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:243 +0xb31
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x81a7000, 0xc05a331ad0, 0x4, 0x2, 0x62d8a20, 0x3f, 0x0, 0x0, 0x0)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:52 +0x19a
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x81a7000, 0xc05a331ad0, 0x1, 0xc000000004, 0x62d8a20, 0x3f, 0x0, 0x0, 0x0)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:43 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatal(...)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:172
github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations.func2(0x81a7000, 0xc05a331ad0)
    /go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:667 +0x347
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc07c849400, 0x81a7000, 0xc05a331ad0, 0xc05ac1e9f0, 0x29, 0x0, 0xc040b8f120)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0x156
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
    /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:341 +0x135
Log preceding fatal error

``` An inability to maintain liveness will prevent a node from participating in a cluster. If this problem persists, it may be a sign of resource starvation or of network connectivity problems. For help troubleshooting, visit: https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues E201102 06:49:55.169812 512538 kv/kvserver/queue.go:1087 [n1,consistencyChecker,s1,r16/1:/Table/2{0-1}] operation "consistencyChecker queue process replica 16" timed out after 1m0s I201102 06:49:55.189940 512285 kv/kvserver/store.go:2618 [n1,s1] __level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp WAL 2 306 K - 304 K - - - - 306 K - - - 1.0 0 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 1 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 2 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 3 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 4 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 5 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 6 0 0 B - 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 total 0 0 B - 306 K 0 B 0 0 B 0 306 K 0 0 B 0 1.0 flush 0 compact 0 0 B 0 B (size == estimated-debt, in = in-progress-bytes) memtbl 2 768 K zmemtbl 0 0 B ztbl 0 0 B bcache 0 0 B 0.0% (score == hit-rate) tcache 0 0 B 0.0% (score == hit-rate) titers 0 filter - - 0.0% (score == utility) I201102 06:49:55.237873 512329 kv/kvserver/replica_range_lease.go:1083 [n1,s1,r2/1:/System/NodeLiveness{-Max}] slow lease acquisition finished after 11m45.053605946s with error after 1 attempts I201102 06:49:56.692591 512142 gossip/gossip.go:1507 [n1] node has connected to cluster via gossip W201102 06:49:56.916176 512792 kv/kvserver/node_liveness.go:815 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow heartbeat took 12m23.223090007s; err=heartbeat failed on epoch increment E201102 06:49:56.917161 512792 kv/kvserver/replica_range_lease.go:338 [n1,s1,r6/1:/Table/{SystemCon…-11}] heartbeat failed on epoch increment I201102 06:49:56.918526 512662 kv/kvserver/replica_range_lease.go:1083 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow lease acquisition finished after 12m25.421315893s with error [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1604299004.016733648,0 after 1 attempts W201102 06:49:56.957299 512857 kv/kvserver/node_liveness.go:815 [n1,s1,r5/1:/{Systemtse-Table/System…}] slow heartbeat took 2.130671398s; err=heartbeat failed on epoch increment W201102 06:49:56.955426 512753 kv/kvserver/node_liveness.go:815 [n1,s1,r3/1:/System/{NodeLive…-tsd}] slow heartbeat took 12m15.591756107s; err=heartbeat failed on epoch increment E201102 06:49:56.973284 512857 kv/kvserver/replica_range_lease.go:338 [n1,s1,r5/1:/{Systemtse-Table/System…}] heartbeat failed on epoch increment W201102 06:49:56.956281 512824 kv/kvserver/node_liveness.go:815 [n1,s1,r26/1:/NamespaceTable/{30-Max}] slow heartbeat took 2.167180653s; err=heartbeat failed on epoch increment E201102 06:49:56.974866 512824 kv/kvserver/replica_range_lease.go:338 [n1,s1,r26/1:/NamespaceTable/{30-Max}] heartbeat failed on epoch increment W201102 06:49:56.959402 512963 kv/kvserver/node_liveness.go:815 [n1,s1,r4/1:/System{/tsd-tse}] slow heartbeat took 1.858095839s; err=heartbeat failed on epoch increment E201102 06:49:56.978863 512963 kv/kvserver/replica_range_lease.go:338 [n1,s1,r4/1:/System{/tsd-tse}] heartbeat failed on epoch increment E201102 06:49:56.973498 512753 kv/kvserver/replica_range_lease.go:338 [n1,s1,r3/1:/System/{NodeLive…-tsd}] heartbeat failed on epoch increment W201102 06:49:56.982569 512356 kv/kvserver/node_liveness.go:815 [n1,liveness-hb] slow heartbeat took 1.816919022s; err=heartbeat failed on epoch increment I201102 06:49:56.983237 512356 kv/kvserver/node_liveness.go:707 [n1,liveness-hb] heartbeat failed on epoch increment; retrying W201102 06:49:56.984395 513048 kv/kvserver/node_liveness.go:815 [n1,s1,r17/1:/Table/2{1-2}] slow heartbeat took 1.756853979s; err=heartbeat failed on epoch increment E201102 06:49:56.985199 513048 kv/kvserver/replica_range_lease.go:338 [n1,s1,r17/1:/Table/2{1-2}] heartbeat failed on epoch increment I201102 06:49:56.983999 512468 kv/kvserver/replica_range_lease.go:1083 [n1,s1,r3/1:/System/{NodeLive…-tsd}] slow lease acquisition finished after 12m16.036367009s with error [NotLeaseHolderError] r3: replica (n1,s1):1 not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1604299004.058324405,0 after 1 attempts I201102 06:49:56.918618 512328 kv/kvserver/replica_range_lease.go:1083 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow lease acquisition finished after 12m1.208307234s with error [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1604299004.016733648,0 after 1 attempts I201102 06:49:56.918694 512742 kv/kvserver/replica_range_lease.go:1083 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow lease acquisition finished after 11m24.941956908s with error [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; current lease is repl=(n1,s1):1 seq=1 start=0,0 epo=1 pro=1604299004.016733648,0 after 1 attempts W201102 06:49:57.183666 512662 kv/kvclient/kvcoord/dist_sender.go:1502 [n1,intExec=retrieve-prev-setting,txn=f88251f6] slow range RPC: have been waiting 773.76s (1 attempts) for RPC Get [/Table/2/1/0,/Min), [txn: f88251f6] to r6:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, next=2, gen=0]; resp: (err: ), *roachpb.GetResponse W201102 06:49:57.192657 512742 kv/kvclient/kvcoord/dist_sender.go:1502 [n1,intExec=update-session,txn=344e9a6f] slow range RPC: have been waiting 747.39s (1 attempts) for RPC Get [/Table/3/1/39/2/1,/Min), [txn: 344e9a6f], [can-forward-ts] to r6:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, next=2, gen=0]; resp: (err: ), *roachpb.GetResponse W201102 06:49:57.224336 512468 kv/kvclient/kvcoord/dist_sender.go:1502 [n1] slow range RPC: have been waiting 761.08s (1 attempts) for RPC ConditionalPut [/System/"system-version/lease",/Min) to r3:/System/{NodeLivenessMax-tsd} [(n1,s1):1, next=2, gen=0]; resp: (err: ), *roachpb.ConditionalPutResponse ```

More

Parameters: - TAGS= - GOFLAGS=-race -parallel=2 ``` make stressrace TESTS=TestScheduleBackupRecoversFromClusterDown PKG=./pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1 ``` [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2ATestScheduleBackupRecoversFromClusterDown.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)

cockroach-teamcity commented 3 years ago

(ccl/backupccl).TestScheduleBackupRecoversFromClusterDown failed on master@df21e5065a79d1dd5b0fd76018724dd26660beb2:

Fatal error:

F201111 06:39:46.312768 510181 sqlmigrations/migrations.go:667  [n1] not enough time left on migration lease, terminating for safety

Stack:

goroutine 510181 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xbc74f01, 0x0, 0x0, 0x6404598)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xc6
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xb2b8a20, 0x4, 0x164660d54565ec16, 0x7c8e5, 0xacea874, 0x1b, 0x29b, 0xc076875740, 0x3f, 0xc08b2f3e50, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:243 +0xb31
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x8424000, 0xc06726d3b0, 0x4, 0x2, 0x64f189a, 0x3f, 0x0, 0x0, 0x0)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:52 +0x19a
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x8424000, 0xc06726d3b0, 0x1, 0xc000000004, 0x64f189a, 0x3f, 0x0, 0x0, 0x0)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:43 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatal(...)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:172
github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations.func2(0x8424000, 0xc06726d3b0)
    /go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:667 +0x347
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc079d1d220, 0x8424000, 0xc06726d3b0, 0xc08c0e6ba0, 0x29, 0x0, 0xc080900bc0)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0x156
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
    /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:341 +0x135
Log preceding fatal error

``` | runtime.doInit | /usr/local/go/src/runtime/proc.go:5223 | runtime.doInit | /usr/local/go/src/runtime/proc.go:5223 | runtime.doInit | /usr/local/go/src/runtime/proc.go:5223 | runtime.main | /usr/local/go/src/runtime/proc.go:190 | runtime.goexit | /usr/local/go/src/runtime/asm_amd64.s:1357 Wraps: (2) not live Error types: (1) *withstack.withStack (2) *errutil.leafError I201111 06:39:45.283786 509912 server/status/runtime.go:522 [n1] runtime stats: 25 GiB RSS, 233 goroutines, 2.0 GiB/1.0 GiB/3.6 GiB GO alloc/idle/total(stale), 22 MiB/62 MiB CGO alloc/total, 0.6 CGO/sec, 3606.7/605.7 %(u/s)time, 8.7 %gc (1x), 208 B/208 B (r/w)net I201111 06:39:44.932218 510613 kv/kvserver/replica_write.go:182 [n1,s1,r2/1:/System/NodeLiveness{-Max}] slow command RequestLease [/System/NodeLiveness,/Min) finished after 205.66s with error I201111 06:39:45.592283 510060 kv/kvserver/replica_range_lease.go:1084 [n1,s1,r2/1:/System/NodeLiveness{-Max}] slow lease acquisition finished after 1m20.988521674s with error after 1 attempts I201111 06:39:46.098829 509907 kv/kvserver/store.go:2619 [n1,s1] __level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp WAL 2 317 K - 315 K - - - - 317 K - - - 1.0 0 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 1 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 2 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 3 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 4 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 5 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 6 0 0 B - 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0 total 0 0 B - 317 K 0 B 0 0 B 0 317 K 0 0 B 0 1.0 flush 0 compact 0 0 B 0 B (size == estimated-debt, in = in-progress-bytes) memtbl 2 768 K zmemtbl 0 0 B ztbl 0 0 B bcache 0 0 B 0.0% (score == hit-rate) tcache 0 0 B 0.0% (score == hit-rate) titers 0 filter - - 0.0% (score == utility) W201111 06:39:46.186982 510709 kv/kvserver/liveness/liveness.go:813 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow heartbeat took 3m1.259642607s; err= W201111 06:39:46.187840 510818 kv/kvserver/liveness/liveness.go:813 [n1,s1,r3/1:/System/{NodeLive…-tsd}] slow heartbeat took 4.210860823s; err=heartbeat failed on epoch increment E201111 06:39:46.188724 510818 kv/kvserver/replica_range_lease.go:339 [n1,s1,r3/1:/System/{NodeLive…-tsd}] heartbeat failed on epoch increment W201111 06:39:46.188792 509914 kv/kvserver/liveness/liveness.go:813 [n1,liveness-hb] slow heartbeat took 1.390217263s; err=heartbeat failed on epoch increment I201111 06:39:46.189493 509914 kv/kvserver/liveness/liveness.go:705 [n1,liveness-hb] heartbeat failed on epoch increment; retrying E201111 06:39:46.189819 510824 kv/kvserver/replica_range_lease.go:339 [n1,s1,r19/1:/Table/2{3-4}] heartbeat failed on epoch increment E201111 06:39:46.191654 510779 kv/kvserver/replica_range_lease.go:339 [n1,s1,r26/1:/NamespaceTable/{30-Max}] heartbeat failed on epoch increment E201111 06:39:46.192584 510800 kv/kvserver/replica_range_lease.go:339 [n1,s1,r4/1:/System{/tsd-tse}] heartbeat failed on epoch increment E201111 06:39:46.194832 510882 kv/kvserver/replica_range_lease.go:339 [n1,s1,r32/1:/Table/3{6-7}] heartbeat failed on epoch increment I201111 06:39:46.192154 510059 kv/kvserver/replica_range_lease.go:1084 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow lease acquisition finished after 1m19.243272464s with error after 1 attempts I201111 06:39:46.192317 510760 kv/kvserver/replica_range_lease.go:1084 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow lease acquisition finished after 1m52.863978746s with error after 1 attempts I201111 06:39:46.192382 510583 kv/kvserver/replica_range_lease.go:1084 [n1,s1,r6/1:/Table/{SystemCon…-11}] slow lease acquisition finished after 3m1.742448924s with error after 1 attempts W201111 06:39:46.289886 510583 kv/kvclient/kvcoord/dist_sender.go:1502 [intExec=get-tables,n1,txn=66c5d797] slow range RPC: have been waiting 206.64s (1 attempts) for RPC Scan [/Table/3/1,/Table/3/2), [txn: 66c5d797] to r6:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, next=2, gen=0]; resp: (err: ), *roachpb.ScanResponse W201111 06:39:46.263840 510584 kv/kvclient/kvcoord/dist_sender.go:1502 [n1,intExec=update-session,txn=73a80cfb] slow range RPC: have been waiting 135.20s (1 attempts) for RPC Get [/Table/3/1/39/2/1,/Min), [txn: 73a80cfb], [can-forward-ts] to r6:/Table/{SystemConfigSpan/Start-11} [(n1,s1):1, next=2, gen=0]; resp: (err: ), *roachpb.GetResponse W201111 06:39:46.304006 510181 kv/kvclient/kvcoord/dist_sender.go:1502 [n1] slow range RPC: have been waiting 156.78s (1 attempts) for RPC ConditionalPut [/System/"system-version/lease",/Min) to r3:/System/{NodeLivenessMax-tsd} [(n1,s1):1, next=2, gen=0]; resp: (err: ), *roachpb.ConditionalPutResponse ```

More

Parameters: - TAGS= - GOFLAGS=-race -parallel=2 ``` make stressrace TESTS=TestScheduleBackupRecoversFromClusterDown PKG=./pkg/ccl/backupccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1 ``` [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2ATestScheduleBackupRecoversFromClusterDown.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)

cockroach-teamcity commented 2 years ago

ccl/backupccl.TestScheduleBackupRecoversFromClusterDown failed with artifacts on master @ 2f051c95ebb991e18134d2454807f992b05e4929:

=== RUN   TestScheduleBackupRecoversFromClusterDown/retry
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM) Parameters in this failure: - TAGS=bazel,gss

This test on roachdash | Improve this report!

cockroach-teamcity commented 1 year ago

ccl/backupccl.TestScheduleBackupRecoversFromClusterDown failed with artifacts on master @ 29d21029fa8e3936c1981ecaacd2b882a4bbb1cc:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x39f327c]

goroutine 38375 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0x4003aee6a8?, {0x6867890, 0x400679ad80})
    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:229 +0x68
panic({0x4a64b80, 0x96967d0})
    GOROOT/src/runtime/panic.go:884 +0x20c
github.com/cockroachdb/cockroach/pkg/jobs.(*ScheduledJob).ExecutorType(...)
    github.com/cockroachdb/cockroach/pkg/jobs/scheduled_job.go:173
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.processSchedulePTSRecord({0x6867890?, 0x4003193fb0?}, 0x4003aee968?, {{0x400c32e600, 0x10, 0x10}, {0x174c7a3ad4053bc8, 0x0, 0x0}, 0x0, ...}, ...)
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:227 +0x9c
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.manageProtectedTimestamps.func1({0x6867890, 0x4003193fb0}, {0x68be520, 0x4004a6f860})
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:124 +0x268
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn.func1({0x6867890?, 0x4003193fb0?}, 0x68ce400?)
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1406 +0x38
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn.func4({0x6867890, 0x4003193fb0}, 0x4004bb7ef0)
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1484 +0x2ac
github.com/cockroachdb/cockroach/pkg/kv.runTxn.func1({0x6867890?, 0x4003193fb0?}, 0x4003aeedb8?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:965 +0x2c
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec(0x4004bb7ef0, {0x6867890, 0x4003193fb0}, 0x4003aef0e8)
    github.com/cockroachdb/cockroach/pkg/kv/txn.go:928 +0x70
github.com/cockroachdb/cockroach/pkg/kv.runTxn({0x6867890, 0x4003193fb0}, 0x4003193fb0?, 0x45f4ec?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:964 +0x58
github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl(0xffffb6643d28?, {0x6867890, 0x4003193fb0}, 0xb9af640?, 0x40?, 0x0?, 0x0?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:927 +0xa8
github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn(0x38?, {0x6867890?, 0x4003193fb0?}, 0x589400?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:902 +0x34
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn(0x40017e9dd0, {0x6867890, 0x4003193fb0}, 0x400950e400, {0x0, 0x0, 0x10000000009d01?})
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1472 +0x20c
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn(0x4a5ea60?, {0x6867890, 0x4003193fb0}, 0x4000ddbc20, {0x0, 0x0, 0x0})
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1407 +0x94
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.manageProtectedTimestamps({0x6867890, 0x4003193fb0}, {0x68b9990, 0x4008a3c218})
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:105 +0xec
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.(*metricsPoller).Resume.func1({0x52e00e2?, 0x4003aef448?}, 0x5689c48)
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/poller.go:64 +0xb8
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.(*metricsPoller).Resume(0x4004a0e1a0?, {0x6867890, 0x4003193fb0}, {0x4ffa180?, 0x4008a3c218})
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/poller.go:75 +0x238
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine.func2(0x4004a0e1a0?, 0x0?, {0x683ac38?, 0x4008a3c200?}, {0x6867890?, 0x400b0daa80?}, {0x4ffa180?, 0x4008a3c218?}, 0x4003aef860)
    github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1529 +0xf4
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine(0x4007974d80, {0x6867890, 0x400679afc0}, {0x4ffa180, 0x4008a3c218}, {0x683ac38, 0x4008a3c200}, 0x4004a0e1a0, {0x400bf37528, 0x7}, ...)
    github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1530 +0x6a0
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob(0x4007974d80, {0x6867890, 0x400679ad50}, {0x683ac38, 0x4008a3c200}, 0x4004a0e1a0, {0x400bf37528, 0x7}, {0x400bf37700, 0x7})
    github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:417 +0x500
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1({0x4004b06000?, 0x4c09f4?})
    github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:335 +0xc0
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x128
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:461 +0x358
=== RUN   TestScheduleBackupRecoversFromClusterDown/retry
Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

This test on roachdash | Improve this report!

cockroach-teamcity commented 1 year ago

ccl/backupccl.TestScheduleBackupRecoversFromClusterDown failed with artifacts on master @ 9b43dc50d468171bc9919dc684b7274b63e13da2:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x40ca27b]

Stack:

goroutine 944694 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0x1e8ee8a?, {0x70ace30, 0xc02272d6b0})
    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:229 +0x6a
panic({0x52ab880, 0x9f9f7d0})
    GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/cockroach/pkg/jobs.(*ScheduledJob).ExecutorType(...)
    github.com/cockroachdb/cockroach/pkg/jobs/scheduled_job.go:173
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.processSchedulePTSRecord({0x70ace30?, 0xc01089a570?}, 0xc002bf29b8?, {{0xc021899970, 0x10, 0x10}, {0x174c8100c9c8e9f8, 0x0, 0x0}, 0x0, ...}, ...)
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:227 +0x9b
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.manageProtectedTimestamps.func1({0x70ace30, 0xc01089a570}, {0x7103820, 0xc0127d3ea0})
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:124 +0x345
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn.func1({0x70ace30?, 0xc01089a570?}, 0x7113798?)
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1406 +0x31
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn.func4({0x70ace30, 0xc01089a570}, 0xc020f9b550)
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1484 +0x3b4
github.com/cockroachdb/cockroach/pkg/kv.runTxn.func1({0x70ace30?, 0xc01089a570?}, 0xc010a06ea0?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:965 +0x27
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec(0xc020f9b550, {0x70ace30, 0xc01089a570}, 0xc002bf3110)
    github.com/cockroachdb/cockroach/pkg/kv/txn.go:928 +0xae
github.com/cockroachdb/cockroach/pkg/kv.runTxn({0x70ace30, 0xc01089a570}, 0xc01089a570?, 0x3b9aca00?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:964 +0x6b
github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl(0x7fd352653108?, {0x70ace30, 0xc01089a570}, 0x20c514c0?, 0xc0?, 0x0?, 0x0?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:927 +0xa7
github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn(0x463207?, {0x70ace30?, 0xc01089a570?}, 0x1?)
    github.com/cockroachdb/cockroach/pkg/kv/db.go:902 +0x2d
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn(0xc006aa1830, {0x70ace30, 0xc01089a570}, 0xc000da07e0, {0x0, 0x0, 0x56921e0?})
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1472 +0x2a7
github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn(0x52a5760?, {0x70ace30, 0xc01089a570}, 0xc009b7fa80, {0x0, 0x0, 0x0})
    github.com/cockroachdb/cockroach/pkg/sql/internal.go:1407 +0x9f
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.manageProtectedTimestamps({0x70ace30, 0xc01089a570}, {0x70fec10, 0xc0004b46a0})
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:105 +0xf0
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.(*metricsPoller).Resume.func1({0x5b26d67?, 0xc001c9b460?}, 0x5ed0698)
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/poller.go:64 +0xb1
github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.(*metricsPoller).Resume(0xc01439f1e0?, {0x70ace30, 0xc01089a570}, {0x5840f80?, 0xc0004b46a0})
    github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/poller.go:75 +0x31a
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine.func2(0xc01439f1e0?, 0x0?, {0x7080468?, 0xc0004b4648?}, {0x70ace30?, 0xc012c16900?}, {0x5840f80?, 0xc0004b46a0?}, 0xc002bf3870)
    github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1529 +0xcc
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine(0xc0055f6900, {0x70ace30, 0xc02272d920}, {0x5840f80, 0xc0004b46a0}, {0x7080468, 0xc0004b4648}, 0xc01439f1e0, {0xc017b31738, 0x7}, ...)
    github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1530 +0x8ed
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob(0xc0055f6900, {0x70ace30, 0xc02272d680}, {0x7080468, 0xc0004b4648}, 0xc01439f1e0, {0xc017b31738, 0x7}, {0xc017b318d0, 0x7})
    github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:417 +0x5d7
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1({0xc0048f9800?, 0xc02272cd80?})
    github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:335 +0xfa
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x146
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
    github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:461 +0x43b
Log preceding fatal error

``` * | GOROOT/src/runtime/panic.go:884 * | runtime.panicmem * | GOROOT/src/runtime/panic.go:260 * | runtime.sigpanic * | GOROOT/src/runtime/signal_unix.go:835 * | github.com/cockroachdb/cockroach/pkg/jobs.(*ScheduledJob).ExecutorType * | github.com/cockroachdb/cockroach/pkg/jobs/scheduled_job.go:173 * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.processSchedulePTSRecord * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:227 * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.manageProtectedTimestamps.func1 * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:124 * | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn.func1 * | github.com/cockroachdb/cockroach/pkg/sql/internal.go:1406 * | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn.func4 * | github.com/cockroachdb/cockroach/pkg/sql/internal.go:1484 * | github.com/cockroachdb/cockroach/pkg/kv.runTxn.func1 * | github.com/cockroachdb/cockroach/pkg/kv/db.go:965 * | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec * | github.com/cockroachdb/cockroach/pkg/kv/txn.go:928 * | github.com/cockroachdb/cockroach/pkg/kv.runTxn * | github.com/cockroachdb/cockroach/pkg/kv/db.go:964 * | github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl * | github.com/cockroachdb/cockroach/pkg/kv/db.go:927 * | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn * | github.com/cockroachdb/cockroach/pkg/kv/db.go:902 * | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn * | github.com/cockroachdb/cockroach/pkg/sql/internal.go:1472 * | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).Txn * | github.com/cockroachdb/cockroach/pkg/sql/internal.go:1407 * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.manageProtectedTimestamps * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/job_statistics.go:105 * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.(*metricsPoller).Resume.func1 * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/poller.go:64 * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller.(*metricsPoller).Resume * | github.com/cockroachdb/cockroach/pkg/jobs/metricspoller/poller.go:75 * | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine.func2 * | github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1529 * | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine * | github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1530 * | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob * | github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:417 * | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1 * | github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:335 * | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2 * | github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 * | runtime.goexit * | GOROOT/src/runtime/asm_amd64.s:1594 * Wraps: (2) runtime error: invalid memory address or nil pointer dereference * Error types: (1) *withstack.withStack (2) runtime.errorString * ```

Parameters: TAGS=bazel,gss

Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

This test on roachdash | Improve this report!

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/disaster-recovery

miretskiy commented 1 year ago

Fixed by https://github.com/cockroachdb/cockroach/pull/99153