Closed cockroach-teamcity closed 4 years ago
(roachtest).engine/switch/nodes=3 failed on master@752dea867f3aeb142e98c22f8d320ce19041aa8d:
The Cockroach Labs team appreciates your feedback.
E200307 08:13:16.183568 1 cli/error.go:373 ERROR: exit status 255
ERROR: exit status 255
Failed running "start"
github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:293
github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1583
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357:
I200307 08:13:16.229079 1 cluster_synced.go:1665 command failed:
- exit status 1
cluster.go:2368,engine_switch.go:126,engine_switch.go:140,test_runner.go:741: error with attached stack trace:
main.(*monitor).WaitE
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
main.(*monitor).Wait
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2364
main.registerEngineSwitch.func1
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:126
main.registerEngineSwitch.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:140
main.(*testRunner).runTest.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:741
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
- monitor failure:
- error with attached stack trace:
main.(*monitor).wait.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2412
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
- monitor task failed:
- error with attached stack trace:
main.init
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2309
runtime.doInit
/usr/local/go/src/runtime/proc.go:5222
runtime.main
/usr/local/go/src/runtime/proc.go:190
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
- Goexit() was called
cluster.go:1410,context.go:135,cluster.go:1399,test_runner.go:778: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1793930-1583568484-12-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
1: dead
3: 4177
2: 4210
Error: 1: dead
Artifacts: [/engine/switch/nodes=3](https://teamcity.cockroachdb.com/viewLog.html?buildId=1793930&tab=artifacts#/engine/switch/nodes=3) [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2Aengine%2Fswitch%2Fnodes%3D3.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)
@sumeerbhola Another replica consistency issue. I just skimmed the failure and it looks like all of the differences are in timeseries data. 752dea867f3aeb142e98c22f8d320ce19041aa8d does contain your fix though, so this is something different.
(roachtest).engine/switch/nodes=3 failed on master@c473f40078994551cebcbe00fdbf1fa388957658:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200309-1796240/engine/switch/nodes=3/run_1
test_runner.go:756: test timed out (10h0m0s)
cluster.go:1935,engine_switch.go:129,engine_switch.go:140,test_runner.go:741: error with attached stack trace:
main.(*cluster).RunE
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2010
main.(*cluster).Run
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1933
main.registerEngineSwitch.func1
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:129
main.registerEngineSwitch.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:140
main.(*testRunner).runTest.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:741
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
- error with embedded safe details: output in %s
-- arg 1: <string>
- output in run_074057.459_n4_workload_check_tpcc:
- error with attached stack trace:
main.execCmd
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:406
main.(*cluster).RunL
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2019
main.(*cluster).RunE
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2000
main.(*cluster).Run
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1933
main.registerEngineSwitch.func1
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:129
main.registerEngineSwitch.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:140
main.(*testRunner).runTest.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:741
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
- error with embedded safe details: %s returned:
stderr:
%s
stdout:
%s
-- arg 1: <string>
-- arg 2: <string>
-- arg 3: <string>
- /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1796240-1583738442-15-n4cpu4:4 -- ./workload check tpcc --warehouses=1 --expensive-checks=true {pgurl:1} returned:
stderr:
stdout::
- context canceled
Artifacts: [/engine/switch/nodes=3](https://teamcity.cockroachdb.com/viewLog.html?buildId=1796240&tab=artifacts#/engine/switch/nodes=3) [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2Aengine%2Fswitch%2Fnodes%3D3.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)
The unmarshaled MVCCMetadata
protos are equal. The difference is in the serialized bytes -- the Pebble one has a prefix of 120408001000180020002800
. This corresponds to tags 2, 3, 4, 5 in the proto: https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/enginepb/mvcc.proto#L37-L43
These are not being explicitly set by Pebble. I wrote a small test -- serializing an MVCCMetadata
in Go where these fields are not being explicitly set does produce bytes for these fields. The C++ serialization must not be producing bytes for these optional fields.
Huh. I thought Go proto encoding elided optional fields which are the zero value.
Huh. I thought Go proto encoding elided optional fields which are the zero value.
Or maybe that is only true of primitive fields. Hrmm...
Or maybe that is only true of primitive fields. Hrmm...
Not even true for primitive fields:
dAtA[i] = 0x18
i++
if m.Deleted {
dAtA[i] = 1
} else {
dAtA[i] = 0
}
This is all tickling something in the back of my head. I think there might be a difference here with the reflection-based marshaling. Gah! Ensuring we're exactly the same with C++ is quite the headache.
Ugh, this gets worse. We currently marshal MVCCMetadata
below Raft. Which means we can't change how that encoding is done on the Go side without causing other problems. See kv/kvserver/below_raft_protos_test.go
.
One hack would be to change MVCCValueMerger
to have special purpose code to marshal an MVCCMetadata proto that only contains the raw_bytes
and merge_timestamp
fields.
I suspect some sort of hack will definitely be necessary. What about marshaling of InternalTimeSeriesData
? I see that is also done in pebble_merger.go
.
This is also pointing out a hole in our unit testing. I believe we have unit tests for this merge operator, but we should also have a test that compares the byte output of the RocksDB and Pebble versions.
(roachtest).engine/switch/nodes=3 failed on master@72c4a1bd411f2f82bf9aaa22883821a946614148:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200310-1799071/engine/switch/nodes=3/run_1
cluster.go:2368,engine_switch.go:126,engine_switch.go:140,test_runner.go:747: error with attached stack trace:
main.(*monitor).WaitE
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
main.(*monitor).Wait
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2364
main.registerEngineSwitch.func1
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:126
main.registerEngineSwitch.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/engine_switch.go:140
main.(*testRunner).runTest.func2
/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:747
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
- monitor failure:
- unexpected node event: 3: dead
cluster.go:1410,context.go:135,cluster.go:1399,test_runner.go:801: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1799071-1583824469-05-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
2: 5059
1: 5185
3: dead
Error: 3: dead
Artifacts: [/engine/switch/nodes=3](https://teamcity.cockroachdb.com/viewLog.html?buildId=1799071&tab=artifacts#/engine/switch/nodes=3) [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2Aengine%2Fswitch%2Fnodes%3D3.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)
What about marshaling of InternalTimeSeriesData?
The only optional fields are start_timestamp_nanos
and sample_duration_nanos
, and it seems we always set them.
(roachtest).engine/switch/nodes=3 failed on master@954fe69d554162aec0fbc001aad1fe5103d8df13:
More
Artifacts: [/engine/switch/nodes=3](https://teamcity.cockroachdb.com/viewLog.html?buildId=1790595&tab=artifacts#/engine/switch/nodes=3) [See this test on roachdash](https://roachdash.crdb.dev/?filter=status%3Aopen+t%3A.%2Aengine%2Fswitch%2Fnodes%3D3.%2A&sort=title&restgroup=false&display=lastcommented+project) powered by [pkg/cmd/internal/issues](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/internal/issues)