cockroachdb / cockroach

roachtest: sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed #124279

Closed cockroach-teamcity closed 6 months ago

cockroach-teamcity commented 6 months ago

roachtest.sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed with artifacts on master @ 992d4573fb8b71e28717b4d14c4c7ebccb38c412:

(monitor.go:154).Wait: monitor failure: COMMAND_PROBLEM: exit status 132
test artifacts and logs in: /artifacts/sysbench/oltp_read_write/nodes=3/cpu=32/conc=256/cpu_arch=arm64/run_1

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

/cc @cockroachdb/test-eng

Jira issue: CRDB-38805

cockroach-teamcity commented 6 months ago

roachtest.sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed with artifacts on master @ 855b9cc97afa3df4f7e17f928c04ab0834b2630c:

(monitor.go:154).Wait: monitor failure: COMMAND_PROBLEM: exit status 132
test artifacts and logs in: /artifacts/sysbench/oltp_read_write/nodes=3/cpu=32/conc=256/cpu_arch=arm64/run_1

cockroach-teamcity commented 6 months ago

roachtest.sysbench/oltp_read_write/nodes=3/cpu=32/conc=256 failed with artifacts on master @ 93ad913106b6f0f6ec98bc2cfa788ff6d8085bd4:

(monitor.go:154).Wait: monitor failure: COMMAND_PROBLEM: exit status 132
test artifacts and logs in: /artifacts/sysbench/oltp_read_write/nodes=3/cpu=32/conc=256/cpu_arch=arm64/run_1

renatolabs commented 6 months ago

Seems like sysbench is crashing with an Illegal instruction error while running the benchmark:

bash: line 21: 20650 Illegal instruction     (core dumped) bash -c "sysbench \\

There should be no difference between the versions used on Azure and GCE, but this seems to be failing only on the former. Given that we don't even collect any metrics in these runs (#123071), we could just skip this test on Azure.
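
For reference, the `exit status 132` in the reports above is consistent with this: shells conventionally report 128 plus the signal number when a command is killed by a signal, and SIGILL is signal 4 on Linux. A minimal Python sketch of that decoding follows; the helper name `describe_exit_status` is made up for illustration and is not part of roachtest.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper)."""
    # By shell convention, a status above 128 means "killed by signal (status - 128)".
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {status}"


print(describe_exit_status(132))  # on Linux: killed by SIGILL
```

The same arithmetic gives 139 for a segfault (128 + 11, SIGSEGV on Linux).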

srosenberg commented 6 months ago

> There should be no difference between the versions used on Azure and GCE, but this seems to be failing only on the former. Given that we don't even collect any metrics in these runs (#123071), we could just skip this test on Azure.

I am guessing it's because of ROACHTEST_arch=arm64; i.e., the sysbench binary must have been built for amd64; double-checking...
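
One way to double-check which architecture a binary was built for, without relying on `file` being installed, is to read the `e_machine` field of its ELF header. This is a rough Python sketch only; the helper names, and the `/usr/bin/sysbench` path mentioned afterwards, are assumptions and not something taken from the test.

```python
import struct

# e_machine values defined by the ELF specification.
EM_X86_64 = 62
EM_AARCH64 = 183


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    known = {EM_X86_64: "amd64", EM_AARCH64: "arm64"}
    return known.get(machine, f"unknown({machine})")


def binary_arch(path: str) -> str:
    """Read just enough of the file at `path` to identify its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Running `binary_arch("/usr/bin/sysbench")` on the node would then report which build was installed (adjust the path to wherever sysbench actually lives).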

srosenberg commented 6 months ago

> the sysbench binary must have been built for amd64; double-checking...

Nope, it is picking up the arm64 build of sysbench. I tried to reproduce this locally:

roachprod create --clouds azure -n1 --local-ssd=false --azure-machine-type Standard_D32pds_v5 stan-test --arch arm64
roachprod stage stan-test release v23.1.4 --arch arm64
roachprod start stan-test

sudo apt-get update;
sudo apt-get install -y sysbench

but it succeeded:

sysbench \
  --db-driver=pgsql \
  --pgsql-host=localhost \
  --pgsql-port=26257 \
  --pgsql-user=roachprod \
  --pgsql-password=cockroachdb \
  --pgsql-db=sysbench \
  --report-interval=1 \
  --time=600 \
  --threads=256 \
  --tables=10 \
  --table_size=10000000 \
  --auto_inc=false \
  oltp_read_write prepare
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Initializing worker threads...

Creating table 'sbtest3'...
Inserting 10000000 records into 'sbtest3'
Creating table 'sbtest2'...
Inserting 10000000 records into 'sbtest2'
Creating table 'sbtest9'...
Inserting 10000000 records into 'sbtest9'
Creating table 'sbtest8'...
Inserting 10000000 records into 'sbtest8'
Creating table 'sbtest5'...
Inserting 10000000 records into 'sbtest5'
Creating table 'sbtest6'...
Inserting 10000000 records into 'sbtest6'
Creating table 'sbtest4'...
Inserting 10000000 records into 'sbtest4'
Creating table 'sbtest10'...
Creating table 'sbtest1'...
Inserting 10000000 records into 'sbtest10'
Inserting 10000000 records into 'sbtest1'
Creating table 'sbtest7'...
Inserting 10000000 records into 'sbtest7'
Creating a secondary index on 'sbtest4'...
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest8'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest10'...
Creating a secondary index on 'sbtest9'...
Creating a secondary index on 'sbtest1'..

Evidently, sysbench is known to segfault occasionally [1], so there isn't much actionable here.

[1] https://github.com/cockroachdb/cockroach/blob/420d7da30c063edd781ab8eedbcafc78f1477847/pkg/cmd/roachtest/tests/sysbench.go#L141-L145