cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

pkg/sql/colexecerror: BenchmarkSQLCatchVectorizedRuntimeError fails with error #124021

Open herkolategan opened 4 months ago

herkolategan commented 4 months ago

Notes:

This benchmark started failing recently (in this week's microbenchmark run). It only fails when conns exceeds a certain threshold, which I suspect is around 1000.

The snippet below, taken from the benchmark, shows how conns is determined:

numConns := runtime.GOMAXPROCS(0) * parallelism

Raising the CPU count with --test-args='-test.cpu 24' simulates the environment the microbenchmarks run on and increases conns significantly, which reproduces the issue. This could simply be a system resource issue, but it needs further investigation to confirm and to decide whether a cap should be put on this value.
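To make the scaling concrete, here is a small sketch of the arithmetic behind numConns. The function name connsFor is hypothetical, and the parallelism values 1, 20 and 50 are an assumption inferred from the conns=32, conns=640 and conns=1600 labels in the output below (whose -32 suffix suggests GOMAXPROCS was 32 for that run); the benchmark source may define them differently.

```go
package main

import "fmt"

// connsFor mirrors the numConns expression quoted above, but takes the
// GOMAXPROCS value as a parameter so the arithmetic is easy to check.
// The name is hypothetical and does not appear in the benchmark.
func connsFor(procs, parallelism int) int {
	return procs * parallelism
}

func main() {
	// Assumed parallelism values, inferred from the conns=... labels
	// in the benchmark output; not taken from the benchmark source.
	for _, p := range []int{1, 20, 50} {
		fmt.Printf("procs=32 parallelism=%d conns=%d\n", p, connsFor(32, p))
	}
}
```

Under these assumptions the connection count grows linearly with the CPU count, so a machine with more cores crosses the suspected threshold at a lower parallelism setting.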

To reproduce:

./dev bench pkg/sql/colexecerror --filter=BenchmarkSQLCatchVectorizedRuntimeError --count=1 --bench-time=1s --timeout=20m --bench-mem -v --stream-output --ignore-cache --test-args='-test.cpu 24'

Output:

goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
BenchmarkSQLCatchVectorizedRuntimeError
    test_log_scope.go:170: test logs captured to: /tmp/logBenchmarkSQLCatchVectorizedRuntimeError2704604800
    test_log_scope.go:81: use -show-logs to present logs inline
BenchmarkSQLCatchVectorizedRuntimeError/conns=32
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/noError
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/noError-32        45187         25943 ns/op       86515 B/op        900 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/expectedWithCode
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/expectedWithCode-32               32814         35894 ns/op      125966 B/op       1474 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/expectedAssertion
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/expectedAssertion-32              16371         72000 ns/op      370687 B/op       2333 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/internalAssertion
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/internalAssertion-32              21884         54557 ns/op      289883 B/op       1731 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/internalIndexOutOfRange
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/internalIndexOutOfRange-32        19394         61970 ns/op      325261 B/op       2151 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/internalDivideByZero
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/internalDivideByZero-32           19824         59106 ns/op      319092 B/op       2147 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/contextCanceled
BenchmarkSQLCatchVectorizedRuntimeError/conns=32/contextCanceled-32                38986         30474 ns/op      117941 B/op       1327 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=640
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/noError
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/noError-32                       43171         24887 ns/op       85843 B/op        898 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/expectedWithCode
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/expectedWithCode-32              30998         35732 ns/op      125058 B/op       1470 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/expectedAssertion
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/expectedAssertion-32             17928         61946 ns/op      367860 B/op       2327 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/internalAssertion
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/internalAssertion-32             22054         51779 ns/op      287470 B/op       1722 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/internalIndexOutOfRange
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/internalIndexOutOfRange-32       18720         60761 ns/op      323504 B/op       2149 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/internalDivideByZero
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/internalDivideByZero-32          21601         58093 ns/op      317666 B/op       2146 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/contextCanceled
BenchmarkSQLCatchVectorizedRuntimeError/conns=640/contextCanceled-32               47864         31899 ns/op      117652 B/op       1326 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/noError
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/noError-32                      57868         22097 ns/op       85734 B/op        899 allocs/op
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/expectedWithCode
    sql_runner.go:87: error executing query="SET distsql = off" args=[]: dial tcp 127.0.0.1:35813: connect: cannot assign requested address
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/expectedWithCode-32
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/expectedAssertion
    sql_runner.go:87: error executing query="SET distsql = off" args=[]: dial tcp 127.0.0.1:35813: connect: cannot assign requested address
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/expectedAssertion
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/internalAssertion
    sql_runner.go:87: error executing query="SET distsql = off" args=[]: dial tcp 127.0.0.1:35813: connect: cannot assign requested address
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/internalAssertion
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/internalIndexOutOfRange
    sql_runner.go:87: error executing query="SET distsql = off" args=[]: dial tcp 127.0.0.1:35813: connect: cannot assign requested address
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/internalIndexOutOfRange
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/internalDivideByZero
    sql_runner.go:87: error executing query="SET distsql = off" args=[]: dial tcp 127.0.0.1:35813: connect: cannot assign requested address
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/internalDivideByZero
BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/contextCanceled
    sql_runner.go:87: error executing query="SET distsql = off" args=[]: dial tcp 127.0.0.1:35813: connect: cannot assign requested address
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError/conns=1600/contextCanceled
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError/conns=1600
    error_test.go:249: -- test log scope end --
--- FAIL: BenchmarkSQLCatchVectorizedRuntimeError
FAIL

Jira issue: CRDB-38666

michae2 commented 4 months ago

I ran into this too when creating the benchmark (see this comment). Sadly, changing net.ipv4.ip_local_port_range didn't seem to help. I'll add an upper limit to the number of connections.
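One possible shape for such an upper limit is sketched below. This is not the actual fix: the helper cappedConns and the limit of 1000 are hypothetical, with the limit chosen only because the issue description suspects the failures start around that value.

```go
package main

import (
	"fmt"
	"runtime"
)

// maxConns is a hypothetical cap, picked from the threshold suspected
// in the issue description. It is not a value from the benchmark.
const maxConns = 1000

// cappedConns computes procs * parallelism and clamps the result to
// limit, so the connection count stops growing on large machines.
func cappedConns(procs, parallelism, limit int) int {
	n := procs * parallelism
	if n > limit {
		return limit
	}
	return n
}

func main() {
	procs := runtime.GOMAXPROCS(0)
	// Assumed parallelism values; the benchmark may use others.
	for _, p := range []int{1, 20, 50} {
		fmt.Printf("parallelism=%d conns=%d\n", p, cappedConns(procs, p, maxConns))
	}
}
```

With 32 procs, a parallelism of 50 would be clamped from 1600 to 1000, while 20 would stay at 640. Whether a fixed cap is enough depends on why connecting fails, which this thread has not yet confirmed.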