aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0

Crash in client when doing queries during cluster maintenance #375

Closed: adumovic closed this issue 2 years ago

adumovic commented 2 years ago

This has happened multiple times now. Whenever our cluster is undergoing GCP maintenance, the 12 servers (one server per node) go down one at a time over the course of 30 minutes or so and come back up on new nodes.

We have a service that continually issues queries every few seconds, and almost without fail, during this maintenance the Aerospike client panics:


```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xc845e0]

goroutine 33749 [running]:
github.com/aerospike/aerospike-client-go/v5.(*partitionTracker).partitionDone(...)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/partition_tracker.go:232
github.com/aerospike/aerospike-client-go/v5.(*baseMultiCommand).parseRecordResults(0xc001b44900, {0xc43557, 0xc001b449b8}, 0x16)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/multi_command.go:297 +0x160
github.com/aerospike/aerospike-client-go/v5.(*baseMultiCommand).parseResult(0xc001b44900, {0x133e858, 0xc001b44900}, 0x1b7ca60)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/multi_command.go:174 +0x307
github.com/aerospike/aerospike-client-go/v5.(*queryCommand).parseResult(0xc0013eff40, {0x133e858, 0xc001b44900}, 0x10000)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/query_command.go:45 +0x25
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt(0xc001b44900, {0x133e858, 0xc001b44900}, 0xc00137c510, 0xbf, {0xc011efa700, 0xc011efa6d0, 0x1b7ca60}, 0x1169661, 0x0)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/command.go:2166 +0xedc
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).execute(0xc86ffa, {0x133e858, 0xc001b44900}, 0x0)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/command.go:1991 +0x8a
github.com/aerospike/aerospike-client-go/v5.(*baseMultiCommand).execute(...)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/multi_command.go:399
github.com/aerospike/aerospike-client-go/v5.(*queryCommand).Execute(0xc001b44900)
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/query_command.go:50 +0x2f
github.com/aerospike/aerospike-client-go/v5.(*werrGroup).execute.func1()
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/werrgroup.go:62 +0xdb
created by github.com/aerospike/aerospike-client-go/v5.(*werrGroup).execute
    /go/src/myService/vendor/github.com/aerospike/aerospike-client-go/v5/werrgroup.go:55 +0x15a
```

This service is a cleanup runner that runs a partition query every 5 seconds to check whether any rows match a filter and should be removed. It calls `QueryPartitions` in the latest v5.8 client. Our Aerospike cluster version is 5.7.0.8.

khaf commented 2 years ago

Fix released in v5.9.0.