aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0

Client occasionally crashes with index out of bounds when getting a node from a partition #367

Open asecondo opened 2 years ago

asecondo commented 2 years ago

Occasionally when writing to Aerospike, a service crashes in the Aerospike client when attempting to get a node from the current partition. The length of the array is always 0 when the crash occurs.

The code here could be modified to ensure it doesn't index past the bounds of replicas, but I'm trying to understand why this is happening. Any thoughts?
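To make the suggested guard concrete, here is a minimal sketch, assuming the panic comes from indexing into an empty replicas slice. It is not the client's actual code: `pickSequenceNode`, `errNoReplicas`, `errNoNode`, and the `[][]string` layout are hypothetical stand-ins for whatever `getSequenceNode` does internally.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch; the real client has its own error types.
var (
	errNoReplicas = errors.New("partition map has no replicas")
	errNoNode     = errors.New("no node found for partition")
)

// pickSequenceNode is a hypothetical stand-in for the replica lookup.
// replicas[r][p] names the node that holds replica r of partition p.
// It returns an error instead of panicking when the replica list is empty
// or when no replica has a node for the requested partition.
func pickSequenceNode(replicas [][]string, partitionID int, sequence uint) (string, error) {
	n := uint(len(replicas))
	if n == 0 {
		// Guard: without this check, any index into replicas would be out of range.
		return "", errNoReplicas
	}
	// Try each replica once, starting at the position selected by sequence.
	for i := uint(0); i < n; i++ {
		row := replicas[(sequence+i)%n]
		if partitionID >= 0 && partitionID < len(row) && row[partitionID] != "" {
			return row[partitionID], nil
		}
	}
	return "", errNoNode
}

func main() {
	// An empty replica list now yields an error rather than a crash.
	_, err := pickSequenceNode(nil, 0, 0)
	fmt.Println("empty replicas:", err)

	// A populated replica list resolves to a node name.
	node, err := pickSequenceNode([][]string{{"A", "B"}, {"B", "A"}}, 1, 3)
	fmt.Println("node:", node, "err:", err)
}
```

A guard like this would only turn the crash into a retryable error; it would not explain why the replica list is empty in the first place.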

Client version: 5.7.0

Example stack trace (excluding service specific code):

github.com/aerospike/aerospike-client-go/v5.(*Partition).getSequenceNode(0x46d5d3, 0x61f9685e)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.7.0/partition.go:195 +0xfc
github.com/aerospike/aerospike-client-go/v5.(*Partition).GetNodeWrite(0xc01b79eef8, 0x48c6d7)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.7.0/partition.go:165 +0x53
github.com/aerospike/aerospike-client-go/v5.(*operateCommand).getNode(0xc07677f782a295c3, {0x4e94324576a4, 0x1d0a1c0})
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.7.0/operate_command.go:43 +0x2d
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt(0xc03018ae10, {0x13db778, 0xc03018ae10}, 0xc001fabb90, 0x80, {0xc001fabb90, 0x0, 0x1d0a1c0}, 0x0, 0x0)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.7.0/command.go:2058 +0x532
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).execute(0xc001fabb90, {0x13db778, 0xc03018ae10}, 0x4)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.7.0/command.go:1991 +0x8a
github.com/aerospike/aerospike-client-go/v5.(*operateCommand).Execute(...)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.7.0/operate_command.go:60
github.com/aerospike/aerospike-client-go/v5.(*Client).Operate(0xc0001fc840, 0x0, 0x0, {0xc01cf073e0, 0xfc4500, 0xc01cf13eb0})
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.7.0/client.go:515 +0x3be

khaf commented 2 years ago

This is very surprising. Are you monitoring your cluster? Does a partition or major network issue happen at the same time as these errors?

asecondo commented 2 years ago

@khaf We are monitoring our clusters, but I wasn't able to find any direct indicators that a partition or major network issue happened during the time this error popped up. Do you have any suggestions for anything to specifically look for?

For a bit more context, this has only happened ~75 times in the past 30 days, and 1 time in the past 2 weeks. It's likely that this is related to some partition or network issue as you described since there's no clear pattern that I've been able to identify yet.