aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0
429 stars 199 forks source link

Panic inside chainErrors in v5 #351

Closed devlo closed 3 years ago

devlo commented 3 years ago

Hello, After upgrading to v5 we are experiencing panic in BatchGet command, it works on previous version without issues. Panic occurs here https://github.com/aerospike/aerospike-client-go/blob/v5/error.go#L386 It seems like ae is not set on type switch, which could indicate that outer is nil or have different type.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7ad61d]

goroutine 268 [running]:
github.com/aerospike/aerospike-client-go/v5.chainErrors(0x0, 0x0, 0x1b4bbc0, 0xc000628c60, 0xc000628c60, 0x0)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/error.go:386 +0xbd
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt(0xc00053d180, 0x1b8f760, 0xc00053d180, 0xc0005e0e80, 0xc00053d101, 0x0, 0x0, 0x0, 0x1, 0x0, ...)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/command.go:2011 +0x16bb
github.com/aerospike/aerospike-client-go/v5.(*batchCommand).retryBatch(0xc000534c80, 0x1b93500, 0xc000534c80, 0xc000541200, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc00000af00, ...)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/batch_command.go:75 +0x22f
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt(0xc000534c80, 0x1b8f760, 0xc000534c80, 0xc0005e0e80, 0xc0005e0e01, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/command.go:2009 +0x163a
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).execute(0xc000534c80, 0x1b8f760, 0xc000534c80, 0xc00093df01, 0x9ef4b0, 0xc000644280)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/command.go:1970 +0xda
github.com/aerospike/aerospike-client-go/v5.(*baseMultiCommand).execute(...)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/multi_command.go:366
github.com/aerospike/aerospike-client-go/v5.(*batchCommandGet).Execute(0xc000534c80, 0x0, 0x100010000)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/batch_command_get.go:224 +0x45
github.com/aerospike/aerospike-client-go/v5.(*werrGroup).execute.func1(0xc000255cc0, 0x1b8f760, 0xc000534c80)
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/werrgroup.go:62 +0xc6
created by github.com/aerospike/aerospike-client-go/v5.(*werrGroup).execute
    /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.0/werrgroup.go:55 +0xaf
khaf commented 3 years ago

Which server version and edition are you using?

devlo commented 3 years ago

Aerospike Community Edition build 5.6.0.3

9mm commented 3 years ago

I'm also having the same issue, same aerospike version

khaf commented 3 years ago

Thanks for your report. The fix is coming tomorrow.

khaf commented 3 years ago

This should be fixed in the v5.0.1. Feel free to close the ticket if the issue is resolved. Otherwise, give us a holler.

devlo commented 3 years ago

This fix seems to introduce another regression, we are getting now randomly:

ResultCode: COMMON_ERROR, Iteration: 0, InDoubt: false, Node: \u003cnil\u003e: common, none-aerospike error. Checked the wrapped error for detail\nResultCode: NO_AVAILABLE_CONNECTIONS_TO_NODE, Iteration: 0, InDoubt: false, Node: \u003cnil\u003e: connection pool is empty. This happens when either all connection are in-use already, or no connections were available
khaf commented 3 years ago

Could you please give me a bit more context about this error and where and how it happens? All tests are passing, this has to be a corner case that I'm missing.

devlo commented 3 years ago

3 node cluster, BatchGet request with default policy.

I'm not sure if this patch introduced this regression or it was like that in v5.0.0 as this is the first time we could use BatchGet in v5. It's triggered randomly, so it could work 5 times in a row and it will error on 6th etc. that's why maybe tests are passing. We are using v5.0.0 on different part of our system already and there are no problems there BUT we are not using BatchGet there at all. So it seems it's something wrong specifically with this command. Hope this helps.

khaf commented 3 years ago

Can you share the values of your batch policy, and how many keys you retrieve per call?

devlo commented 3 years ago
policy := as.NewBatchPolicy()   
policy.SendKey = true

and how many keys you retrieve per call?

really small amount of keys for that queries, not more than 10 I think

And we are not experiencing this issue on v4.5, only on v5.0.1 with the same DB version and the same code.

khaf commented 3 years ago

This seems to stem from the new error handling. The v4 client never chained errors and was happy to return no error for commands which encountered them but passed on retry. While the new client still mostly does that, it wasn't doing it for Batch requests. What you were encountering was that the client would return an error to you indicating that the connection pool is empty. The new v5.0.2 release should mostly fix the issue, but you may still hit that problem if you do not warm up the client before using it. To do that, call client.WarmUp(0) after you connect to the cluster to make sure the connection pool is filled up before putting any load on it.

khaf commented 3 years ago

I released v5.1.0 yesterday which should also remove another class of these errors. Do you still encounter them?