aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0
429 stars 199 forks source link

Panic .updateDeadline #352

Closed un000 closed 3 years ago

un000 commented 3 years ago

Server version 5.6.0.5

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x76c18b]

github.com/aerospike/aerospike-client-go/v5.(*Connection).updateDeadline(0xc024aff2c0, 0xc01397f1a8, 0xc0172e7418)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/connection.go:281 +0xab
github.com/aerospike/aerospike-client-go/v5.(*Connection).Read(0xc024aff2c0, 0xc032140000, 0x10000, 0x10000, 0x8, 0x7c4bd0, 0x1b3aae0, 0xfeca40)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/connection.go:214 +0x45
github.com/aerospike/aerospike-client-go/v5.(*readCommand).parseResult(0xc1392b2000, 0x142ce58, 0xc1392b2000, 0xc024aff2c0, 0x0, 0x0)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/read_command.go:85 +0x6e
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt(0xc1392b2000, 0x142ce58, 0xc1392b2000, 0xc1392a0cf0, 0xc1392a0c00, 0xc025c5716206aeb8, 0x4e8c40653, 0x1b56a80, 0x0, 0x0, ...)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:2128 +0xc0f
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).execute(0xc1392b2000, 0x142ce58, 0xc1392b2000, 0xc1373ebd00, 0x7, 0x7)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:1970 +0xda
github.com/aerospike/aerospike-client-go/v5.(*operateCommand).Execute(...)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/operate_command.go:79
github.com/aerospike/aerospike-client-go/v5.(*Client).Operate(0xc000142540, 0xc1392a0cf0, 0xc1373cd380, 0xc1373ebdc0, 0x7, 0x7, 0x0, 0xc1373ebdc0, 0x4)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/client.go:481 +0x1b2
    aerospikeHosts, err := aerospike.NewHosts(c.config.Addresses...)
    if err != nil {
        return err
    }

    cp := aerospike.NewClientPolicy()
    cp.Timeout = 5 * time.Second
    cp.IdleTimeout = 30 * time.Second
    cp.ConnectionQueueSize = 2048
    cp.MaxErrorRate = 8000
    cp.ErrorRateWindow = 5
    cp.MinConnectionsPerNode = 1024

    client, err := aerospike.NewClientWithPolicyAndHost(cp, aerospikeHosts...)
    if err != nil {
        return fmt.Errorf("error creating aerospike client: %w", err)
    }
    c.client = client

    conns := cp.MinConnectionsPerNode
    _, err = c.client.WarmUp(conns)
    if err != nil {
        return fmt.Errorf("error warming up connections: %w", err)
    }
khaf commented 3 years ago

The error here is coming from a read command (e.g. client.Get(...)) and is not related to the code snippet posted. Can you please paste the panic message (before the first line of the stack trace)?

ps. Seems it is from an Operate call. Can you post the operate call snippet?

un000 commented 3 years ago

But there was the only one panic, right after an application started. After millions of requests it doesn't happen again.

The panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x76c18b]

goroutine 13426 [running]:
github.com/aerospike/aerospike-client-go/v5.(*Connection).updateDeadline(0xc024aff2c0, 0xc01397f1a8, 0xc0172e7418)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/connection.go:281 +0xab
github.com/aerospike/aerospike-client-go/v5.(*Connection).Read(0xc024aff2c0, 0xc032140000, 0x10000, 0x10000, 0x8, 0x7c4bd0, 0x1b3aae0, 0xfeca40)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/connection.go:214 +0x45
github.com/aerospike/aerospike-client-go/v5.(*readCommand).parseResult(0xc1392b2000, 0x142ce58, 0xc1392b2000, 0xc024aff2c0, 0x0, 0x0)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/read_command.go:85 +0x6e
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt(0xc1392b2000, 0x142ce58, 0xc1392b2000, 0xc1392a0cf0, 0xc1392a0c00, 0xc025c5716206aeb8, 0x4e8c40653, 0x1b56a80, 0x0, 0x0, ...)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:2128 +0xc0f
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).execute(0xc1392b2000, 0x142ce58, 0xc1392b2000, 0xc1373ebd00, 0x7, 0x7)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:1970 +0xda
github.com/aerospike/aerospike-client-go/v5.(*operateCommand).Execute(...)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/operate_command.go:79
github.com/aerospike/aerospike-client-go/v5.(*Client).Operate(0xc000142540, 0xc1392a0cf0, 0xc1373cd380, 0xc1373ebdc0, 0x7, 0x7, 0x0, 0xc1373ebdc0, 0x4)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/client.go:481 +0x1b2
        res, err := c.aero.Operate(
            context.Background(),
            "dmp.save_event",
            key, c.ttl, aero.UPDATE,
            aerospike.MapPutOp(
                c.mp,
                "cri",
                123, c.emptyList,
            ),
            aerospike.ListSetOrderOp("cri", aerospike.ListOrderOrdered, ctxKey),
            aerospike.ListAppendWithPolicyContextOp(c.lp, "cri" , []*aerospike.CDTContext{ctxKey}, now),
            aerospike.ListRemoveByValueRangeOp(
                "cri", aerospike.ListReturnTypeNone,
                nil, now-86400,
                ctxKey,
            ),
            aerospike.MapRemoveByRankRangeOp("cri", 512, aerospike.MapReturnType.NONE),
            aerospike.MapRemoveByValueOp("cri", c.emptyList, aerospike.MapReturnType.NONE),
            aerospike.GetBinOp("cri"),
        )

We've got this after upgrading from 4.x to 5.x.

un000 commented 3 years ago

it happens more often than once image

un000 commented 3 years ago

And one more recoverable panic.

runtime error: invalid memory address or nil pointer dereference

runtime.gopanic
            /usr/local/go/src/runtime/panic.go:965
runtime.panicmem
            /usr/local/go/src/runtime/panic.go:212
runtime.sigpanic
            /usr/local/go/src/runtime/signal_unix.go:734
github.com/aerospike/aerospike-client-go/v5.(*Connection).updateDeadline
            /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/connection.go:281
github.com/aerospike/aerospike-client-go/v5.(*Connection).Read
            /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/connection.go:214
github.com/aerospike/aerospike-client-go/v5.(*readCommand).parseResult
            /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/read_command.go:85
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt
            /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:2128
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).execute
            /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:1970
github.com/aerospike/aerospike-client-go/v5.(*readCommand).Execute
            /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/read_command.go:264
github.com/aerospike/aerospike-client-go/v5.(*Client).GetObject
            /go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/client_reflect.go:74
github.com/valyala/fasthttp.(*workerPool).getCh.func1

GetObject struct:


struct {
  Cri map[int64]int64 `json:"cri"`
}
khaf commented 3 years ago

I have no idea why this error happens, since it should be impossible to happen. Any ideas regards the state of the cluster when it happens? Do you have a lot of migrations?

un000 commented 3 years ago

I have no idea why this error happens, since it should be impossible to happen. Any ideas regards the state of the cluster when it happens? Do you have a lot of migrations?

Nope a cluster is stable. Requests are uniform.

Maybe we have a race condition with the following parameter:

cp.MinConnectionsPerNode = 1024

We have set it after upgrading to the 5.x lib. MinConnectionsPerNode fixed our latency issues. Also we have upgraded aerospike from 4.8 -> 5.6 at the same time.

image

un000 commented 3 years ago

Another one, when a node was removed from the cluster.

{"level":"info","ts":"2021-06-02T13:36:36.851Z","caller":"aerospike/logger.go:19","msg":"Node `BB91317BC645DD4 10.10.2.110:3000` refresh failed: `ResultCode: NETWORK_ERROR, Iteration: 0, InDoubt: false, Node: <nil>: network error. Checked the wrapped error for detail\nResultCode: NETWORK_ERROR, Iteration: 0, InDoubt: false, Node: <nil>: network error. Checked the wrapped error for detail\nread tcp 10.233.72.152:53010->10.10.2.110:3000: read: connection reset by peer`","version":"ef24336b","system":"aerospike"}
{"level":"info","ts":"2021-06-02T13:36:36.859Z","caller":"aerospike/logger.go:19","msg":"Tend finished. Live node count changes from 4 to 3","version":"ef24336b","system":"aerospike"}
{"level":"info","ts":"2021-06-02T13:36:36.859Z","caller":"aerospike/logger.go:19","msg":"Tending took 2.185429396s, while your requested ClientPolicy.TendInterval is 1s. Tends are slower than the interval, and may be falling behind the changes in the cluster.","version":"ef24336b","system":"aerospike"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x76906f]

goroutine 61673727 [running]:
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).executeAt(0xc148e9a6c0, 0x142d268, 0xc148e9a6c0, 0xc1381b79e0, 0xc1381b7901, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:2041 +0x2ef
github.com/aerospike/aerospike-client-go/v5.(*baseCommand).execute(0xc148e9a6c0, 0x142d268, 0xc148e9a6c0, 0xc0115a4901, 0xc11c1144f0, 0x0)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/command.go:1970 +0xda
github.com/aerospike/aerospike-client-go/v5.(*baseMultiCommand).execute(...)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/multi_command.go:366
github.com/aerospike/aerospike-client-go/v5.(*scanPartitionCommand).Execute(0xc148e9a6c0, 0x0, 0xc03da3e768)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/scan_partition_command.go:65 +0x47
github.com/aerospike/aerospike-client-go/v5.(*werrGroup).execute.func1(0xc054adaa00, 0x142d268, 0xc148e9a6c0)
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/werrgroup.go:62 +0xc6
created by github.com/aerospike/aerospike-client-go/v5.(*werrGroup).execute
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v5@v5.0.2/werrgroup.go:55 +0xaf
khaf commented 3 years ago

The last issue is related to ticket #353 . I have already found and fixed it.

un000 commented 3 years ago

Something happens when one of the servers goes down. Tested with restarts. 3-5 of 10 random clients panic.

khaf commented 3 years ago

This took unusually long to figure out, since I was looking in the wrong place. The actual reason was shadowing of errors. Fixed in v5.1.0.

un000 commented 3 years ago

Thank you! I've upgraded and will be watching for a some period for errors.

un000 commented 3 years ago

Looks nice. Panics gone away.