aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0
430 stars 199 forks source link

command execution timed out in go-client when Docker container is restarted #405

Open mikhailbolshakov opened 1 year ago

mikhailbolshakov commented 1 year ago

Hello, I have deployed aerospike:ce-6.3.0.1 under docker-compose with the aerospike.conf

service {

}

logging {
    file /var/log/aerospike/aerospike.log {
            context any info
        }
    console {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
        mode mesh
        address local
        port 3002
        interval 150
        timeout 10
    }

    fabric {
        address local
        port 3001
    }

}

namespace cache {
    default-ttl 30d # use 0 to never expire/evict.
    memory-size 4G
    nsup-period 120
    replication-factor 1
    storage-engine memory
}

I'm using go-client v6.12.0 in my go aplication also deployed as docker container in the same docker network. When my go application container restarts due to redeployment I start getting the following error on go-client side on just creating a new client connection

clientPolicy := aero.NewClientPolicy()
    client, err := aero.NewClientWithPolicy(clientPolicy, cfg.Host, cfg.Port)
    if err != nil {
        return ErrAeroConn(err, ctx)
    }

client logs look like this

2023-06-11T17:04:55.930+0000 Alias 172.18.0.3:3000 failed: ResultCode: NETWORK_ERROR, Iteration: 0, InDoubt: false, Node: <nil>: network error. Checked the wrapped error for detail
ResultCode: TIMEOUT, Iteration: 0, InDoubt: false, Node: <nil>: Timeout
read tcp 172.18.0.30:55270->172.18.0.3:3000: i/o timeout
2023-06-11T17:04:55.930+0000 Seed 172.18.0.3:3000 failed: ResultCode: NETWORK_ERROR, Iteration: 0, InDoubt: false, Node: <nil>: network error. Checked the wrapped error for detail
ResultCode: TIMEOUT, Iteration: 0, InDoubt: false, Node: <nil>: Timeout
read tcp 172.18.0.30:55270->172.18.0.3:3000: i/o timeout
2023-06-11T17:04:55.930+0000 ResultCode: INVALID_NODE_ERROR, Iteration: 0, InDoubt: false, Node: <nil>: Failed to connect to hosts: [172.18.0.3:3000]
ResultCode: NETWORK_ERROR, Iteration: 0, InDoubt: false, Node: <nil>: network error. Checked the wrapped error for detail
ResultCode: TIMEOUT, Iteration: 0, InDoubt: false, Node: <nil>: Timeout
read tcp 172.18.0.30:55270->172.18.0.3:3000: i/o timeout
2023-06-11T17:04:55.931+0000 No nodes available; seeding...
2023-06-11T17:04:55.932+0000 Seeding the cluster. Seeds count: 1
2023-06-11T17:04:55.932+0000 Node Validator has 1 nodes and they are: [172.18.0.3:3000]
2023-06-11T17:05:10.922+0000 Failed to connect to host(s): [aerospike:3000]; error: ResultCode: TIMEOUT, Iteration: 0, InDoubt: false, Node: <nil>: command execution timed out on client: See 
ResultCode: TIMEOUT, Iteration: 0, InDoubt: false, Node: <nil>: command execution timed out on client: See Policy.Timeout

Error is reproduced until aerospike container is restarted.

khaf commented 1 year ago

What is the IP of the newly restarted container? Is that IP the same as the seed you provided? Can you ssh into your client container and ping the server container?

mikhailbolshakov commented 1 year ago

here docker inspect of aerospike container

[
    {
        "Id": "d1d67eeb96050773be3cdd94aa17db0bc162269be7c35a49a705e8223a00e097",
        "Created": "2023-06-07T09:22:06.585535635Z",
        "Path": "/usr/bin/as-tini-static",
        "Args": [
            "-r",
            "SIGUSR1",
            "-t",
            "SIGTERM",
            "--",
            "/entrypoint.sh",
            "--config-file",
            "/opt/aerospike/etc/aerospike.conf"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3128355,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2023-06-13T09:37:37.889941354Z",
            "FinishedAt": "2023-06-13T09:36:53.955858771Z"
        },
        "Image": "sha256:dc5db6293cbe39fa7edfdc14cb4cf3998e5d6a48a871d06187df8f2147aaf3b6",
        "ResolvConfPath": "/var/lib/docker/containers/d1d67eeb96050773be3cdd94aa17db0bc162269be7c35a49a705e8223a00e097/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/d1d67eeb96050773be3cdd94aa17db0bc162269be7c35a49a705e8223a00e097/hostname",
        "HostsPath": "/var/lib/docker/containers/d1d67eeb96050773be3cdd94aa17db0bc162269be7c35a49a705e8223a00e097/hosts",
        "LogPath": "",
        "Name": "/prod-aerospike",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "compose_spike-logs:/var/log/aerospike:rw",
                "/opt/services/compose/aerospike/aerospike.conf:/opt/aerospike/etc/aerospike.conf:rw",
                "/var/ev2go/docker/volumes/aerospike/data:/opt/aerospike/data:rw"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "syslog",
                "Config": {
                    "syslog-address": "tcp://localhost:5141"
                }
            },
            "NetworkMode": "prod",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "always",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [],
            "ConsoleSize": [
                0,
                0
            ],
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "private",
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/a09570f596feb69e3eb1b31fbb41dd4136b9be3b9a55789c88f649514e7cd1c6-init/diff:/var/lib/docker/overlay2/11e226e4130e48c7b825974e8d33a94f6b4d5f1eb1c7d355a044c04326506a4a/diff:/var/lib/docker/overlay2/d9a53a3e2ad2ffa735a285e4e76100d5643c1c61a9aaf889de9feeb93b1c9703/diff:/var/lib/docker/overlay2/5b6d25baf452df16024fca8c0751e848c32a84d1e78e99ca43bb850b2e5f3b84/diff:/var/lib/docker/overlay2/73c8091e9d806858c31290df93c752366cc8fefe608e2f6f2f3a209489a611a6/diff",
                "MergedDir": "/var/lib/docker/overlay2/a09570f596feb69e3eb1b31fbb41dd4136b9be3b9a55789c88f649514e7cd1c6/merged",
                "UpperDir": "/var/lib/docker/overlay2/a09570f596feb69e3eb1b31fbb41dd4136b9be3b9a55789c88f649514e7cd1c6/diff",
                "WorkDir": "/var/lib/docker/overlay2/a09570f596feb69e3eb1b31fbb41dd4136b9be3b9a55789c88f649514e7cd1c6/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/ev2go/docker/volumes/aerospike/data",
                "Destination": "/opt/aerospike/data",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/opt/services/compose/aerospike/aerospike.conf",
                "Destination": "/opt/aerospike/etc/aerospike.conf",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "volume",
                "Name": "compose_spike-logs",
                "Source": "/var/lib/docker/volumes/compose_spike-logs/_data",
                "Destination": "/var/log/aerospike",
                "Driver": "local",
                "Mode": "rw",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "d1d67eeb9605",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "3000/tcp": {},
                "3001/tcp": {},
                "3002/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "NAMESPACE=cache",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "--config-file",
                "/opt/aerospike/etc/aerospike.conf"
            ],
            "Image": "prod.dev:5050/reg/aerospike:ce-6.3.0.1",
            "Volumes": {
                "/opt/aerospike/data": {},
                "/opt/aerospike/etc/aerospike.conf": {},
                "/var/log/aerospike": {}
            },
            "WorkingDir": "",
            "Entrypoint": [
                "/usr/bin/as-tini-static",
                "-r",
                "SIGUSR1",
                "-t",
                "SIGTERM",
                "--",
                "/entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "com.docker.compose.config-hash": "a4fd0c71d15d60d8c1b49745120f12a003a7ea7d842fa55ffbb7a4cbfa3cb4dd",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "compose",
                "com.docker.compose.project.config_files": "docker-compose.infra.yml",
                "com.docker.compose.project.environment_file": "env/prod/infra.env",
                "com.docker.compose.project.working_dir": "/opt/services/compose",
                "com.docker.compose.service": "aerospike",
                "com.docker.compose.version": "1.28.6"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "1660a77ff6a58cb73c3f00c611ecfca11926b0b5e92e3d3a66a2304c96f8bc54",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "3000/tcp": null,
                "3001/tcp": null,
                "3002/tcp": null
            },
            "SandboxKey": "/var/run/docker/netns/1660a77ff6a5",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "prod": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "aerospike",
                        "d1d67eeb9605"
                    ],
                    "NetworkID": "1992039b92f260505173794e1328e89ca4be6e9ffca0721d96da96c21e7d3c20",
                    "EndpointID": "1c9ca009d77d02edcd991dd7f58b6f35ffe0b2ebab42bb0fbba24e96c8fa7cb0",
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.32",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "**********",
                    "DriverOpts": null
                }
            }
        }
    }
]

here docker inspect of our app container

[
    {
        "Id": "88440de5cfb14a28945c81a8a2dc717b06ae69c53d0bd6531ffa9a715a556f22",
        "Created": "2023-06-13T09:29:34.024007382Z",
        "Path": "./bin/main",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3131351,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2023-06-13T09:38:00.840956627Z",
            "FinishedAt": "2023-06-13T09:37:58.303078528Z"
        },
        "Image": "sha256:4087882da24758d99ab6b63f0e71c32e5f6078828f9dcba2633f706f20d39803",
        "ResolvConfPath": "/var/lib/docker/containers/88440de5cfb14a28945c81a8a2dc717b06ae69c53d0bd6531ffa9a715a556f22/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/88440de5cfb14a28945c81a8a2dc717b06ae69c53d0bd6531ffa9a715a556f22/hostname",
        "HostsPath": "/var/lib/docker/containers/88440de5cfb14a28945c81a8a2dc717b06ae69c53d0bd6531ffa9a715a556f22/hosts",
        "LogPath": "/var/lib/docker/containers/88440de5cfb14a28945c81a8a2dc717b06ae69c53d0bd6531ffa9a715a556f22/88440de5cfb14a28945c81a8a2dc717b06ae69c53d0bd6531ffa9a715a556f22-json.log",
        "Name": "/ev2go-charging",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "ev2go",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "always",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [],
            "ConsoleSize": [
                0,
                0
            ],
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "private",
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/b7181f937109308de48b9dc4374b346219618872005dd188fd638da2b60f8be0-init/diff:/var/lib/docker/overlay2/c6ad0d26880be3a153195cb09ae0c54dc86f94a015b50135e989c789d3fd7006/diff:/var/lib/docker/overlay2/449acbcf25e7bdd9fbd524bc9732f677a843e9502ade8ccb231f1680ee9447c5/diff:/var/lib/docker/overlay2/3b81eddb7cae2c9bdcee1b597c5c2925b725b30448a28744e7c1c7621d6cc39f/diff:/var/lib/docker/overlay2/91d410e7ceb084faac50ca05bd2cfcf8d0172d397b97dcaf1dcda02561ad4c72/diff:/var/lib/docker/overlay2/8e65a8ebbffa204cf8e19d3a944576735eb9d02fd2c7c243a6873fe45dcc54e4/diff:/var/lib/docker/overlay2/319ecfff57b841c1c44bcee9f831fb36efe1b51fef144609c8f73aa419a0ec3c/diff",
                "MergedDir": "/var/lib/docker/overlay2/b7181f937109308de48b9dc4374b346219618872005dd188fd638da2b60f8be0/merged",
                "UpperDir": "/var/lib/docker/overlay2/b7181f937109308de48b9dc4374b346219618872005dd188fd638da2b60f8be0/diff",
                "WorkDir": "/var/lib/docker/overlay2/b7181f937109308de48b9dc4374b346219618872005dd188fd638da2b60f8be0/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [],
        "Config": {
            "Hostname": "*******",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "AERO_HOST=aerospike",
                "AERO_PORT=3000",
            ],
            "Cmd": null,
            "Image": "prod.dev:5050/prod/server/app:v1.1.4",
            "Volumes": null,
            "WorkingDir": "/usr/local/root/app",
            "Entrypoint": [
                "./bin/main"
            ],
            "OnBuild": null,
            "Labels": {
                "com.docker.compose.config-hash": "cedfa1cc76acedc141be90334ed503fc33665621f85635b71233e3f7a190ec94",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "compose",
                "com.docker.compose.project.config_files": "docker-compose.services.yml",
                "com.docker.compose.project.working_dir": "/opt/services/compose",
                "com.docker.compose.service": "charging",
                "com.docker.compose.version": "1.28.6",
                "logging": "promtail",
                "logging_jobname": "containerlogs"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "8b5bfc8bfdf558e9611563c54646a14e65d89d17ee5a9dc4393407940a97d2ef",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/8b5bfc8bfdf5",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "prod": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "app",
                        "88440de5cfb1"
                    ],
                    "NetworkID": "1992039b92f260505173794e1328e89ca4be6e9ffca0721d96da96c21e7d3c20",
                    "EndpointID": "0be7434bfa5b623b41ae0a468c469aaae7ca39cd1ebc05a15417039e80924577",
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.30",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "************",
                    "DriverOpts": null
                }
            }
        }
    }
]

IP of a new container might be different of course.

Here some more points of investigation which might be helpful: the problem is reproduced on 2 of our 3 environments. On those environments where problem exists we witness very weird scenario:

On the environment where there is no problem the app container can be restarted without any problems

Unfortunately, I cannot play around with this error because it's reproduced on prod environment while not reproduced on dev. But if you have some ideas to check you could try them in the downtime

Can you ssh into your client container and ping the server container nc shows open connection is ok