aerospike / aerospike-client-go

BatchGet(): Requested to read 8 bytes, but 0 was read. (EOF) #300

Closed by xqzhang2015 4 years ago

xqzhang2015 commented 4 years ago

We encounter this error when executing BatchGet(), but have no idea what causes it.

golang client code

        // Client policy used when creating the client.
        clientPolicy := &as.ClientPolicy{
            AuthMode:                    as.AuthModeInternal,
            Timeout:                     1 * time.Second,
            IdleTimeout:                 60 * time.Second,
            LoginTimeout:                10 * time.Second,
            ConnectionQueueSize:         1024,
            LimitConnectionsToQueueSize: true,
            FailIfNotConnected:          false,
            TendInterval:                time.Second,
            IgnoreOtherSubnetAliases:    false,
        }

        // Batch policy passed to BatchGet(); `timeout` is defined elsewhere.
        batchPolicy := as.NewBatchPolicy()
        batchPolicy.TotalTimeout = timeout

Environment

root@d4d0eac1f442:/regression/regression/regression# cat /proc/version
Linux version 3.10.0-493.el7.x86_64 (mockbuild@x86-020.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-9) (GCC) ) #1 SMP Tue Aug 16 11:45:26 EDT 2016

aerospike cluster

asd --version
Aerospike Enterprise Edition build 4.0.0.6

aerospike client

github.com/aerospike/aerospike-client-go v2.9.0+incompatible

khaf commented 4 years ago

I don't know if there is any correlation with the idleTimeout, but will investigate. How many errors do you encounter per minute?

xqzhang2015 commented 4 years ago

@khaf thanks for your quick reply. It happened randomly.

The Golang client is still in the testing phase for us and has not been released to PROD yet. Previously we used the C++ client and didn't encounter a similar issue.

xqzhang2015 commented 4 years ago

I noticed the error comes from buffered_connection.go, but I don't know enough to tell what would cause it.

// readConn will read the minimum minLength number of bytes from the connection.
// It will read more if it has extra empty capacity in the buffer.
func (bc *bufferedConn) readConn(minLength int) error {
    // Corrupted data streams can result in a huge minLength.
    // Do a sanity check here.
    if minLength > MaxBufferSize || minLength <= 0 || minLength > bc.remaining {
        return NewAerospikeError(PARSE_ERROR, fmt.Sprintf("Invalid readBytes length: %d", minLength))
    }

    bc.shiftContentToHead(minLength)

    toRead := bc.remaining
    if ec := bc.emptyCap(); toRead > ec {
        toRead = ec
    }

    n, err := bc.conn.Read(bc.buf()[bc.tail:], toRead)
    bc.tail += n
    bc.remaining -= n

    if err != nil {
        return fmt.Errorf("Requested to read %d bytes, but %d was read. (%v)", minLength, n, err)
    }

    return nil
}

xqzhang2015 commented 4 years ago

@khaf I may have found the root cause: IdleTimeout.

IdleTimeout corresponds to max_socket_idle in the C client library. From as_config.h:

    /**
     * Maximum socket idle time in seconds.  Connection pools will discard sockets that have
     * been idle longer than the maximum.  The value is limited to 24 hours (86400).
     *
     * It's important to set this value to a few seconds less than the server's proto-fd-idle-ms
     * (default 60000 milliseconds or 1 minute), so the client does not attempt to use a socket
     * that has already been reaped by the server.
     *
     * Connection pools are now implemented by a LIFO stack.  Connections at the tail of the
     * stack will always be the least used.  These connections are checked for maxSocketIdle
     * once every 30 tend iterations (usually 30 seconds).
     *
     * Default: 55 seconds
     */
    uint32_t max_socket_idle;

Previously we set IdleTimeout to 60s, and the Aerospike cluster side also uses the default of 60s (proto-fd-idle-ms).

In my test, the client sleeps for 60s and then sends a second request (BatchGet), but the server closes the corresponding connection at about the same time.

So the client gets an EOF error when it reads from that connection. After setting IdleTimeout to 55s, the issue is resolved.

It would also be helpful to copy this comment from the C library into the Golang client.

khaf commented 4 years ago

Thanks for reporting back here. I was attacking this problem from a different angle, so I'm glad to see it was a configuration issue. I'll include the comments from the C client in the next release of the Go client for more clarity.

khaf commented 4 years ago

Clarified in the comments in v2.11.0.