aerospike / aerospike-client-c

Aerospike C Client
Other
98 stars 103 forks source link

handling of server response after total_timeout has happened. #88

Closed mdarade closed 5 years ago

mdarade commented 5 years ago

Hi, I get segfault with below stack trace.

(gdb) bt
#0 as_uv_command_buffer (handle=0x7facf80c6120, suggested_size=65536, buf=0x7fad508ef9d0) at src/main/aerospike/as_event_uv.c:256
#1 0x00007fadafb05d43 in uv__read (stream=0x7facf80c6120) at /root/disk/libuv/src/unix/stream.c:1161
#2 0x00007fadafb063ff in uv__stream_io (loop=0x7facf80008c0, w=0x7facf80c61a8, events=1) at /root/disk/libuv/src/unix/stream.c:1324
#3 0x00007fadafb0c1bb in uv__io_poll (loop=0x7facf80008c0, timeout=0) at /root/disk/libuv/src/unix/linux-core.c:378
#4 0x00007fadafaf660c in uv_run (loop=0x7facf80008c0, mode=UV_RUN_DEFAULT) at /root/disk/libuv/src/unix/core.c:370
#5 0x00007fadafd56b39 in as_uv_worker (udata=0x7fad2b7fcde0) at src/main/aerospike/as_event_uv.c:160
#6 0x00007fadb12956ba in start_thread (arg=0x7fad508f3700) at pthread_create.c:333
#7 0x00007fadae03141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) 

Lets consider following scenario.

client has set total_timeout to 50 msec. server sleeps/takes 1 sec for transaction completion before putting response over socket. libuv’s uv__read() would do alloc_cb which results in call to as_uv_command_buffer().

aerospike client should in this case first check whether as_uv_connection_alive() & then only should derefence cmd->buf.

BrianNichols commented 5 years ago

On command timeout, the connection and timer are closed in this order.

uv_close((uv_handle_t*)&conn->socket, as_uv_connection_closed);
uv_close((uv_handle_t*)&cmd->timer, as_uv_timer_closed);

The connection is freed when libuv calls as_uv_connection_closed(). The command buffer is freed when libuv calls as_uv_timer_closed().

I had assumed libuv would call as_uv_connection_closed() and as_uv_timer_closed() in order in the next iteration of the event loop. After as_uv_connection_closed() is called, libuv will not send buffer allocation requests to as_uv_command_buffer().

Your stack trace implies that the connection was still in closing state (as_uv_connection_closed() not called yet) after as_uv_timer_closed() was called (out of order).

I'm willing to account for this new unexpected behavior, but there is one thing that is bothering me. I modified the server to delay 1 second before returning results and set a "total_timeout" of 50ms on the client to force a timeout. On all iterations, the timeouts occur, but I can't duplicate your stacktrace, nor does valgrind show any "use after free" errors.

How easy is it for you to duplicate this stack trace?

Is it possible to provide code that duplicates the stack trace?

What OS, libuv and Aerospike client versions are you using?

mdarade commented 5 years ago

ubuntu 16.04 libuv 1.8.0-1 aerospike-client-c 4.3.8 We have 4 event loops. Will try to provide code which can duplicate stack trace.

BrianNichols commented 5 years ago

C client 4.5.0 has been released which checks as_uv_connection_alive() in as_uv_command_buffer() and as_uv_auth_command_buffer() :

https://www.aerospike.com/download/client/c/4.5.0