aerospike / aerospike-client-c

Aerospike C Client

Crash on connection cancel (4.6.14) #96

Closed yoori closed 4 years ago

yoori commented 4 years ago

The C library periodically crashes under high load (50K rps) with the following stack:

[Thread debugging using libthread_db enabled]
Program terminated with signal 11, Segmentation fault.
#0  cancel_connection (cmd=cmd@entry=0x7fdb381ef880, err=err@entry=0x7fdb9e77a060,
    source=source@entry=3, retry=retry@entry=false, timeout=timeout@entry=true)
    at src/main/aerospike/as_pipe.c:112
112             conn->canceling = true;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.0.1.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libev-4.15-3.el7.x86_64 libevent-2.0.21-4.el7.x86_64 libgcc-4.8.5-39.0.1.el7.x86_64 libicu-50.1.2-17.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-39.0.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 openssl-libs-1.0.2k-19.0.1.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  cancel_connection (cmd=cmd@entry=0x7fdb381ef880, err=err@entry=0x7fdb9e77a060,
    source=source@entry=3, retry=retry@entry=false, timeout=timeout@entry=true)
    at src/main/aerospike/as_pipe.c:112
#1  0x00000000008a97a5 in as_pipe_timeout (cmd=cmd@entry=0x7fdb381ef880, retry=retry@entry=false)
    at src/main/aerospike/as_pipe.c:450
#2  0x000000000089b6b6 in as_event_total_timeout (cmd=0x7fdb381ef880)
    at src/main/aerospike/as_event.c:767
#3  0x000000000089b9a2 in as_event_execute_retry (cmd=0x7fdb381ef880)
    at src/main/aerospike/as_event.c:863
#4  0x00007fdbae2ef515 in ev_invoke_pending () from /lib64/libev.so.4
#5  0x00007fdbae2f26b7 in ev_run () from /lib64/libev.so.4
#6  0x000000000089cfcb in ev_loop (flags=0, loop=0x136c5200) at /usr/local/include/ev.h:835
#7  as_ev_worker (udata=0x136c5200) at src/main/aerospike/as_event_ev.c:98
#8  0x00007fdbaba70ea5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fdbaad638cd in clone () from /lib64/libc.so.6
(gdb)

At frame 0, cmd->conn is a null pointer. Below is the content of the cmd variable from another core file (with the same stack):

(gdb) p *cmd
$2 = {timer = {active = 0, pending = 0, priority = 0, data = 0x7f7be063cf10, cb = 0x89d548 <as_ev_retry>, at = -0.019968405365943909,
    repeat = 0}, total_deadline = 15387285539, socket_timeout = 20, max_retries = 2, iteration = 1, replica = AS_POLICY_REPLICA_SEQUENCE,
  event_loop = 0x1338ec10, conn = 0x0, cluster = 0x18c4cba0, node = 0x18c50450, ns = 0x18c72740 "ssd", partition = 0x18c76440,
  udata = 0x7f7be0491e30, parse_results = 0x89a8ea <as_event_command_parse_result>, pipe_listener = 0x6b8d30
     <as::pipeline_listener(void*, as_event_loop*)>, pipe_link = {next = 0x0, prev = 0x0}, buf = 0x7f7be063d03f "id\026",
  command_sent_counter = 0, write_offset = 208, write_len = 95, read_capacity = 3793, len = 8, pos = 0, type = 1 '\001', proto_type = 3 '\003',
  proto_type_rcv = 0 '\000', state = 3 '\003', flags = 6 '\006', flags2 = 1 '\001'}

BrianNichols commented 4 years ago

On socket failure, the old connection is closed, set to NULL, and a retry is signaled. A recent change makes the client wait one event loop iteration before executing the retry. This means a timeout can fire between the retry signal and the actual retry, while the command's connection is still NULL. The regular connection timeout logic handles NULL connections, but the pipeline connection timeout logic does not.

We are investigating a fix.
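
The failure mode described above can be sketched roughly as follows. This is a simplified model, not the client's source and not the actual fix: the types and functions (sketch_conn, sketch_cmd, sketch_socket_failure, sketch_pipe_timeout) are hypothetical stand-ins, and only the conn and canceling field names are taken from the stack trace. It shows a timeout handler that tolerates a command whose connection was already closed and set to NULL while a retry is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a pipeline connection. */
typedef struct {
    bool canceling;
} sketch_conn;

/* Hypothetical, simplified stand-in for an async command. */
typedef struct {
    sketch_conn* conn;    /* NULL between socket failure and the deferred retry */
    bool retry_pending;   /* retry has been signaled but not yet executed */
} sketch_cmd;

/* Socket failure: the old connection is dropped and a retry is signaled.
 * The retry itself runs one event loop iteration later (not modeled here). */
void sketch_socket_failure(sketch_cmd* cmd)
{
    assert(cmd != NULL);
    cmd->conn = NULL;
    cmd->retry_pending = true;
}

/* Timeout handler for the pipeline path.
 * Returns true if a live connection was marked as canceling,
 * false if there was no connection left to cancel. */
bool sketch_pipe_timeout(sketch_cmd* cmd)
{
    assert(cmd != NULL);
    sketch_conn* conn = cmd->conn;

    if (conn == NULL) {
        /* The timeout fired in the window after socket failure and
         * before the retry. Dereferencing conn here would segfault. */
        return false;
    }

    conn->canceling = true;
    return true;
}
```

In this model, calling sketch_pipe_timeout after sketch_socket_failure returns false instead of dereferencing a NULL pointer, which corresponds to the crash at as_pipe.c:112 in the trace. How the real client handles this case is up to the maintainers.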

BrianNichols commented 4 years ago

C client 4.6.16 has been released:

https://www.aerospike.com/download/client/c/4.6.16/

Let us know if it fixes your crash.