NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

Unbound-1.13.1 crashed by SIGABRT #469

Closed: iruzanov closed this issue 1 year ago

iruzanov commented 3 years ago

Hello, Wouter!

I am actively using unbound-1.13.1 (with our DNSTAP patches, issue #367), and sometimes my unbound crashes under high load with massive recursive TCP requests. All abnormal terminations are caused by the services/outside_network.c code. I now have one of these core dumps:

(gdb) bt

0 0x0000000800955c2a in thr_kill () from /lib/libc.so.7

1 0x0000000800954084 in raise () from /lib/libc.so.7

2 0x00000008008ca279 in abort () from /lib/libc.so.7

3 0x0000000800464641 in ?? () from /usr/local/lib/libevent-2.1.so.7

4 0x0000000800464939 in event_errx () from /usr/local/lib/libevent-2.1.so.7

5 0x000000080045ec54 in evmap_iodel () from /usr/local/lib/libevent-2.1.so.7

6 0x0000000800457e8f in event_delnolock () from /usr/local/lib/libevent-2.1.so.7

7 0x000000080045ada8 in event_del () from /usr/local/lib/libevent-2.1.so.7

8 0x000000000030e25b in ub_event_del (ev=&lt;optimized out&gt;) at ./util/ub_event.c:395

9 comm_point_close (c=0xdc97b7c00) at ./util/netevent.c:3860

10 0x0000000000315bab in decommission_pending_tcp (outnet=&lt;optimized out&gt;, pend=0xdc9494980)

at ./services/outside_network.c:945

11 0x00000000003147d6 in reuse_cb_and_decommission (outnet=0x18e75, pend=0x6, error=-2)

at ./services/outside_network.c:986

12 0x0000000000317491 in outnet_tcptimer (arg=0xee67c2300) at ./services/outside_network.c:2033

13 0x000000080045e0ed in ?? () from /usr/local/lib/libevent-2.1.so.7

14 0x000000080045a09c in event_base_loop () from /usr/local/lib/libevent-2.1.so.7

15 0x000000000024dc54 in thread_start (arg=0x8014c0800) at ./util/ub_event.c:280

16 0x0000000800780fac in ?? () from /lib/libthr.so.3

17 0x0000000000000000 in ?? ()

Backtrace stopped: Cannot access memory at address 0x7fffdf7fa000
(gdb)

If we enter frame 12 (outnet_tcptimer) and print the pend structure, we see the following:

(gdb) print pend
$15 = (struct pending_tcp *) 0x6
(gdb) print *pend
Cannot access memory at address 0x6
(gdb)

This corrupt pend pointer is passed to the reuse_cb_and_decommission() function (frame 11) and further up the stack trace shown above.

In the outnet_tcptimer() function (in services/outside_network.c) we can see the following code:

    /* it was in use */
    struct pending_tcp* pend = (struct pending_tcp*)w->next_waiting;

But the structure w->next_waiting is of type waiting_tcp:

(gdb) print w->next_waiting
$18 = (struct waiting_tcp *) 0xdc9494980
(gdb)

So my question is: is the type cast in the outnet_tcptimer() function correct? And does this corrupt pend structure cause the event_errx() in libevent? If it helps, I found a structure of type pending_tcp in the w structure:

(gdb) print w->outnet->tcp_free
$23 = (struct pending_tcp *) 0xdc9494980
(gdb) print *w->outnet->tcp_free
$24 = {next_free = 0xdc9493e40, pi = 0xd7da2c000, c = 0xdc97b7c00, query = 0x0, reuse = {node = {parent = 0xdc94953a0, left = 0x3287d0 &lt;rbtree_null_node&gt;, right = 0x3287d0 &lt;rbtree_null_node&gt;, key = 0x0, color = 1 '\001'}, addr = {ss_len = 0 '\000', ss_family = 2 '\002', ss_pad1 = "\000\065X\320\017\067", __ss_align = 0, ss_pad2 = "\000\000\000\000\000\000\000\016", '\000' &lt;repeats 103 times&gt;}, addrlen = 16, is_ssl = 0, lru_next = 0xdc9494ae0, lru_prev = 0x0, item_on_lru_list = 0, pending = 0xdc9494980, cp_more_read_again = 0, cp_more_write_again = 0, tree_by_id = {root = 0x3287d0 &lt;rbtree_null_node&gt;, count = 0, cmp = 0x3133e0 &lt;reuse_id_cmp&gt;}, write_wait_first = 0x0, write_wait_last = 0x0, outnet = 0xd7d805000}}
(gdb)

Big thank you in advance!

PS I did not send the core file itself because it is 31GB in size.

gthess commented 3 years ago

Hello,

Thanks for the backtrace! This seems to be the same as #411 and #439, and the backtrace will hopefully help narrow down the cause of the corruption. In this case w->next_waiting is supposed to be cast to struct pending_tcp* because of the w->on_tcp_waiting_list check above. So the error is not there per se; rather, the w struct was corrupted at an earlier point.

gthess commented 3 years ago

~~I also see the cause of the crash for the other issues, namely:~~

reuse = {node = {parent = 0xdc94953a0,
left = 0x3287d0 <rbtree_null_node>, right = 0x3287d0 <rbtree_null_node>, key = 0x0, color = 1 '\001'}

~~there is a node in the reuse tree with key=NULL. That is the case that shouldn't be happening in the first place.~~

No, scratch that: this is just a node, and key=NULL just means that this node is not part of the tree.

gthess commented 3 years ago

Btw, can you somehow reproduce the issue reliably-ish?

iruzanov commented 3 years ago

In my case I can see that the key=NULL too:

(gdb) print w->outnet->tcp_free.reuse.node
$27 = {parent = 0xdc94953a0, left = 0x3287d0 &lt;rbtree_null_node&gt;, right = 0x3287d0 &lt;rbtree_null_node&gt;, key = 0x0, color = 1 '\001'}
(gdb)

iruzanov commented 3 years ago

By the way, I have another core dump where the key is NULL:

Core was generated by `/usr/local/sbin/unbound -c /srvs/unbound/etc/unbound/unbound.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.

0 0x00000000003133ea in reuse_id_cmp (key1=0xf194e0500, key2=0x0) at ./services/outside_network.c:174

174 if(w1->id < w2->id)
[Current thread is 1 (LWP 100527)]
(gdb) bt

0 0x00000000003133ea in reuse_id_cmp (key1=0xf194e0500, key2=0x0) at ./services/outside_network.c:174

1 0x00000000002c2f1a in rbtree_insert (rbtree=0xd810e9ab8, data=0xf194e0528) at ./util/rbtree.c:241

2 0x0000000000314b08 in reuse_tree_by_id_insert (reuse=0xd810e99e0, w=0xf194e0500)

at ./services/outside_network.c:400

3 use_free_buffer (outnet=0xc72751000) at ./services/outside_network.c:743

4 0x000000000031431f in outnet_tcp_cb (c=&lt;optimized out&gt;, arg=&lt;optimized out&gt;, error=&lt;optimized out&gt;,

reply_info=&lt;optimized out&gt;) at ./services/outside_network.c:1125

5 0x000000080045fd7d in ?? () from /usr/local/lib/libevent-2.1.so.7

6 0x000000080045bd2c in event_base_loop () from /usr/local/lib/libevent-2.1.so.7

7 0x000000000024dc54 in thread_start (arg=0x8014c8000) at ./util/ub_event.c:280

8 0x0000000800783fac in ?? () from /lib/libthr.so.3

9 0x0000000000000000 in ?? ()

Backtrace stopped: Cannot access memory at address 0x7fffdf1f7000
(gdb) fr 1

1 0x00000000002c2f1a in rbtree_insert (rbtree=0xd810e9ab8, data=0xf194e0528) at ./util/rbtree.c:241

241 if ((r = rbtree->cmp(data->key, node->key)) == 0) {
(gdb) print *node
$14 = {parent = 0xf0686c928, left = 0x3287d0 &lt;rbtree_null_node&gt;, right = 0x3287d0 &lt;rbtree_null_node&gt;, key = 0x0, color = 0 '\000'}
(gdb)

This node appears in the while(node != RBTREE_NULL) loop (in rbtree_insert() in util/rbtree.c), and the program terminates abnormally when rbtree->cmp(data->key, node->key) is called in frame 0.

iruzanov commented 3 years ago

> Btw, can you somehow reproduce the issue reliably-ish?

It would be hard. Unbound crashes under massive TCP requests, when the server generates a huge amount of iterative requests to upstreams. So I will first need to research this traffic in order to reproduce the issue in my sandbox.

gthess commented 3 years ago

There is a possible fix for this on the master branch (https://github.com/NLnetLabs/unbound/commit/ff6b527184b33ffe1e2b643db8a32fae8061fc5a). It would be great if you could test it and provide feedback!

gthess commented 3 years ago

Hi @iruzanov, just checking if you were able to test with the aforementioned fix.

iruzanov commented 3 years ago

Hello colleagues! Please excuse me for the long delay! I'm just very "loaded" with other tasks right now ;)

I will definitely test your patch on my loaded Unbound resolvers this week, and then I will report the results! Thank you very much for the help!

iruzanov commented 3 years ago

Today I have patched one of my loaded Unbound resolvers. My plan is:

Thank you very much for big help!

iruzanov commented 3 years ago

The patched Unbound on one of the most loaded servers is still running without any core dumps. Looks good. In the coming days I'm going to patch the even more loaded servers.

gthess commented 3 years ago

Thanks for letting us know, very good news! Would you normally have expected a core dump by now?

iruzanov commented 3 years ago

Hi! The server has worked fine for 4 days without core dumps. Super! ;) I'm going to apply your patch to a list of other servers. Next week I will let you know about the success! (I hope so ;)

iruzanov commented 3 years ago

I have patched three more Unbound resolvers that are running under heavy load. None of the processes have crashed with a core dump. I would like a bit more time to observe the patched software, and then I will give you my final conclusion.

iruzanov commented 3 years ago

Hello colleagues!

Unfortunately, I have detected core dumps from two Unbound processes.

  1. The first core dump looks like:
    [New LWP 100571]
    [New LWP 148594]
    [New LWP 100646]
    [New LWP 101330]
    [New LWP 101361]
    [New LWP 101577]
    [New LWP 101603]
    [New LWP 101611]
    [New LWP 101613]
    [New LWP 101619]
    [New LWP 101620]
    [New LWP 101631]
    [New LWP 101649]
    [New LWP 101653]
    [New LWP 101672]
    [New LWP 101674]
    [New LWP 101689]
    [New LWP 101693]
    [New LWP 101702]
    [New LWP 101719]
    [New LWP 101721]
    Core was generated by `/usr/local/sbin/unbound -c /srvs/unbound/etc/unbound/unbound.conf'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x000000000031344a in reuse_id_cmp (key1=0x7fffdfffda68, key2=0x0) at ./services/outside_network.c:178
    178             if(w1->id < w2->id)
    [Current thread is 1 (LWP 100571)]
    (gdb) bt
    #0  0x000000000031344a in reuse_id_cmp (key1=0x7fffdfffda68, key2=0x0) at ./services/outside_network.c:178
    #1  0x00000000003156e9 in rbtree_find_less_equal (rbtree=0xdc7262a78, key=0x7fffdfffda68, result=<optimized out>) at ./util/rbtree.c:527
    #2  rbtree_search (rbtree=0xdc7262a78, key=0x7fffdfffda68) at ./util/rbtree.c:285
    #3  reuse_tcp_by_id_find (reuse=0xdc72629a0, id=6060) at ./services/outside_network.c:425
    #4  reuse_tcp_select_id (reuse=0xdc72629a0, outnet=<optimized out>) at ./services/outside_network.c:2116
    #5  0x0000000000314a38 in tcp_select_id (outnet=0xd40aa3000, reuse=0xdc72629a0) at ./services/outside_network.c:2092
    #6  use_free_buffer (outnet=0xd40aa3000) at ./services/outside_network.c:745
    #7  0x000000000031437f in outnet_tcp_cb (c=<optimized out>, arg=<optimized out>, error=<optimized out>, reply_info=<optimized out>)
    at ./services/outside_network.c:1150
    #8  0x000000080045e2b1 in ?? () from /usr/local/lib/libevent-2.1.so.7
    #9  0x000000080045a09c in event_base_loop () from /usr/local/lib/libevent-2.1.so.7
    #10 0x000000000024dc84 in thread_start (arg=0x8014ba800) at ./util/ub_event.c:280
    #11 0x0000000800780fac in ?? () from /lib/libthr.so.3
    #12 0x0000000000000000 in ?? ()
    Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
    (gdb) list
    173
    174     int reuse_id_cmp(const void* key1, const void* key2)
    175     {
    176             struct waiting_tcp* w1 = (struct waiting_tcp*)key1;
    177             struct waiting_tcp* w2 = (struct waiting_tcp*)key2;
    178             if(w1->id < w2->id)
    179                     return -1;
    180             if(w1->id > w2->id)
    181                     return 1;
    182             return 0;

I.e. we are dealing with the same issue related to key2=NULL:

(gdb) print *w2
Cannot access memory at address 0x0
  2. The core dump of the second Unbound (on another server):
    [New LWP 101107]
    [New LWP 104932]
    [New LWP 100498]
    [New LWP 100601]
    [New LWP 100606]
    [New LWP 100622]
    [New LWP 100719]
    [New LWP 100915]
    [New LWP 100936]
    [New LWP 100997]
    [New LWP 101099]
    [New LWP 101113]
    [New LWP 101115]
    [New LWP 101128]
    [New LWP 101132]
    [New LWP 101138]
    [New LWP 101146]
    [New LWP 101161]
    [New LWP 101171]
    [New LWP 101178]
    [New LWP 101196]
    Core was generated by `/usr/local/sbin/unbound -c /srvs/unbound/etc/unbound/unbound.conf'.
    Program terminated with signal SIGABRT, Aborted.
    #0  0x0000000800955c2a in thr_kill () from /lib/libc.so.7
    [Current thread is 1 (LWP 101107)]
    
    (gdb) bt
    #0  0x0000000800955c2a in thr_kill () from /lib/libc.so.7
    #1  0x0000000800954084 in raise () from /lib/libc.so.7
    #2  0x00000008008ca279 in abort () from /lib/libc.so.7
    #3  0x0000000800464641 in ?? () from /usr/local/lib/libevent-2.1.so.7
    #4  0x0000000800464939 in event_errx () from /usr/local/lib/libevent-2.1.so.7
    #5  0x000000080045ec54 in evmap_io_del_ () from /usr/local/lib/libevent-2.1.so.7
    #6  0x0000000800457e8f in event_del_nolock_ () from /usr/local/lib/libevent-2.1.so.7
    #7  0x000000080045ada8 in event_del () from /usr/local/lib/libevent-2.1.so.7
    #8  0x000000000030e2bb in ub_event_del (ev=<optimized out>) at ./util/ub_event.c:395
    #9  comm_point_close (c=0xdd6d3cc00) at ./util/netevent.c:3860
    #10 0x000000000031601b in decommission_pending_tcp (outnet=<optimized out>, pend=0xdd6721d80) at ./services/outside_network.c:967
    #11 0x0000000000314836 in reuse_cb_and_decommission (outnet=0x18af3, pend=0x6, error=-2) at ./services/outside_network.c:1011
    #12 0x0000000000317901 in outnet_tcptimer (arg=0xe9a748500) at ./services/outside_network.c:2058
    #13 0x000000080045e0ed in ?? () from /usr/local/lib/libevent-2.1.so.7
    #14 0x000000080045a09c in event_base_loop () from /usr/local/lib/libevent-2.1.so.7
    #15 0x000000000024dc84 in thread_start (arg=0x801543000) at ./util/ub_event.c:280
    #16 0x0000000800780fac in ?? () from /lib/libthr.so.3
    #17 0x0000000000000000 in ?? ()
    Backtrace stopped: Cannot access memory at address 0x7fffdedf5000

12 0x0000000000317901 in outnet_tcptimer (arg=0xe9a748500) at ./services/outside_network.c:2058

2058                    reuse_cb_and_decommission(outnet, pend, NETEVENT_TIMEOUT);
(gdb) list
2053                    waiting_tcp_callback(w, NULL, NETEVENT_TIMEOUT, NULL);
2054                    waiting_tcp_delete(w);
2055            } else {
2056                    /* it was in use */
2057                    struct pending_tcp* pend = (struct pending_tcp*)w->next_waiting;
2058                    reuse_cb_and_decommission(outnet, pend, NETEVENT_TIMEOUT);
2059            }
2060            use_free_buffer(outnet);
2061    }
2062

(gdb) print *(struct pending_tcp*)pend
Cannot access memory at address 0x6

(gdb) print *(struct pending_tcp*)w->next_waiting
$11 = {next_free = 0xdd67214c0, pi = 0xd0184b0c0, c = 0xdd6d3cc00, query = 0x0, reuse = {node = {parent = 0xdd6721b20, left = 0x328890 &lt;rbtree_null_node&gt;, right = 0xdd67213a0, key = 0x0, color = 0 '\000'}, addr = {ss_len = 0 '\000', ss_family = 28 '\034', ss_pad1 = "\000\065\000\000\000", __ss_align = 79164837527840, ss_pad2 = "\000\000\000\000\000\000\000\001", '\000' &lt;repeats 103 times&gt;}, addrlen = 28, is_ssl = 0, lru_next = 0xdd6721ee0, lru_prev = 0x0, item_on_lru_list = 0, pending = 0xdd6721d80, cp_more_read_again = 0, cp_more_write_again = 0, tree_by_id = {root = 0x328890 &lt;rbtree_null_node&gt;, count = 0, cmp = 0x313440 &lt;reuse_id_cmp&gt;}, write_wait_first = 0x0, write_wait_last = 0x0, outnet = 0xd017f4000}}
(gdb)

iruzanov commented 3 years ago

I have applied the patch from https://github.com/NLnetLabs/unbound/commit/ff6b527184b33ffe1e2b643db8a32fae8061fc5a

gthess commented 3 years ago

Oh, bad news, but thanks for the information as always! I'll start looking into this again right away. Any chance you can make out any similarities, traffic-wise, between the two crashes?

iruzanov commented 3 years ago

I will dig through the Unbound logs; maybe I will find some common points that describe what traffic the program crashed under. I will also recompile libevent and Unbound itself with the -O0 flag to get the maximum debug info.
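A rebuild like the one described above might look as follows. This is an illustrative sketch only; the source directory, flags, and parallelism are assumptions, not commands taken from the thread:

```shell
# Rebuild Unbound with debug info and no optimization so gdb can show
# every variable (paths and options illustrative).
cd unbound-1.13.1
env CFLAGS="-g -O0" ./configure
make -j4
```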

gthess commented 3 years ago

Would you be able to share your config file? Also, do you happen to issue unbound-control commands to unbound?

iruzanov commented 3 years ago
  1. My config file:

    server:
    verbosity: 1
    extended-statistics: yes

    num-threads: 20
    
    interface: 127.0.0.1
    interface: ::1
    
     interface: 193.x.x.x
     interface: 2001:x:x:x:x:x:x:x
     interface: 62.x.x.x
     interface: 62.x.x.x
     interface: 195.x.x.x
     interface: 195.x.x.x
     interface: 2001:x:x::x
     interface: 2001:x:x::x
     interface: 2a0c:x:x::x
     interface: 2a0c:x:x::x

    outgoing-interface: 193.x.x.x
    outgoing-interface: 2001:x:x:x:x:x:x:x
    outgoing-range: 32768
    outgoing-num-tcp: 2048
    incoming-num-tcp: 1024

    so-reuseport: yes  # we are running under FreeBSD 12.2 with SO_REUSEPORT_LB
    
    msg-cache-size: 1024m
    msg-cache-slabs: 16
    
    num-queries-per-thread: 2048
    
    rrset-cache-size: 2048m
    rrset-cache-slabs: 16
    
    infra-cache-numhosts: 100000
    
    do-ip4: yes
    do-ip6: yes
    do-udp: yes
    do-tcp: yes
    do-daemonize: yes
    
    access-control: 0.0.0.0/0 allow
    access-control: 127.0.0.0/8 allow
    access-control: ::0/0 allow
    access-control: ::1 allow
    
    chroot: "/srvs/unbound"
    username: "bind"
    directory: "/srvs/unbound/etc/unbound"
    use-syslog: yes
    
    pidfile: "/srvs/unbound/var/run/unbound/unbound.pid"
    hide-identity: no
    hide-version: yes
    
    identity: "res1"
    
    aggressive-nsec: yes
    
    do-not-query-address: ::1
    do-not-query-localhost: no

    ratelimit: 1000
    minimal-responses: no
    module-config: "respip validator iterator"

    auto-trust-anchor-file: "/srvs/unbound/etc/unbound/autokey/auto.key"
    
    neg-cache-size: 100m
    deny-any: yes

    serve-expired: yes
    serve-expired-ttl: 14400
    prefetch: yes
    prefetch-key: yes

    rpz:
    name: "blocklist."   # name of the policy zone
    master: 194.x.x.x    # address of the name server to transfer from

    rpz:
    name: "operator."
    master: 194.x.x.x

python:

    dnstap:
    dnstap-enable: yes
    dnstap-socket-path: "/tmp/dnstap.sock"
    dnstap-send-identity: yes
    dnstap-send-version: yes
    dnstap-identity: "res1"
    dnstap-log-client-query-messages: yes
    dnstap-log-client-response-messages: yes
    dnstap-log-forwarder-query-messages: no
    dnstap-log-forwarder-response-messages: no
    dnstap-log-resolver-query-messages: no
    dnstap-log-resolver-response-messages: no

    remote-control:
    control-enable: yes
    control-interface: ::1
    control-use-cert: "yes"
    server-key-file: "/srvs/unbound/etc/unbound/unbound_server.key"
    server-cert-file: "/srvs/unbound/etc/unbound/unbound_server.pem"
    control-key-file: "/srvs/unbound/etc/unbound/unbound_control.key"
    control-cert-file: "/srvs/unbound/etc/unbound/unbound_control.pem"

    stub-zone:
    name: "."
    stub-prime: yes
    stub-addr: 127.12.12.12
    stub-addr: 193.x.x.x

  2. unbound-control: yes, we run a special program named unbound_exporter that, just like unbound-control, connects to Unbound via the control TCP socket and sends the stats_noreset command to get statistics. This command is issued every minute.

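For reference, the same periodic statistics poll can be issued by hand with unbound-control; the config path is the one from the file above, and the rest is a sketch of what unbound_exporter does rather than its actual code:

```shell
# Fetch statistics over the control channel without resetting the
# counters, as unbound_exporter does once a minute.
unbound-control -c /srvs/unbound/etc/unbound/unbound.conf stats_noreset
```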
iruzanov commented 3 years ago

Regarding traffic similarity: I have grepped the Unbound logs on the two servers. The log records (tons of them) right before Unbound crashed with a core dump look like this:

  1. first server:

    [...]
    Jun 18 18:50:07 res1 unbound[28935]: [28935:11] error: read (in tcp s): Connection refused for 2a01:5b40:ac3::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:1] error: read (in tcp s): Connection refused for 2a01:5b40:ac2::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:1] error: read (in tcp s): Connection refused for 151.249.125.2 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:11] error: read (in tcp s): Connection refused for 2a01:5b40:ac1::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:10] error: read (in tcp s): Connection refused for 151.249.125.2 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:c] error: read (in tcp s): Connection refused for 151.249.124.1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:9] error: read (in tcp s): Connection refused for 2a01:5b40:ac3::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:1] error: read (in tcp s): Connection refused for 2a01:5b40:ac3::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:a] error: read (in tcp s): Connection refused for 151.249.125.2 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:5] error: read (in tcp s): Connection refused for 2a01:5b40:ac2::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:1] error: read (in tcp s): Connection refused for 151.249.126.3 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:11] error: read (in tcp s): Connection refused for 2a01:5b40:ac2::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:1] error: read (in tcp s): Connection refused for 151.249.125.2 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:10] error: read (in tcp s): Connection refused for 2a01:5b40:ac3::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:11] error: read (in tcp s): Connection refused for 2a01:5b40:ac1::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:6] error: read (in tcp s): Connection refused for 2a01:5b40:ac3::1 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:5] error: read (in tcp s): Connection refused for 151.249.126.3 port 53
    Jun 18 18:50:07 res1 unbound[28935]: [28935:a] error: read (in tcp s): Connection refused for 151.249.125.2 port 53
    [...]

  2. second server:

    [...]
    Jun 20 15:52:42 res2 syslogd: last message repeated 1 times
    Jun 20 15:52:44 res2 unbound[14692]: [14692:11] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:52:44 res2 syslogd: last message repeated 1 times
    Jun 20 15:52:44 res2 unbound[14692]: [14692:6] error: read (in tcp s): Connection refused for 199.249.127.1 port 53
    Jun 20 15:52:44 res2 syslogd: last message repeated 1 times
    Jun 20 15:52:46 res2 unbound[14692]: [14692:c] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:52:46 res2 syslogd: last message repeated 2 times
    Jun 20 15:52:47 res2 unbound[14692]: [14692:a] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:52:47 res2 syslogd: last message repeated 1 times
    Jun 20 15:52:48 res2 unbound[14692]: [14692:10] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:52:48 res2 syslogd: last message repeated 1 times
    Jun 20 15:52:49 res2 unbound[14692]: [14692:0] error: read (in tcp s): Connection refused for 199.249.120.1 port 53
    Jun 20 15:52:49 res2 syslogd: last message repeated 1 times
    Jun 20 15:52:50 res2 unbound[14692]: [14692:e] error: read (in tcp s): Connection refused for 199.249.120.1 port 53
    Jun 20 15:52:51 res2 syslogd: last message repeated 2 times
    Jun 20 15:52:53 res2 unbound[14692]: [14692:a] error: read (in tcp r): Connection reset by peer for 199.81.212.13 port 53
    Jun 20 15:52:53 res2 unbound[14692]: [14692:10] notice: ratelimit exceeded corp.ds.fedex.com. 1000 query pwn90150.corp.ds.fedex.com. IN A
    Jun 20 15:52:55 res2 unbound[14692]: [14692:1] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:52:56 res2 syslogd: last message repeated 1 times
    Jun 20 15:52:56 res2 unbound[14692]: [14692:5] error: read (in tcp r): Connection reset by peer for 199.81.212.13 port 53
    Jun 20 15:52:56 res2 unbound[14692]: [14692:11] error: read (in tcp r): Connection reset by peer for 199.81.212.13 port 53
    Jun 20 15:52:57 res2 unbound[14692]: [14692:2] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:52:57 res2 unbound[14692]: [14692:13] error: read (in tcp r): Connection reset by peer for 204.135.12.100 port 53
    Jun 20 15:52:59 res2 unbound[14692]: [14692:5] error: read (in tcp r): Connection reset by peer for 199.81.212.13 port 53
    Jun 20 15:53:03 res2 unbound[14692]: [14692:9] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:53:03 res2 syslogd: last message repeated 3 times
    Jun 20 15:53:05 res2 unbound[14692]: [14692:a] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:53:06 res2 syslogd: last message repeated 1 times
    Jun 20 15:53:09 res2 unbound[14692]: [14692:7] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
    Jun 20 15:53:10 res2 unbound[14692]: [14692:7] error: read (in tcp s): Connection refused for 199.249.120.1 port 53
    [...]

I.e., these are tons of outbound TCP requests to upstreams (probably fake upstreams taken from crafted DNS packets sent by spoofed clients or intruders). The code responsible for this functionality is in the outside_network.c source file.

iruzanov commented 3 years ago

Hello, Colleagues!

Last weekend I detected that Unbound had crashed, and there is an interesting core dump:

Core was generated by `/usr/local/sbin/unbound -c /srvs/unbound/etc/unbound/unbound.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.

0 0x00000000004d116c in reuse_id_cmp (key1=0xec64aab00, key2=0x0) at ./services/outside_network.c:179

179 if(w1->id < w2->id)
[Current thread is 1 (LWP 101705)]
(gdb) bt

0 0x00000000004d116c in reuse_id_cmp (key1=0xec64aab00, key2=0x0) at ./services/outside_network.c:179

1 0x0000000000479045 in rbtree_find_less_equal (rbtree=0xd8b32bc38, key=0xec64aab00, result=0x7fffdeff5b08) at ./util/rbtree.c:530

2 0x00000000004787ed in rbtree_search (rbtree=0xd8b32bc38, key=0xec64aab00) at ./util/rbtree.c:285

3 0x0000000000478930 in rbtree_delete (rbtree=0xd8b32bc38, key=0xec64aab00) at ./util/rbtree.c:333

4 0x00000000004d2958 in reuse_tree_by_id_delete (reuse=0xd8b32bb60, w=0xec64aab00) at ./services/outside_network.c:836

5 0x00000000004d2c9a in reuse_move_writewait_away (outnet=0xcc6ec0000, pend=0xd8b32bb40) at ./services/outside_network.c:890

6 0x00000000004d33f0 in outnet_tcp_cb (c=0xd8b665200, arg=0xd8b32bb40, error=-1, reply_info=0x0) at ./services/outside_network.c:1096

7 0x00000000004c8220 in comm_point_tcp_handle_callback (fd=7258, event=6, arg=0xd8b665200) at ./util/netevent.c:2092

8 0x000000080086091a in event_persist_closure (base=0x801dfa000, ev=0xd8b7a6100) at event.c:1623

9 0x0000000800860b22 in event_process_active_single_queue (base=0x801dfa000, activeq=0x802c91000, max_to_process=2147483647, endtime=0x0) at event.c:1682

10 0x00000008008610b1 in event_process_active (base=0x801dfa000) at event.c:1783

11 0x00000008008617bf in event_base_loop (base=0x801dfa000, flags=0) at event.c:2006

12 0x0000000800861139 in event_base_dispatch (event_base=0x801dfa000) at event.c:1817

13 0x00000000004da603 in ub_event_base_dispatch (base=0x801dfa000) at ./util/ub_event.c:280

14 0x00000000004c4643 in comm_base_dispatch (b=0x801dc6000) at ./util/netevent.c:256

15 0x000000000041f070 in worker_work (worker=0x801c1f800) at ./daemon/worker.c:1951

16 0x000000000040b0c9 in thread_start (arg=0x801c1f800) at ./daemon/daemon.c:540

17 0x0000000800d9dfac in ?? () from /lib/libthr.so.3

18 0x0000000000000000 in ?? ()

Backtrace stopped: Cannot access memory at address 0x7fffdeff6000

(gdb) fr 1

1 0x0000000000479045 in rbtree_find_less_equal (rbtree=0xd8b32bc38, key=0xec64aab00, result=0x7fffdeff5b08) at ./util/rbtree.c:530

530 r = rbtree->cmp(key, node->key);
(gdb) print *node
$1 = {parent = 0xe8626cea8, left = 0xec64aab28, right = 0xee7b31428, key = 0x0, color = 1 '\001'}
(gdb) print *node->parent
$2 = {parent = 0xeca76ada8, left = 0xf27fdcd28, right = 0xede50e5a8, key = 0xe8626ce80, color = 0 '\000'}
(gdb) print *node->left
$6 = {parent = 0xede50e5a8, left = 0x73abc0 &lt;rbtree_null_node&gt;, right = 0x73abc0 &lt;rbtree_null_node&gt;, key = 0xec64aab00, color = 0 '\000'}
(gdb) print *node->right
$7 = {parent = 0xede50e5a8, left = 0x73abc0 &lt;rbtree_null_node&gt;, right = 0x73abc0 &lt;rbtree_null_node&gt;, key = 0xee7b31400, color = 0 '\000'}

As we can see, there are nodes colored BLACK with non-NULL keys, and one node colored RED with a NULL key.

iruzanov commented 3 years ago

One more thing: our traffic includes tons of queries to which Unbound replies with SERVFAIL status.

gthess commented 3 years ago

Do you mean that you see an increase of SERVFAIL responses with the affected versions or that your usual traffic includes a lot of SERVFAIL responses?

iruzanov commented 3 years ago

The second: our usual traffic includes a lot of SERVFAIL responses.

iruzanov commented 3 years ago

Hello, colleagues! I've done some more research on the nodes with NULL-pointer keys. For this purpose I wrote a small function, rbtree_traverse(), to trap the code points where such nodes are already present:

    void rbtree_traverse(rbnode_type* node)
    {
        if(!node || node == RBTREE_NULL)
            return;
        if(node->key == NULL)
            abort();
        rbtree_traverse(node->left);
        rbtree_traverse(node->right);
    }

I call this function from reuse_tcp_find(), lines 524 and 525 (the while() loop) of services/outside_network.c:

    rbtree_traverse(result);
    rbtree_traverse(((struct reuse_tcp*)result)->tree_by_id.root);

It might be interesting that the program crashes on the second call, i.e. when we traverse the TCP reuse tree after casting to (struct reuse_tcp*):

(gdb) print ((struct reuse_tcp*)result)->tree_by_id.root
$5 = {parent = 0x73afe0 &lt;rbtree_null_node&gt;, left = 0xef9a10d68, right = 0xf23595f28, key = 0x0, color = 0 '\000'}

But for (rbnode_type*)result everything is OK:

(gdb) print *result
$2 = {parent = 0xddb2f1d20, left = 0x73afe0 &lt;rbtree_null_node&gt;, right = 0x73afe0 &lt;rbtree_null_node&gt;, key = 0xddb2f0ce0, color = 0 '\000'}

I'm not sure that I dug into the code properly, but I hope this might be helpful for you.

gthess commented 3 years ago

Hi Igor, right on time!

The end result of the issue is that there seem to be nodes in the tree with NULL keys, but the issue manifests way before this part of the code is reached.

I have merged some recent fixes about this issue (PR #513) and it would be great if you could test and provide feedback!

iruzanov commented 3 years ago

Excellent news! :) I will gladly test your fixes! I hope this will solve the problem. I will start applying your 9 fixes right now, and next week I will report the first results!

iruzanov commented 3 years ago

Hello Colleagues!

Last week I patched 6 servers. They have been working FINE so far! ;) My plan is to upgrade Unbound on all the other servers and monitor my service during this week. Next Monday I will report the results.

gthess commented 3 years ago

Hi Igor, this is good news for now! We are waiting for your results and then plan to move on with a release. In the meantime, can you verify that the patched servers have considerable TCP traffic to upstreams, so that we know the different stream-reuse codepaths are exercised? Thanks!

iruzanov commented 3 years ago

Yes, I will! But I would still like to ask you about the following logs on all patched servers:

    Aug 2 17:07:06 unbound[3701]: [3701:6] error: internal error: reuse_tcp_insert: in lru list without key
    Aug 2 17:07:06 unbound[3701]: [3701:0] error: internal error: reuse_tcp_insert: in lru list without key
    Aug 2 17:07:06 unbound[3701]: [3701:d] error: internal error: reuse_tcp_insert: in lru list without key
    Aug 2 17:07:06 unbound[3701]: [3701:c] error: internal error: reuse_tcp_insert: in lru list without key
    [...]

Is this a normal situation?

iruzanov commented 3 years ago

Hello, colleagues! Last Monday and Tuesday I patched all of my Unbound servers, and they work without any crash dumps! Great patches! ;) As for considerable TCP traffic to upstreams: I can see that traffic on all of my patched servers too.

gthess commented 3 years ago

Hi Igor, great news! These patches will be part of the upcoming 1.13.2 version coming later this week.

The error message is a bit worrying, since it reveals a situation that shouldn't normally happen. But the non-crashing part seems to indicate that this may happen during a callback, with no extra harm at this point (like https://github.com/NLnetLabs/unbound/blob/8e538dcaa8df2d0fab8ff3dcf94ac1f972450b66/services/outside_network.c#L1264, or https://github.com/NLnetLabs/unbound/blob/8e538dcaa8df2d0fab8ff3dcf94ac1f972450b66/services/outside_network.c#L2590).

Unfortunately I haven't come across this in any of my tests to provide more useful information.

Before debugging further, I would wait for you to test the full 1.13.2 release because, as I understand it, you currently test 1.13.1 with patches from the related commits. If these error messages keep appearing, you can then continue to debug further.

If you want to debug further, you can configure unbound with --enable-debug. This enables the various log_assert() checks we have in place during development, and they may shed more light on your case. It is usually not recommended to run like that in production, since log_assert will just terminate unbound with a log message when a condition is not met.

This WILL NOT produce a core dump. If you prefer one, you can change https://github.com/NLnetLabs/unbound/blob/8e538dcaa8df2d0fab8ff3dcf94ac1f972450b66/util/log.h#L222-L226 to just a normal assert(), like the one a few lines above.

iruzanov commented 3 years ago

Thank you very much! I will update Unbound to the 1.13.2 release first, and then I will debug the program with your proposed asserts. I think this approach is more accurate. And I might even use abort() to produce core dumps at the program points we are asserting ;)

Well, tomorrow I am going on vacation until August 29. On Monday, August 30, I will return and start with step 1: upgrading Unbound to 1.13.2. I hope this release will be available by then ;)

gthess commented 3 years ago

Actually, it was released less than 30 minutes ago :)

Thank you for your help thus far and enjoy your vacation!

iruzanov commented 2 years ago

Hello colleagues!

It has been a pleasure to use Unbound 1.13.2, and the program is working fine, but I detected one more bug a month ago: Unbound is crashing with SIGSEGV. So I recompiled the program with the following flags: "-g -O0 -fsanitize=address -fno-omit-frame-pointer". Below I put the full stack trace of the problem thread, caught by libasan:

(gdb) thr 3
[Switching to thread 3 (Thread 0x7f1ae971e700 (LWP 3271756))]

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50

50          return ret;
(gdb) bt full

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50

    set = {__val = {18446744067266838239, 139753671451101, 106790115319360, 72197347709311613, 139753671629156, 3271753, 206158430216, 139753562427008,
        139753562426816, 7012368136350825216, 139753676056768, 139753390149632, 139753562430192, 139753562427046, 139753678510688, 106858908091888}}
    pid = <optimized out>
    tid = <optimized out>
    ret = <optimized out>

#1 0x00007f1aeec6bdb5 in __GI_abort () at abort.c:79

    save_stage = 1
    act = {__sigaction_handler = {sa_handler = 0x7f1aefef47e1 <__interceptor_free(void*)+209>,
        sa_sigaction = 0x7f1aefef47e1 <__interceptor_free(void*)+209>}, sa_mask = {__val = {4838403, 5068870, 5070683, 5101591, 5103337, 5063904,
          4431218, 6578138, 139753562428096, 106858908091888, 139753562426760, 8, 139753562428096, 139753562428096, 139753670522756, 139753562427504}},
      sa_flags = -1183483136, sa_restorer = 0x6130074205b0}
    sigs = {__val = {32, 0 <repeats 15 times>}}

#2 0x00007f1aeff14c02 in __sanitizer::Abort () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc:145

No locals.

#3 0x00007f1aeff1d52c in __sanitizer::Die () at ../../../../libsanitizer/sanitizer_common/sanitizer_termination.cc:57

No locals.

#4 0x00007f1aefeffc55 in __asan::ScopedInErrorReport::~ScopedInErrorReport (this=<optimized out>, __in_chrg=<optimized out>)

at ../../../../libsanitizer/asan/asan_report.cc:181
    buffer_copy = <optimized out>

#5 __asan::ReportGenericError (pc=<optimized out>, bp=bp@entry=139753562430240, sp=sp@entry=139753562430224, addr=106858908091888,

is_write=is_write@entry=false, access_size=access_size@entry=8, exp=0, fatal=true) at ../../../../libsanitizer/asan/asan_report.cc:397
    in_report = {error_report_lock_ = {<No data fields>}, static current_error_ = {kind = __asan::kErrorKindGeneric, {Base = {scariness = {score = 51,
              descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, DeadlySignal = {<__asan::ErrorBase> = {scariness = {score = 51,
                descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, signal = {siginfo = 0x2, context = 0x6130074205f0, addr = 3, pc = 3,
              sp = 5440349067550916617, bp = 106858908091888, is_memory_access = 48, write_flag = __sanitizer::SignalContext::UNKNOWN}},
          DoubleFree = {<__asan::ErrorBase> = {scariness = {score = 51,
                descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, second_free_stack = 0x2, addr_description = {addr = 106858908091888,
              alloc_tid = 3, free_tid = 3, alloc_stack_id = 427819017, free_stack_id = 1266679975, chunk_access = {bad_addr = 106858908091888,
                offset = 48, chunk_begin = 106858908091840, chunk_size = 355, access_type = 2, alloc_type = 1}}},
          NewDeleteSizeMismatch = {<__asan::ErrorBase> = {scariness = {score = 51,
                descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, free_stack = 0x2, addr_description = {addr = 106858908091888, alloc_tid = 3,
              free_tid = 3, alloc_stack_id = 427819017, free_stack_id = 1266679975, chunk_access = {bad_addr = 106858908091888, offset = 48,
                chunk_begin = 106858908091840, chunk_size = 355, access_type = 2, alloc_type = 1}}, delete_size = 7012368136350825216},
          FreeNotMalloced = {<__asan::ErrorBase> = {scariness = {score = 51,
                descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, free_stack = 0x2, addr_description = {data = {kind = 121767408, {shadow = {
                    addr = 3, kind = (__asan::kShadowKindGap | __asan::kShadowKindHigh), shadow_byte = 0 '\000'}, heap = {addr = 3, alloc_tid = 3,
                    free_tid = 5440349067550916617, alloc_stack_id = 121767408, free_stack_id = 24880, chunk_access = {bad_addr = 48,
                      offset = 106858908091840, chunk_begin = 355, chunk_size = 5432374, access_type = 0, alloc_type = 0}}, stack = {addr = 3, tid = 3,

--Type for more, q to quit, c to continue without paging-- offset = 5440349067550916617, frame_pc = 106858908091888, access_size = 48, frame_descr = 0x6130074205c0 "\247"}, global = { addr = 3, static kMaxGlobals = 4, globals = {{beg = 3, size = 5440349067550916617, size_with_redzone = 106858908091888, name = 0x30 <error: Cannot access memory at address 0x30>, module_name = 0x6130074205c0 "\247", has_dynamic_init = 355, location = 0x52e436 <fatal_exit+276>, odr_indicator = 7012368136350825216}, {beg = 206158430224, size = 139753562428464, size_with_redzone = 17469195303558, name = 0x7f1ae971d490 "h9\306\360\032\177", module_name = 0x7f1ae971d430 "te-read-heap-use-after-free", has_dynamic_init = 106378361083840, location = 0x2, odr_indicator = 9223372036854776046}, {beg = 0, size = 0, size_with_redzone = 0, name = 0x0, module_name = 0x0, has_dynamic_init = 0, location = 0x41b58ab3, odr_indicator = 139753562428400}, {beg = 106858908091840, size = 106790081373240, size_with_redzone = 206158430224, name = 0x942920 " )\224", module_name = 0x7f1ae971d450 "\210\317q\351\032\177", has_dynamic_init = 5471136, location = 0x6130074205c0, odr_indicator = 106790081373240}}, reg_sites = {1102416563, 0, 7101984, 0}, access_size = 5470962, size = 26 '\032'}, addr = 3}}}}, AllocTypeMismatch = {<__asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 
\320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, dealloc_stack = 0x2, addr_description = {addr = 106858908091888, alloc_tid = 3, free_tid = 3, alloc_stack_id = 427819017, free_stack_id = 1266679975, chunk_access = {bad_addr = 106858908091888, offset = 48, chunk_begin = 106858908091840, chunk_size = 355, access_type = 2, alloc_type = 1}}, alloc_type = 3111484160, dealloc_type = 1632694186}, MallocUsableSizeNotOwned = {<__asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, stack = 0x2, addr_description = {data = {kind = 121767408, {shadow = { addr = 3, kind = (asan::kShadowKindGap | asan::kShadowKindHigh), shadow_byte = 0 '\000'}, heap = {addr = 3, alloc_tid = 3, free_tid = 5440349067550916617, alloc_stack_id = 121767408, free_stack_id = 24880, chunk_access = {bad_addr = 48, offset = 106858908091840, chunk_begin = 355, chunk_size = 5432374, access_type = 0, alloc_type = 0}}, stack = {addr = 3, tid = 3, offset = 5440349067550916617, frame_pc = 106858908091888, access_size = 48, frame_descr = 
0x6130074205c0 "\247"}, global = { addr = 3, static kMaxGlobals = 4, globals = {{beg = 3, size = 5440349067550916617, size_with_redzone = 106858908091888, name = 0x30 <error: Cannot access memory at address 0x30>, module_name = 0x6130074205c0 "\247", has_dynamic_init = 355, location = 0x52e436 <fatal_exit+276>, odr_indicator = 7012368136350825216}, {beg = 206158430224, size = 139753562428464, size_with_redzone = 17469195303558, name = 0x7f1ae971d490 "h9\306\360\032\177", module_name = 0x7f1ae971d430 "te-read-heap-use-after-free", has_dynamic_init = 106378361083840, location = 0x2, odr_indicator = 9223372036854776046}, {beg = 0, size = 0, size_with_redzone = 0, name = 0x0, module_name = 0x0, has_dynamic_init = 0, location = 0x41b58ab3, odr_indicator = 139753562428400}, {beg = 106858908091840, size = 106790081373240, size_with_redzone = 206158430224, name = 0x942920 " )\224", module_name = 0x7f1ae971d450 "\210\317q\351\032\177", has_dynamic_init = 5471136, location = 0x6130074205c0, odr_indicator = 106790081373240}}, reg_sites = {1102416563, 0, 7101984, 0}, access_size = 5470962, size = 26 '\032'}, addr = 3}}}}, SanitizerGetAllocatedSizeNotOwned = {<__asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, stack = 0x2, addr_description = {data = {kind = 121767408, {shadow = { addr = 3, kind = 
(asan::kShadowKindGap | asan::kShadowKindHigh), shadow_byte = 0 '\000'}, heap = {addr = 3, alloc_tid = 3, free_tid = 5440349067550916617, alloc_stack_id = 121767408, free_stack_id = 24880, chunk_access = {bad_addr = 48, offset = 106858908091840, chunk_begin = 355, chunk_size = 5432374, access_type = 0, alloc_type = 0}}, stack = {addr = 3, tid = 3, offset = 5440349067550916617, frame_pc = 106858908091888, access_size = 48, frame_descr = 0x6130074205c0 "\247"}, global = { addr = 3, static kMaxGlobals = 4, globals = {{beg = 3, size = 5440349067550916617, size_with_redzone = 106858908091888, name = 0x30 <error: Cannot access memory at address 0x30>, module_name = 0x6130074205c0 "\247", has_dynamic_init = 355, location = 0x52e436 <fatal_exit+276>, odr_indicator = 7012368136350825216}, {beg = 206158430224, size = 139753562428464, size_with_redzone = 17469195303558, name = 0x7f1ae971d490 "h9\306\360\032\177", module_name = 0x7f1ae971d430 "te-read-heap-use-after-free", has_dynamic_init = 106378361083840, location = 0x2, odr_indicator = 9223372036854776046}, {beg = 0, size = 0, size_with_redzone = 0, name = 0x0, module_name = 0x0, has_dynamic_init = 0, location = 0x41b58ab3, odr_indicator = 139753562428400}, {beg = 106858908091840, size = 106790081373240, size_with_redzone = 206158430224, name = 0x942920 " )\224", module_name = 0x7f1ae971d450 "\210\317q\351\032\177", has_dynamic_init = 5471136, location = 0x6130074205c0, odr_indicator = 106790081373240}}, reg_sites = {1102416563, 0, 7101984, 0}, access_size = 5470962, size = 26 '\032'}, --Type for more, q to quit, c to continue without paging-- addr = 3}}}}, StringFunctionMemoryRangesOverlap = {<__asan::ErrorBase> = {scariness = {score = 51, descr = 
"8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, stack = 0x2, length1 = 106858908091888, length2 = 3, addr1_description = { data = {kind = asan::kAddressKindStack, {shadow = {addr = 5440349067550916617, kind = (unknown: -16), shadow_byte = 5 '\005'}, heap = { addr = 5440349067550916617, alloc_tid = 106858908091888, free_tid = 48, alloc_stack_id = 121767360, free_stack_id = 24880, chunk_access = {bad_addr = 355, offset = 5432374, chunk_begin = 7012368136350825216, chunk_size = 206158430224, access_type = 0, alloc_type = 0}}, stack = {addr = 5440349067550916617, tid = 106858908091888, offset = 48, frame_pc = 106858908091840, access_size = 355, frame_descr = 0x52e436 <fatal_exit+276> ""}, global = {addr = 5440349067550916617, static kMaxGlobals = 4, globals = {{beg = 106858908091888, size = 48, size_with_redzone = 106858908091840, name = 0x163 <error: Cannot access memory at address 0x163>, module_name = 0x52e436 <fatal_exit+276> "", has_dynamic_init = 7012368136350825216, location = 0x3000000010, odr_indicator = 139753562428464}, {beg = 17469195303558, size = 139753562428560, size_with_redzone = 139753562428464, name = 0x60c0246c97c0 "", module_name = 0x2 <error: Cannot access memory at address 0x2>, has_dynamic_init = 9223372036854776046, location = 0x0, odr_indicator = 0}, {beg = 0, size = 0, size_with_redzone = 0, name = 0x0, module_name = 
0x41b58ab3 <error: Cannot access memory at address 0x41b58ab3>, has_dynamic_init = 139753562428400, location = 0x6130074205c0, odr_indicator = 106790081373240}, {beg = 206158430224, size = 9709856, size_with_redzone = 139753562428496, name = 0x537ba0 <rbtree_search+174> "\205\300t'I\215E\300H\211\302H\301\352\003H\201", <incomplete sequence \302>, module_name = 0x6130074205c0 "\247", has_dynamic_init = 106790081373240, location = 0x41b58ab3, odr_indicator = 7101984}}, reg_sites = {5470962, 0, 5477914, 0}, access_size = 139753562430240, size = 16 '\020'}, addr = 5440349067550916617}}}, addr2_description = {data = {kind = 8, {shadow = {addr = 139753671585219, kind = asan::kShadowKindLow, shadow_byte = 253 '\375'}, heap = { addr = 139753671585219, alloc_tid = 139753562438912, free_tid = 106583908513840, alloc_stack_id = 279502912, free_stack_id = 24864, chunk_access = {bad_addr = 106858968657856, offset = 106790081372992, chunk_begin = 1102416563, chunk_size = 7264512, access_type = 2, alloc_type = 2}}, stack = {addr = 139753671585219, tid = 139753562438912, offset = 106583908513840, frame_pc = 106790346350656, access_size = 106858968657856, frame_descr = 0x612000dda340 "\300\236", <incomplete sequence \335>}, global = {addr = 139753671585219, static kMaxGlobals = 4, globals = {{beg = 139753562438912, size = 106583908513840, size_with_redzone = 106790346350656, name = 0x61300ade2fc0 "@\243", <incomplete sequence \335>, module_name = 0x612000dda340 "\300\236", <incomplete sequence \335>, has_dynamic_init = 1102416563, location = 0x6ed900, odr_indicator = 6578362}, {beg = 139753562430272, size = 139753562429952, size_with_redzone = 106378361083840, name = 0x7f1ae971db60 "\200\333q\351\376\377\377\377\320\333q\351\032\177", module_name = 0x63872f <reuse_tcp_remove_tree_list+690> "H\213\205\220\376\377\377H\005\300", has_dynamic_init = 106790081373024, location = 0x613000030080, odr_indicator = 7242016}, {beg = 17187126944, size = 0, size_with_redzone = 
18033364716583124994, name = 0x0, module_name = 0x5e971da60 <error: Cannot access memory at address 0x5e971da60>, has_dynamic_init = 206158430224, location = 0x7f1ae971db50, odr_indicator = 139753562430064}, {beg = 139753663233543, size = 21474836496, size_with_redzone = 8770, name = 0x7f1ae971da90 "\300", module_name = 0x60c003526480 "", has_dynamic_init = 105690610497296, location = 0x0, odr_indicator = 8770}}, reg_sites = {55277312, 24608, 192, 0}, access_size = 16, size = 64 '@'}, addr = 139753671585219}}}, function = 0x7f1aef739640 <evmap_iodel+224> "\272\001"}, StringFunctionSizeOverflow = {<__asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, stack = 0x2, addr_description = {data = {kind = 121767408, {shadow = { addr = 3, kind = (asan::kShadowKindGap | asan::kShadowKindHigh), shadow_byte = 0 '\000'}, heap = {addr = 3, alloc_tid = 3, free_tid = 5440349067550916617, alloc_stack_id = 121767408, free_stack_id = 24880, chunk_access = {bad_addr = 48, offset = 106858908091840, chunk_begin = 355, chunk_size = 5432374, access_type = 0, alloc_type = 0}}, stack = {addr = 3, tid = 3, offset = 5440349067550916617, frame_pc = 106858908091888, access_size = 48, frame_descr = 0x6130074205c0 "\247"}, global = { addr = 3, static kMaxGlobals = 4, globals = {{beg = 3, size = 5440349067550916617, 
size_with_redzone = 106858908091888, name = 0x30 <error: Cannot access memory at address 0x30>, module_name = 0x6130074205c0 "\247", has_dynamic_init = 355, location = 0x52e436 <fatal_exit+276>, odr_indicator = 7012368136350825216}, {beg = 206158430224, size = 139753562428464, size_with_redzone = 17469195303558, name = 0x7f1ae971d490 "h9\306\360\032\177", module_name = 0x7f1ae971d430 "te-read-heap-use-after-free", has_dynamic_init = 106378361083840, location = 0x2, odr_indicator = 9223372036854776046}, {beg = 0, size = 0, size_with_redzone = 0, name = 0x0, module_name = 0x0, has_dynamic_init = 0, location = 0x41b58ab3, odr_indicator = 139753562428400}, {beg = 106858908091840, size = 106790081373240, size_with_redzone = 206158430224, name = 0x942920 " )\224", --Type for more, q to quit, c to continue without paging-- module_name = 0x7f1ae971d450 "\210\317q\351\032\177", has_dynamic_init = 5471136, location = 0x6130074205c0, odr_indicator = 106790081373240}}, reg_sites = {1102416563, 0, 7101984, 0}, access_size = 5470962, size = 26 '\032'}, addr = 3}}}, size = 139753562430240}, BadParamsToAnnotateContiguousContainer = {<__asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, stack = 0x2, beg = 106858908091888, end = 3, old_mid = 3, new_mid = 5440349067550916617}, ODRViolation = {<__asan::ErrorBase> = 
{scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, global1 = {beg = 2, size = 106858908091888, size_with_redzone = 3, name = 0x3 <error: Cannot access memory at address 0x3>, module_name = 0x4b8000a719800009 <error: Cannot access memory at address 0x4b8000a719800009>, has_dynamic_init = 106858908091888, location = 0x30, odr_indicator = 106858908091840}, global2 = {beg = 355, size = 5432374, size_with_redzone = 7012368136350825216, name = 0x3000000010 <error: Cannot access memory at address 0x3000000010>, module_name = 0x7f1ae971d430 "te-read-heap-use-after-free", has_dynamic_init = 17469195303558, location = 0x7f1ae971d490, odr_indicator = 139753562428464}, stack_id1 = 611096512, stack_id2 = 24768}, InvalidPointerPair = {<__asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 
\320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, pc = 2, bp = 106858908091888, sp = 3, addr1_description = {data = { kind = asan::kAddressKindStack, {shadow = {addr = 5440349067550916617, kind = (unknown: -16), shadow_byte = 5 '\005'}, heap = { addr = 5440349067550916617, alloc_tid = 106858908091888, free_tid = 48, alloc_stack_id = 121767360, free_stack_id = 24880, chunk_access = {bad_addr = 355, offset = 5432374, chunk_begin = 7012368136350825216, chunk_size = 206158430224, access_type = 0, alloc_type = 0}}, stack = {addr = 5440349067550916617, tid = 106858908091888, offset = 48, frame_pc = 106858908091840, access_size = 355, frame_descr = 0x52e436 <fatal_exit+276> ""}, global = {addr = 5440349067550916617, static kMaxGlobals = 4, globals = {{beg = 106858908091888, size = 48, size_with_redzone = 106858908091840, name = 0x163 <error: Cannot access memory at address 0x163>, module_name = 0x52e436 <fatal_exit+276> "", has_dynamic_init = 7012368136350825216, location = 0x3000000010, odr_indicator = 139753562428464}, {beg = 17469195303558, size = 139753562428560, size_with_redzone = 139753562428464, name = 0x60c0246c97c0 "", module_name = 0x2 <error: Cannot access memory at address 0x2>, has_dynamic_init = 9223372036854776046, location = 0x0, odr_indicator = 0}, {beg = 0, size = 0, size_with_redzone = 0, name = 0x0, module_name = 0x41b58ab3 <error: Cannot access memory at address 0x41b58ab3>, has_dynamic_init = 139753562428400, location = 0x6130074205c0, odr_indicator = 106790081373240}, {beg = 206158430224, size = 9709856, size_with_redzone = 139753562428496, name = 0x537ba0 <rbtree_search+174> "\205\300t'I\215E\300H\211\302H\301\352\003H\201", <incomplete sequence 
\302>, module_name = 0x6130074205c0 "\247", has_dynamic_init = 106790081373240, location = 0x41b58ab3, odr_indicator = 7101984}}, reg_sites = {5470962, 0, 5477914, 0}, access_size = 139753562430240, size = 16 '\020'}, addr = 5440349067550916617}}}, addr2_description = {data = {kind = 8, {shadow = {addr = 139753671585219, kind = asan::kShadowKindLow, shadow_byte = 253 '\375'}, heap = { addr = 139753671585219, alloc_tid = 139753562438912, free_tid = 106583908513840, alloc_stack_id = 279502912, free_stack_id = 24864, chunk_access = {bad_addr = 106858968657856, offset = 106790081372992, chunk_begin = 1102416563, chunk_size = 7264512, access_type = 2, alloc_type = 2}}, stack = {addr = 139753671585219, tid = 139753562438912, offset = 106583908513840, frame_pc = 106790346350656, access_size = 106858968657856, frame_descr = 0x612000dda340 "\300\236", <incomplete sequence \335>}, global = {addr = 139753671585219, static kMaxGlobals = 4, globals = {{beg = 139753562438912, size = 106583908513840, size_with_redzone = 106790346350656, name = 0x61300ade2fc0 "@\243", <incomplete sequence \335>, module_name = 0x612000dda340 "\300\236", <incomplete sequence \335>, has_dynamic_init = 1102416563, location = 0x6ed900, odr_indicator = 6578362}, {beg = 139753562430272, size = 139753562429952, size_with_redzone = 106378361083840, name = 0x7f1ae971db60 "\200\333q\351\376\377\377\377\320\333q\351\032\177", module_name = 0x63872f <reuse_tcp_remove_tree_list+690> "H\213\205\220\376\377\377H\005\300", has_dynamic_init = 106790081373024, location = 0x613000030080, odr_indicator = 7242016}, {beg = 17187126944, size = 0, size_with_redzone = 18033364716583124994, name = 0x0, module_name = 0x5e971da60 <error: Cannot access memory at address 0x5e971da60>, has_dynamic_init = 206158430224, location = 0x7f1ae971db50, odr_indicator = 139753562430064}, {beg = 139753663233543, size = 21474836496, size_with_redzone = 8770, name = 0x7f1ae971da90 "\300", module_name = 0x60c003526480 "", has_dynamic_init = 
105690610497296, location = 0x0, odr_indicator = 8770}}, reg_sites = {55277312, 24608, 192, 0}, access_size = 16, size = 64 '@'}, --Type for more, q to quit, c to continue without paging-- addr = 139753671585219}}}}, Generic = {<__asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, addr_description = {data = {kind = asan::kAddressKindHeap, {shadow = { addr = 106858908091888, kind = (asan::kShadowKindGap | asan::kShadowKindHigh), shadow_byte = 0 '\000'}, heap = { addr = 106858908091888, alloc_tid = 3, free_tid = 3, alloc_stack_id = 427819017, free_stack_id = 1266679975, chunk_access = { bad_addr = 106858908091888, offset = 48, chunk_begin = 106858908091840, chunk_size = 355, access_type = 2, alloc_type = 1}}, stack = {addr = 106858908091888, tid = 3, offset = 3, frame_pc = 5440349067550916617, access_size = 106858908091888, frame_descr = 0x30 <error: Cannot access memory at address 0x30>}, global = {addr = 106858908091888, static kMaxGlobals = 4, globals = {{beg = 3, size = 3, size_with_redzone = 5440349067550916617, name = 0x6130074205f0 " )\224", module_name = 0x30 <error: Cannot access memory at address 0x30>, has_dynamic_init = 106858908091840, location = 0x163, odr_indicator = 5432374}, {beg = 7012368136350825216, size = 206158430224, size_with_redzone = 139753562428464, name = 0xfe35d2e3a86 
<error: Cannot access memory at address 0xfe35d2e3a86>, module_name = 0x7f1ae971d490 "h9\306\360\032\177", has_dynamic_init = 139753562428464, location = 0x60c0246c97c0, odr_indicator = 2}, {beg = 9223372036854776046, size = 0, size_with_redzone = 0, name = 0x0, module_name = 0x0, has_dynamic_init = 0, location = 0x0, odr_indicator = 1102416563}, { beg = 139753562428400, size = 106858908091840, size_with_redzone = 106790081373240, name = 0x3000000010 <error: Cannot access memory at address 0x3000000010>, module_name = 0x942920 " )\224", has_dynamic_init = 139753562428496, location = 0x537ba0 <rbtree_search+174>, odr_indicator = 106858908091840}}, reg_sites = { 14525496, 24864, 1102416563, 0}, access_size = 7101984, size = 242 '\362'}, addr = 106858908091888}}}, pc = 5477914, bp = 139753562430240, sp = 139753562430224, access_size = 8, bug_descr = 0x7f1aeff36dc3 "heap-use-after-free", is_write = false, shadow_val = 253 '\375'}}}, halt_onerror = true} error = {<asan::ErrorBase> = {scariness = {score = 51, descr = "8-byte-read-heap-use-after-free\000\032\177\000\000\210\317q\351\032\177\000\000\001\000\000\000\000\000\000\000\016\325\335\360\032\177\000\000\230\002\000\000\000\000\000\000P.\341\357\032\177\000\000\320\321\377\360\032\177\000\000\210\317q\351\032\177\000\000\204\317q\351\032\177\000\000h9\306\360\032\177\000\000 \320q\351\032\177\000\000\000\000\000\000\000\000\000\000\360\205\340\357\032\177\000\000\063!@\000\000\000\000\000\070\333\347\303\000\000\000\000H\030@\000\000\000\000\000P.\341\357\032\177\000\000l\237\017\003\000\000\000\000P\320q\351\032\177\000\000@\320q\351\032\177\000\000\200H\377\360\032\177\000\000\017\000\000\000\000\000\000\000"...}, tid = 3}, addr_description = {data = {kind = asan::kAddressKindHeap, {shadow = { addr = 106858908091888, kind = (asan::kShadowKindGap | __asan::kShadowKindHigh), shadow_byte = 0 '\000'}, heap = { addr = 106858908091888, alloc_tid = 3, free_tid = 3, alloc_stack_id = 427819017, free_stack_id = 1266679975, 
chunk_access = { bad_addr = 106858908091888, offset = 48, chunk_begin = 106858908091840, chunk_size = 355, access_type = 2, alloc_type = 1}}, stack = { addr = 106858908091888, tid = 3, offset = 3, frame_pc = 5440349067550916617, access_size = 106858908091888, frame_descr = 0x30 <error: Cannot access memory at address 0x30>}, global = {addr = 106858908091888, static kMaxGlobals = 4, globals = {{ beg = 3, size = 3, size_with_redzone = 5440349067550916617, name = 0x6130074205f0 " )\224", module_name = 0x30 <error: Cannot access memory at address 0x30>, has_dynamic_init = 106858908091840, location = 0x163, odr_indicator = 5432374}, {beg = 7012368136350825216, size = 206158430224, size_with_redzone = 139753562428464, name = 0xfe35d2e3a86 <error: Cannot access memory at address 0xfe35d2e3a86>, module_name = 0x7f1ae971d490 "h9\306\360\032\177", has_dynamic_init = 139753562428464, location = 0x60c0246c97c0, odr_indicator = 2}, {beg = 9223372036854776046, size = 0, size_with_redzone = 0, name = 0x0, module_name = 0x0, has_dynamic_init = 0, location = 0x0, odr_indicator = 1102416563}, { beg = 139753562428400, size = 106858908091840, size_with_redzone = 106790081373240, name = 0x3000000010 <error: Cannot access memory at address 0x3000000010>, module_name = 0x942920 " )\224", has_dynamic_init = 139753562428496, location = 0x537ba0 <rbtree_search+174>, odr_indicator = 106858908091840}}, reg_sites = { 14525496, 24864, 1102416563, 0}, access_size = 7101984, size = 242 '\362'}, addr = 106858908091888}}}, pc = 5477914, bp = 139753562430240, sp = 139753562430224, access_size = 8, bug_descr = 0x7f1aeff36dc3 "heap-use-after-free", is_write = false, shadow_val = 253 '\375'}

#6 0x00007f1aeff0075b in __asan::__asan_report_load8 (addr=<optimized out>) at ../../../../libsanitizer/asan/asan_rtl.cc:112

    bp = 139753562430240
    pc = <optimized out>
    local_stack = 139753562430384
    sp = 139753562430224

#7 0x000000000053961a in rbtree_next (node=0x6130074205e8) at util/rbtree.c:577

    parent = 0x61300ade2fc0

#8 0x00000000006390ac in reuse_cb_readwait_for_failure (tree_by_id=0x7f1ae971dbd0, err=-2) at services/outside_network.c:1135

    w = 0x61300ade2fc0
    node = 0x61300ade2fe8

#9 0x0000000000639355 in reuse_cb_and_decommission (outnet=0x613000030080, pend=0x612000dda340, error=-2) at services/outside_network.c:1154

    store = {root = 0x61300ade2fe8, count = 2, cmp = 0x632871 <reuse_id_cmp>}

#10 0x0000000000640e47 in outnet_tcptimer (arg=0x61300ade2fc0) at services/outside_network.c:2197

    pend = 0x612000dda340
    w = 0x61300ade2fc0
    outnet = 0x613000030080

#11 0x0000000000625f3e in comm_timer_callback (fd=-1, event=1, arg=0x6040089ac910) at util/netevent.c:4280

    tm = 0x6040089ac910
    __func__ = "comm_timer_callback"

#12 0x00007f1aef736be2 in event_process_active_single_queue (base=base@entry=0x61700002ff00, activeq=0x602000011010,
    max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1691

    evcb_callback = <optimized out>
    res = 1
    ev = 0x60c0246c97c0
    evcb = <optimized out>
    count = 2
    __func__ = "event_process_active_single_queue"

#13 0x00007f1aef7373e7 in event_process_active (base=0x61700002ff00) at event.c:1783

    activeq = <optimized out>
    i = 0
    c = 0
    tv = {tv_sec = 8936595, tv_usec = 347428}
    maxcb = 2147483647
    endtime = 0x0
    limit_after_prio = 2147483647
    activeq = <optimized out>
    i = <optimized out>
    c = <optimized out>
    endtime = <optimized out>
    tv = <optimized out>
    maxcb = <optimized out>
    limit_after_prio = <optimized out>

#14 event_base_loop (base=0x61700002ff00, flags=<optimized out>) at event.c:2006

    n = <optimized out>
    evsel = 0x7f1aef965b20 <epollops>
    tv = {tv_sec = 0, tv_usec = 880}
    tv_p = <optimized out>
    res = <optimized out>
    done = 0
    retval = 0
    __func__ = "event_base_loop"

#15 0x000000000064ce5a in ub_event_base_dispatch (base=0x61700002ff00) at util/ub_event.c:280

No locals.

#16 0x000000000060dc9f in comm_base_dispatch (b=0x60300004a830) at util/netevent.c:256

    retval = 32767

#17 0x000000000044a185 in worker_work (worker=0x62300000e100) at daemon/worker.c:1940

No locals.

#18 0x000000000041283d in thread_start (arg=0x62300000e100) at daemon/daemon.c:541

    worker = 0x62300000e100
    port_num = 3

#19 0x00007f1aef01714a in start_thread (arg=<optimized out>) at pthread_create.c:479

    ret = <optimized out>
    pd = <optimized out>
    unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139753562433280, 5629422553894277874, 140735375915374, 140735375915375, 0, 139753562431296,
            -5752667792546409742, -5752671231268433166}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0,
          canceltype = 0}}}
    not_first_call = <optimized out>

#20 0x00007f1aeed46dc3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

No locals.
(gdb)

Now, if we dig into the thread where Unbound was terminated by the sanitizer:

(gdb) bt

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50

#1 0x00007f1aeec6bdb5 in __GI_abort () at abort.c:79

#2 0x00007f1aeff14c02 in __sanitizer::Abort () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc:145

#3 0x00007f1aeff1d52c in __sanitizer::Die () at ../../../../libsanitizer/sanitizer_common/sanitizer_termination.cc:57

#4 0x00007f1aefeffc55 in __asan::ScopedInErrorReport::~ScopedInErrorReport (this=<optimized out>, __in_chrg=<optimized out>)
    at ../../../../libsanitizer/asan/asan_report.cc:181

#5 __asan::ReportGenericError (pc=<optimized out>, bp=bp@entry=139753562430240, sp=sp@entry=139753562430224, addr=106858908091888,
    is_write=is_write@entry=false, access_size=access_size@entry=8, exp=0, fatal=true) at ../../../../libsanitizer/asan/asan_report.cc:397

#6 0x00007f1aeff0075b in __asan::__asan_report_load8 (addr=<optimized out>) at ../../../../libsanitizer/asan/asan_rtl.cc:112

#7 0x000000000053961a in rbtree_next (node=0x6130074205e8) at util/rbtree.c:577

#8 0x00000000006390ac in reuse_cb_readwait_for_failure (tree_by_id=0x7f1ae971dbd0, err=-2) at services/outside_network.c:1135

#9 0x0000000000639355 in reuse_cb_and_decommission (outnet=0x613000030080, pend=0x612000dda340, error=-2) at services/outside_network.c:1154

#10 0x0000000000640e47 in outnet_tcptimer (arg=0x61300ade2fc0) at services/outside_network.c:2197

#11 0x0000000000625f3e in comm_timer_callback (fd=-1, event=1, arg=0x6040089ac910) at util/netevent.c:4280

#12 0x00007f1aef736be2 in event_process_active_single_queue (base=base@entry=0x61700002ff00, activeq=0x602000011010,
    max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1691

#13 0x00007f1aef7373e7 in event_process_active (base=0x61700002ff00) at event.c:1783

#14 event_base_loop (base=0x61700002ff00, flags=<optimized out>) at event.c:2006

#15 0x000000000064ce5a in ub_event_base_dispatch (base=0x61700002ff00) at util/ub_event.c:280

#16 0x000000000060dc9f in comm_base_dispatch (b=0x60300004a830) at util/netevent.c:256

#17 0x000000000044a185 in worker_work (worker=0x62300000e100) at daemon/worker.c:1940

#18 0x000000000041283d in thread_start (arg=0x62300000e100) at daemon/daemon.c:541

#19 0x00007f1aef01714a in start_thread (arg=<optimized out>) at pthread_create.c:479

#20 0x00007f1aeed46dc3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) fr 7

#7 0x000000000053961a in rbtree_next (node=0x6130074205e8) at util/rbtree.c:577
577		for (node = node->right; node->left != RBTREE_NULL; node = node->left);
(gdb) list
572	{
573		rbnode_type	*parent;
574
575		if (node->right != RBTREE_NULL) {
576			/* One right, then keep on going left... */
577			for (node = node->right; node->left != RBTREE_NULL; node = node->left);
578		} else {
579			parent = node->parent;
580			while (parent != RBTREE_NULL && node == parent->right) {
581				node = parent;

So the thread crashes in the for() loop at line 577 of util/rbtree.c. As far as I can tell, free() was called on a tree node somewhere before this loop, so the dereference in node = node->right reads freed memory and triggers the heap-use-after-free report. Could you please look at the stack trace above? Maybe it points at the root cause in your code.

Big thanks in advance!

gthess commented 2 years ago

Hi @iruzanov! I am currently working on some race condition bugs (related to the TCP code) and this one seems like one of them. I was planning to update you here once they were solved, but it is good that you already reported it. Can I count you in for testing? Also, it would be interesting to see how often you get these crashes, so that we know what to expect for reproducibility in production.

iruzanov commented 2 years ago

Sure, I am totally at your service! ;) As for the rate of the crashes: on the less loaded sites Unbound works well, maybe one core dump in a month. On the more loaded sites, which also carry a lot of TCP traffic, I see one or two crashes per day.

iruzanov commented 2 years ago

Hello, @gthess!

A couple of days ago I received an email with the subject "Unbound DoS vulnerability (only with debugging enabled!)" from your colleagues. In short, it is a warning to build Unbound with "--disable-debug --disable-checking" to avoid resolver termination when it receives a crafted packet from an upstream. It concerns versions 1.13.x and 1.14.0 of the program. So my question is: does it relate to the race conditions in the TCP code that you mentioned above? And are there any ready fixes that I could test myself on the Unbound (1.13.2) running on my servers? ;)

gthess commented 2 years ago

Using '--disable-debug --disable-checking' explicitly turns off debugging (useful if you are not sure what the configure line does in your environment). If you don't specify them at all, debugging is off by default. It does not relate to the race condition; it just happened to be found during testing for the race condition bug. Apart from that, I don't have complete fixes for the race condition yet.
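For anyone following along, the build invocation under discussion would look roughly like this. This is a sketch only; the flag names come from the advisory quoted above, and the defaults should be verified against `./configure --help` in your Unbound source tree.

```shell
# Debugging/checking is off by default in release builds;
# passing the flags just makes that explicit.
./configure --disable-debug --disable-checking
make && make install
```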

iruzanov commented 2 years ago

Thanks for the detailed clarification! My Unbound resolvers are now running without deep debugging. And I am ready to recompile the program at any time you demand.

gthess commented 2 years ago

Hi @iruzanov, the above PR addresses the issues I was seeing with the race condition. It will be merged into master sometime this week, but if you would like to test already you can build from that branch in the meantime.

iruzanov commented 2 years ago

Hello @gthess!

I will gladly test your fixpack mentioned above, BIG thank you! Next week I will report my first observations for Unbound on a couple of my loaded servers.

gthess commented 2 years ago

Just a heads up that the branch is now merged to master which also includes other (relevant) fixes.

iruzanov commented 2 years ago

Ohh, ok! So I need to recompile my first patched Unbound again, because I fetched the master branch with the fixes from #612 yesterday :)

gthess commented 2 years ago

I would just go for the master branch as is since it contains other fixes. For this case, https://github.com/NLnetLabs/unbound/commit/8e76eb95a0d4fcb8bdfe4a6b5d84f279b71d8bc1 addresses a dnstap issue that could trigger something similar.

iruzanov commented 2 years ago

Thank you @gthess! I compiled Unbound from the master branch once again a few hours ago. My plan is to observe the server with the patched program next week, and then I would like to upgrade some more of my loaded servers. I will report my first results next week!

iruzanov commented 2 years ago

Hello @gthess!

My first resolver with the new code from the master branch has worked fine during the last week. And today I upgraded the next two of my loaded resolvers with the latest code. I will monitor these servers for 5-7 days and then report the status of the upgrade procedure.

iruzanov commented 2 years ago

Hi, @gthess! I am glad to report that my three patched, very loaded resolvers have been working stably for the last 7 days. So I have decided to patch the next three on Monday and monitor them up to Friday.

gthess commented 2 years ago

That's always good to hear, thanks for always reporting back!