NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

unbound 1.20.0 segmentation fault with nghttp2 #1103

Closed dukeartem closed 2 months ago

dukeartem commented 2 months ago

Hello. After upgrading unbound from 1.19.3 to 1.20.0 we get a SIGSEGV at random moments, but all the traces we gathered point to the same place. At this line the struct cp is valid, but n->query_reply.c holds an invalid memory address. Details follow (disclaimer: line numbers near the function do not match upstream code, because we carry a few internal patches). gdb:

Program terminated with signal SIGSEGV, Segmentation fault.
[Current thread is 1 (Thread 0x7f1dd82cf640 (LWP 87))]
(gdb) bt
#0  mesh_state_remove_reply (mesh=0x5717bfc8d200, m=0x5717be438040, cp=0x57180b3b5a00)
    at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/services/mesh.c:2092
#1  0x00000000008c8fb9 in http2_stream_delete (h2_session=0x57181cb27e70, h2_stream=0x571ab0c4fc60)
    at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/util/netevent.c:3293
#2  http2_stream_close_cb (session=<optimized out>, stream_id=<optimized out>, error_code=<optimized out>, cb_arg=0x57181cb27e70)
    at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/util/netevent.c:3363
#3  0x000000000086b8ef in nghttp2_session_close_stream (session=session@entry=0x571ab9221000, stream_id=5, error_code=8)
    at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/nghttp2/lib/nghttp2_session.c:1496
#4  0x000000000086e3bc in nghttp2_session_on_rst_stream_received (session=0x571ab9221000, frame=0x571ab92212d8)
    at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/nghttp2/lib/nghttp2_session.c:4554
#5  0x000000000087311f in session_process_rst_stream_frame (session=0x5717bfc8d200, session@entry=0x571ab9221000)
    at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/nghttp2/lib/nghttp2_session.c:4569
#6  0x0000000000871e36 in nghttp2_session_mem_recv2 (session=session@entry=0x571ab9221000, 
    in=0x7f1dd82caa3d "ۂ͇\004\377\017bJ\220\267kKg\257\362J\221\004\060\303v\030a\273\f0\303\f0݄Yg$\333Q\033?\264]\304\031\307\313\305\342U\302\030p`\206\035\220Ûl0\303\f0\303\r\335\360\303\342\035\376!\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\206\030a\314\313\312\311\310\307\306\305\304\303\302\301\277\316~\317\352_Q\032H\031\373I\251,\001e\227\032i\340\270Ӯ>\027\032\177\356CLT&", 
    in@entry=0x7f1dd82caa30 "", inlen=13) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/nghttp2/lib/nghttp2_session.c:6548
#7  0x0000000000873375 in nghttp2_session_recv (session=0x571ab9221000) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/nghttp2/lib/nghttp2_session.c:7366
#8  0x00000000008c6c48 in comm_point_http2_handle_read (fd=3932, c=0x57180b3b5a00) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/util/netevent.c:3452
#9  comm_point_http_handle_read (fd=3932, c=0x57180b3b5a00) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/util/netevent.c:3503
#10 comm_point_http_handle_callback (fd=3932, event=2, arg=0x57180b3b5a00) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/util/netevent.c:3890
#11 0x000000000058f448 in event_persist_closure (base=0x5717bf7c0580, ev=<optimized out>) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/libevent/event.c:1623
#12 event_process_active_single_queue (base=0x5717bf7c0580, activeq=0x5717bfc0d300, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0)
    at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/libevent/event.c:1682
#13 0x000000000058beec in event_process_active (base=0x5717bf7c0580) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/libevent/event.c:1783
#14 event_base_loop (base=0x5717bf7c0580, flags=flags@entry=0) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/libevent/event.c:2006
#15 0x000000000058b8c7 in event_base_dispatch (event_base=0x5717bfc8d200) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/libs/libevent/event.c:1817
#16 0x00000000004ecec5 in ub_event_base_dispatch (base=0x5717bfc8d200) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/util/ub_event.c:280
#17 0x00000000008c537c in comm_base_dispatch (b=<optimized out>) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/util/netevent.c:282
#18 0x00000000004ebde9 in worker_work (worker=worker@entry=0x5717bf447800) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/daemon/worker.c:2357
#19 0x00000000004dc041 in thread_start (arg=0x5717bf447800) at /place/sandbox-data/tasks/1/5/2358437651/fake-svn-root/arcadia/contrib/tools/unbound/daemon/daemon.c:638
#20 0x00007f1dda473b43 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#21 0x00007f1dda505a00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
-----
(gdb) print *cp
$32 = {ev = 0x55baa0fcc3e0, event_added = 0, socket = 0x55b9ffdc2260, fd = 4800, timeout = 0x55baa0fcc3f0, buffer = 0x55baded71bc0, tcp_is_reading = 1, tcp_byte_count = 0, tcp_parent = 0x55ba9dd48d00, 
  repinfo = {c = 0x55bad71e8500, remote_addr = {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, remote_addrlen = 16, srctype = 0, client_nonce = '\000' <repeats 11 times>, 
    nmkey = '\000' <repeats 31 times>, dnsc_cert = 0x0, is_dnscrypted = 0, pktinfo = {v6info = {ipi6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, 
            __u6_addr32 = {0, 0, 0, 0}}}, ipi6_ifindex = 0}, v4info = {ipi_ifindex = 0, ipi_spec_dst = {s_addr = 0}, ipi_addr = {s_addr = 0}}}, max_udp_size = 4096, is_proxied = 0, client_addr = {
      ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, client_addrlen = 16}, max_tcp_count = 0, cur_tcp_count = 0, tcp_handlers = 0x0, tcp_free = 0x0, ssl = 0x0, 
  ssl_shake_state = comm_ssl_shake_none, http_min_version = http_version_2, http_endpoint = 0x55baa0fcc400 "/dns-query", http_in_headers = 0, http_in_chunk_headers = 0, http_is_chunked = 0, 
  http_temp = 0x0, http_stored = 0, h2_session = 0x55baded7a050, use_h2 = 1, h2_stream = 0x0, http2_stream_max_qbuffer_size = 512, http2_max_streams = 10, dtenv = 0x0, type = comm_http, pp2_enabled = 0, 
  pp2_header_state = pp2_header_none, do_not_close = 0, tcp_do_close = 1, tcp_write_and_read = 0, tcp_write_byte_count = 0, tcp_write_pkt = 0x0, tcp_write_pkt_len = 0, tcp_more_read_again = 0x0, 
  tcp_more_write_again = 0x0, tcp_do_toggle_rw = 0, tcp_timeout_msec = 30000, tcp_keepalive = 0, tcp_check_nb_connect = 0, tcp_conn_limit = 0x55b9ffdc2400, tcl_addr = 0x0, tcp_req_info = 0x0, 
  dnscrypt = 0, dnscrypt_buffer = 0x0, inuse = 0, recv_tv = {tv_sec = 0, tv_usec = 0}, callback = 0x4e81f0 <worker_handle_request>, cb_arg = 0x55b9ff443000}
----
(gdb) print m->reply_list
$34 = (struct mesh_reply *) 0xadae520a61780555
(gdb) print *m->reply_list
Cannot access memory at address 0xadae520a61780555

I rebuilt unbound with address sanitizer and gathered trace:

==700336==ERROR: AddressSanitizer: heap-use-after-free on address 0x62900bd064c0 at pc 0x000001532b07 bp 0x7f4d897a9cb0 sp 0x7f4d897a9c90
READ of size 8 at 0x62900bd064c0 thread T13
    #0 0x1532b06  (/unbound-1.20-sanitizer+0x1532b06) mesh_state_remove_reply unbound/services/mesh.c:2087 
    #1 0x164749f  (/unbound-1.20-sanitizer+0x164749f) http2_stream_delete unbound/util/netevent.c:3293 
    #2 0x1652e10  (/unbound-1.20-sanitizer+0x1652e10) http2_session_server_delete unbound/util/netevent.c:3318 
    #3 0x164176a  (/unbound-1.20-sanitizer+0x164176a) comm_point_close unbound/util/netevent.c:4684 
    #4 0x1648c08  (/unbound-1.20-sanitizer+0x1648c08) reclaim_http_handler unbound/util/netevent.c:2906 
    #5 0x164025d  (/unbound-1.20-sanitizer+0x164025d) comm_point_http_handle_callback unbound/util/netevent.c:3891 
    #6 0x9dc58f  (/unbound-1.20-sanitizer+0x9dc58f) event_persist_closure contrib/libs/libevent/event.c:1623 
    #7 0x9d9e1a  (/unbound-1.20-sanitizer+0x9d9e1a) event_process_active_single_queue contrib/libs/libevent/event.c:1682 
    #8 0x9ca2b2  (/unbound-1.20-sanitizer+0x9ca2b2) event_process_active contrib/libs/libevent/event.c:1783 
    #9 0x9c6d9e  (/unbound-1.20-sanitizer+0x9c6d9e) event_base_loop contrib/libs/libevent/event.c:2006 
    #10 0x9c6176  (/unbound-1.20-sanitizer+0x9c6176) event_base_dispatch contrib/libs/libevent/event.c:1817 
    #11 0x811094  (/unbound-1.20-sanitizer+0x811094) ub_event_base_dispatch unbound/util/ub_event.c:280 
    #12 0x163623e  (/unbound-1.20-sanitizer+0x163623e) comm_base_dispatch unbound/util/netevent.c:282 
    #13 0x8091dd  (/unbound-1.20-sanitizer+0x8091dd) worker_work unbound/daemon/worker.c:2357 
    #14 0x7c079e  (/unbound-1.20-sanitizer+0x7c079e) thread_start unbound/daemon/daemon.c:638 
    #15 0x90b833  (/unbound-1.20-sanitizer+0x90b833) _ZN6__asan10AsanThread11ThreadStartEy /-S/contrib/libs/clang16-rt/lib/asan/asan_thread.cpp:277 
    #16 0x8ca076  (/unbound-1.20-sanitizer+0x8ca076) _ZL17asan_thread_startPv /-S/contrib/libs/clang16-rt/lib/asan/asan_interceptors.cpp:199 
    #17 0x7f4d980e0b42  (/lib/x86_64-linux-gnu/libc.so.6+0x94b42) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #18 0x7f4d981729ff  (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

0x62900bd064c0 is located 704 bytes inside of 16384-byte region [0x62900bd06200,0x62900bd0a200)
freed by thread T13 here:
    #0 0x8fc090  (/unbound-1.20-sanitizer+0x8fc090) free /-S/contrib/libs/clang16-rt/lib/asan/asan_malloc_linux.cpp:52 
    #1 0x166cb1d  (/unbound-1.20-sanitizer+0x166cb1d) regional_destroy unbound/util/regional.c:141 
    #2 0x15c4e90  (/unbound-1.20-sanitizer+0x15c4e90) alloc_reg_release unbound/util/alloc.c:345 
    #3 0x15251c1  (/unbound-1.20-sanitizer+0x15251c1) mesh_state_cleanup unbound/services/mesh.c:990 
    #4 0x151cbc8  (/unbound-1.20-sanitizer+0x151cbc8) mesh_state_delete unbound/services/mesh.c:1028 
    #5 0x1531987  (/unbound-1.20-sanitizer+0x1531987) mesh_continue unbound/services/mesh.c:1922 
    #6 0x1522867  (/unbound-1.20-sanitizer+0x1522867) mesh_run unbound/services/mesh.c:1953 
    #7 0x15247f8  (/unbound-1.20-sanitizer+0x15247f8) mesh_report_reply unbound/services/mesh.c:856 
    #8 0x7f0ca7  (/unbound-1.20-sanitizer+0x7f0ca7) worker_handle_service_reply unbound/daemon/worker.c:269 
    #9 0x156217a  (/unbound-1.20-sanitizer+0x156217a) serviced_callbacks unbound/services/outside_network.c:3051 
    #10 0x1566ee8  (/unbound-1.20-sanitizer+0x1566ee8) serviced_udp_callback unbound/services/outside_network.c:3392 
    #11 0x1554c52  (/unbound-1.20-sanitizer+0x1554c52) outnet_udp_cb unbound/services/outside_network.c:1537 
    #12 0x163d645  (/unbound-1.20-sanitizer+0x163d645) comm_point_udp_callback unbound/util/netevent.c:1145 
    #13 0x9dc58f  (/unbound-1.20-sanitizer+0x9dc58f) event_persist_closure contrib/libs/libevent/event.c:1623 
    #14 0x9d9e1a  (/unbound-1.20-sanitizer+0x9d9e1a) event_process_active_single_queue contrib/libs/libevent/event.c:1682 
    #15 0x9ca2b2  (/unbound-1.20-sanitizer+0x9ca2b2) event_process_active contrib/libs/libevent/event.c:1783 
    #16 0x9c6d9e  (/unbound-1.20-sanitizer+0x9c6d9e) event_base_loop contrib/libs/libevent/event.c:2006 
    #17 0x9c6176  (/unbound-1.20-sanitizer+0x9c6176) event_base_dispatch contrib/libs/libevent/event.c:1817 
    #18 0x811094  (/unbound-1.20-sanitizer+0x811094) ub_event_base_dispatch unbound/util/ub_event.c:280 
    #19 0x163623e  (/unbound-1.20-sanitizer+0x163623e) comm_base_dispatch unbound/util/netevent.c:282 
    #20 0x8091dd  (/unbound-1.20-sanitizer+0x8091dd) worker_work unbound/daemon/worker.c:2357 
    #21 0x7c079e  (/unbound-1.20-sanitizer+0x7c079e) thread_start unbound/daemon/daemon.c:638 
    #22 0x90b833  (/unbound-1.20-sanitizer+0x90b833) _ZN6__asan10AsanThread11ThreadStartEy /-S/contrib/libs/clang16-rt/lib/asan/asan_thread.cpp:277 

previously allocated by thread T13 here:
    #0 0x8fc3c3  (/unbound-1.20-sanitizer+0x8fc3c3) malloc /-S/contrib/libs/clang16-rt/lib/asan/asan_malloc_linux.cpp:69 
    #1 0x166c79c  (/unbound-1.20-sanitizer+0x166c79c) regional_create_custom_large_object unbound/util/regional.c:94 
    #2 0x166c75c  (/unbound-1.20-sanitizer+0x166c75c) regional_create_custom unbound/util/regional.c:108 
    #3 0x15c4df6  (/unbound-1.20-sanitizer+0x15c4df6) alloc_reg_obtain unbound/util/alloc.c:338 
    #4 0x15676f2  (/unbound-1.20-sanitizer+0x15676f2) outnet_serviced_query unbound/services/outside_network.c:3418 
    #5 0x808e96  (/unbound-1.20-sanitizer+0x808e96) worker_send_query unbound/daemon/worker.c:2417 
    #6 0x13d615d  (/unbound-1.20-sanitizer+0x13d615d) processQueryTargets unbound/iterator/iterator.c:3003 
    #7 0x13c42ea  (/unbound-1.20-sanitizer+0x13c42ea) iter_handle unbound/iterator/iterator.c:4148 
    #8 0x13c33b0  (/unbound-1.20-sanitizer+0x13c33b0) iter_operate unbound/iterator/iterator.c:4413 
    #9 0x15225c4  (/unbound-1.20-sanitizer+0x15225c4) mesh_run unbound/services/mesh.c:1943 
    #10 0x151ecce  (/unbound-1.20-sanitizer+0x151ecce) mesh_new_client unbound/services/mesh.c:559 
    #11 0x7fa779  (/unbound-1.20-sanitizer+0x7fa779) worker_handle_request unbound/daemon/worker.c:1946 
    #12 0x1493164  (/unbound-1.20-sanitizer+0x1493164) tcp_req_info_handle_readdone unbound/services/listen_dnsport.c:2241 
    #13 0x1656fb6  (/unbound-1.20-sanitizer+0x1656fb6) tcp_callback_reader unbound/util/netevent.c:1575 
    #14 0x1659406  (/unbound-1.20-sanitizer+0x1659406) ssl_handle_read unbound/util/netevent.c:2019 
    #15 0x1656d77  (/unbound-1.20-sanitizer+0x1656d77) ssl_handle_it unbound/util/netevent.c:2188 
    #16 0x16427bc  (/unbound-1.20-sanitizer+0x16427bc) comm_point_tcp_handle_read unbound/util/netevent.c:2206 
    #17 0x16448ec  (/unbound-1.20-sanitizer+0x16448ec) tcp_req_info_read_again unbound/util/netevent.c:2740 
    #18 0x164100e  (/unbound-1.20-sanitizer+0x164100e) comm_point_tcp_handle_callback unbound/util/netevent.c:2863 
    #19 0x9dc58f  (/unbound-1.20-sanitizer+0x9dc58f) event_persist_closure contrib/libs/libevent/event.c:1623 
    #20 0x9d9e1a  (/unbound-1.20-sanitizer+0x9d9e1a) event_process_active_single_queue contrib/libs/libevent/event.c:1682 
    #21 0x9ca2b2  (/unbound-1.20-sanitizer+0x9ca2b2) event_process_active contrib/libs/libevent/event.c:1783 
    #22 0x9c6d9e  (/unbound-1.20-sanitizer+0x9c6d9e) event_base_loop contrib/libs/libevent/event.c:2006 
    #23 0x9c6176  (/unbound-1.20-sanitizer+0x9c6176) event_base_dispatch contrib/libs/libevent/event.c:1817 
    #24 0x811094  (/unbound-1.20-sanitizer+0x811094) ub_event_base_dispatch unbound/util/ub_event.c:280 
    #25 0x163623e  (/unbound-1.20-sanitizer+0x163623e) comm_base_dispatch unbound/util/netevent.c:282 
    #26 0x8091dd  (/unbound-1.20-sanitizer+0x8091dd) worker_work unbound/daemon/worker.c:2357 
    #27 0x7c079e  (/unbound-1.20-sanitizer+0x7c079e) thread_start unbound/daemon/daemon.c:638 
    #28 0x90b833  (/unbound-1.20-sanitizer+0x90b833) _ZN6__asan10AsanThread11ThreadStartEy /-S/contrib/libs/clang16-rt/lib/asan/asan_thread.cpp:277 

Thread T13 created by T0 here:
    #0 0x8c9f44  (/unbound-1.20-sanitizer+0x8c9f44) pthread_create /-S/contrib/libs/clang16-rt/lib/asan/asan_interceptors.cpp:208 
    #1 0x7bd81c  (/unbound-1.20-sanitizer+0x7bd81c) daemon_start_others unbound/daemon/daemon.c:654 
    #2 0x7bbb9e  (/unbound-1.20-sanitizer+0x7bbb9e) daemon_fork unbound/daemon/daemon.c:809 
    #3 0x7eb5b9  (/unbound-1.20-sanitizer+0x7eb5b9) run_daemon unbound/daemon/unbound.c:738 
    #4 0x7eaa26  (/unbound-1.20-sanitizer+0x7eaa26) main unbound/daemon/unbound.c:851 
    #5 0x7f4d98075d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

SUMMARY: AddressSanitizer: heap-use-after-free (/unbound-1.20-sanitizer+0x1532b06) (BuildId: b6a712a2d6582d398a0eafb6b2ec52327de2c225) 
Shadow bytes around the buggy address:
  0x62900bd06200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06300: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06380: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06400: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x62900bd06480: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
  0x62900bd06500: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06680: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x62900bd06700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==700336==ABORTING

For clarity, frame #0 of that trace, 0x1532b06 (/unbound-1.20-sanitizer+0x1532b06) mesh_state_remove_reply unbound/services/mesh.c:2087, corresponds to this place in upstream


I think it is some sort of race between the libevent model and unbound with nghttp2. However, I could not reproduce it on a test stand, only under real user activity (above 1k rps), and it happens rarely, at random moments, maybe once per day.

System:

wcawijngaards commented 2 months ago

I cannot exactly reproduce the case that is presented, but I think this happens because the mesh state that it removes is still referencing a query that has already been deleted. To stop that, I added a call to remove the mesh state from the stream once the reply has been sent. Once the query has stopped, the reply pointer is set to NULL, so the code no longer calls the mesh_state_remove_reply routine, and it should no longer crash.
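The idea in the comment above can be illustrated with a minimal sketch. It assumes toy types (`toy_state`, `toy_stream`) and helper names that are hypothetical and do not match unbound's real structs or functions; it only shows the pattern of clearing a back-reference before the referenced object is freed, so that a later stream deletion has nothing stale to dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT unbound's real structs. */
struct toy_state {
    int reply_count;            /* replies still waiting on this state */
};

struct toy_stream {
    struct toy_state *state;    /* back-reference, may outlive the state */
};

/* Counts how often stream deletion touched a state (for the demo). */
static int touched_states = 0;

/* Called once the reply has been sent: the stream forgets the state. */
static void toy_stream_remove_state(struct toy_stream *s)
{
    if (s == NULL)
        return;
    s->state = NULL;
}

/* Stream teardown: only touch the state if the reference is still set. */
static void toy_stream_delete(struct toy_stream *s)
{
    if (s->state != NULL) {
        s->state->reply_count--;   /* a use-after-free if the pointer is stale */
        touched_states++;
    }
    free(s);
}

/* Returns how many states toy_stream_delete touched in the safe ordering. */
static int toy_demo(void)
{
    struct toy_state *st = calloc(1, sizeof(*st));
    struct toy_stream *s = calloc(1, sizeof(*s));
    if (st == NULL || s == NULL) {
        free(st);
        free(s);
        return -1;
    }

    st->reply_count = 1;
    s->state = st;

    /* Reply is sent: detach first, then the state may be freed. */
    toy_stream_remove_state(s);
    free(st);

    /* Later the client resets the stream; deletion must not touch st. */
    touched_states = 0;
    toy_stream_delete(s);
    return touched_states;
}
```

Without the call to `toy_stream_remove_state`, `toy_stream_delete` would read through a pointer to freed memory, which is the kind of access AddressSanitizer reports in the traces.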

wcawijngaards commented 2 months ago

Another commit fixes the same issue for the case where the http2 reply is dropped. In that case it also cleans up the mesh state reference, so that when the channel is closed later, the delete routine does not call mesh_state_remove_reply and therefore does not crash there.

dukeartem commented 2 months ago

Unfortunately, after patching and one day of running, I caught a new trace:

==826107==ERROR: AddressSanitizer: heap-use-after-free on address 0x629001ced4c0 at pc 0x000001532e27 bp 0x7ffbb47b0810 sp 0x7ffbb47b07f0
READ of size 8 at 0x629001ced4c0 thread T14
    #0 0x1532e26  (/unbound-1.20-sanitizer+0x1532e26) mesh_state_remove_reply unbound/services/mesh.c:2094 
    #1 0x164781f  (/unbound-1.20-sanitizer+0x164781f) http2_stream_delete unbound/util/netevent.c:3293 
    #2 0x164752d  (/unbound-1.20-sanitizer+0x164752d) http2_stream_close_cb unbound/util/netevent.c:3370 
    #3 0x14c4db3  (/unbound-1.20-sanitizer+0x14c4db3) nghttp2_session_close_stream contrib/libs/nghttp2/lib/nghttp2_session.c:1496 
    #4 0x14cbc6f  (/unbound-1.20-sanitizer+0x14cbc6f) nghttp2_session_on_rst_stream_received contrib/libs/nghttp2/lib/nghttp2_session.c:4554 
    #5 0x14dcd90  (/unbound-1.20-sanitizer+0x14dcd90) session_process_rst_stream_frame contrib/libs/nghttp2/lib/nghttp2_session.c:4569 
    #6 0x14d6f1a  (/unbound-1.20-sanitizer+0x14d6f1a) nghttp2_session_mem_recv2 contrib/libs/nghttp2/lib/nghttp2_session.c:6548 
    #7 0x14e03c7  (/unbound-1.20-sanitizer+0x14e03c7) nghttp2_session_recv contrib/libs/nghttp2/lib/nghttp2_session.c:7366 
    #8 0x165d1d7  (/unbound-1.20-sanitizer+0x165d1d7) comm_point_http2_handle_read unbound/util/netevent.c:3459 
    #9 0x164940c  (/unbound-1.20-sanitizer+0x164940c) comm_point_http_handle_read unbound/util/netevent.c:3510 
    #10 0x164056b  (/unbound-1.20-sanitizer+0x164056b) comm_point_http_handle_callback unbound/util/netevent.c:3897 
    #11 0x9dc58f  (/unbound-1.20-sanitizer+0x9dc58f) event_persist_closure contrib/libs/libevent/event.c:1623 
    #12 0x9d9e1a  (/unbound-1.20-sanitizer+0x9d9e1a) event_process_active_single_queue contrib/libs/libevent/event.c:1682 
    #13 0x9ca2b2  (/unbound-1.20-sanitizer+0x9ca2b2) event_process_active contrib/libs/libevent/event.c:1783 
    #14 0x9c6d9e  (/unbound-1.20-sanitizer+0x9c6d9e) event_base_loop contrib/libs/libevent/event.c:2006 
    #15 0x9c6176  (/unbound-1.20-sanitizer+0x9c6176) event_base_dispatch contrib/libs/libevent/event.c:1817 
    #16 0x811094  (/unbound-1.20-sanitizer+0x811094) ub_event_base_dispatch unbound/util/ub_event.c:280 
    #17 0x163655e  (/unbound-1.20-sanitizer+0x163655e) comm_base_dispatch unbound/util/netevent.c:282 
    #18 0x8091dd  (/unbound-1.20-sanitizer+0x8091dd) worker_work unbound/daemon/worker.c:2357 
    #19 0x7c079e  (/unbound-1.20-sanitizer+0x7c079e) thread_start unbound/daemon/daemon.c:638 
    #20 0x90b833  (/unbound-1.20-sanitizer+0x90b833) _ZN6__asan10AsanThread11ThreadStartEy /-S/contrib/libs/clang16-rt/lib/asan/asan_thread.cpp:277 
    #21 0x8ca076  (/unbound-1.20-sanitizer+0x8ca076) _ZL17asan_thread_startPv /-S/contrib/libs/clang16-rt/lib/asan/asan_interceptors.cpp:199 
    #22 0x7ffbc2dc6b42  (/lib/x86_64-linux-gnu/libc.so.6+0x94b42) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #23 0x7ffbc2e589ff  (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

0x629001ced4c0 is located 704 bytes inside of 16384-byte region [0x629001ced200,0x629001cf1200)
freed by thread T14 here:
    #0 0x8fc090  (/unbound-1.20-sanitizer+0x8fc090) free /-S/contrib/libs/clang16-rt/lib/asan/asan_malloc_linux.cpp:52 
    #1 0x166ce9d  (/unbound-1.20-sanitizer+0x166ce9d) regional_destroy unbound/util/regional.c:141 
    #2 0x15c51b0  (/unbound-1.20-sanitizer+0x15c51b0) alloc_reg_release unbound/util/alloc.c:345 
    #3 0x15252c4  (/unbound-1.20-sanitizer+0x15252c4) mesh_state_cleanup unbound/services/mesh.c:993 
    #4 0x151cbc8  (/unbound-1.20-sanitizer+0x151cbc8) mesh_state_delete unbound/services/mesh.c:1031 
    #5 0x1531ca7  (/unbound-1.20-sanitizer+0x1531ca7) mesh_continue unbound/services/mesh.c:1929 
    #6 0x1522867  (/unbound-1.20-sanitizer+0x1522867) mesh_run unbound/services/mesh.c:1960 
    #7 0x15247f8  (/unbound-1.20-sanitizer+0x15247f8) mesh_report_reply unbound/services/mesh.c:856 
    #8 0x7f0ca7  (/unbound-1.20-sanitizer+0x7f0ca7) worker_handle_service_reply unbound/daemon/worker.c:269 
    #9 0x156249a  (/unbound-1.20-sanitizer+0x156249a) serviced_callbacks unbound/services/outside_network.c:3051 
    #10 0x1567208  (/unbound-1.20-sanitizer+0x1567208) serviced_udp_callback unbound/services/outside_network.c:3392 
    #11 0x1554f72  (/unbound-1.20-sanitizer+0x1554f72) outnet_udp_cb unbound/services/outside_network.c:1537 
    #12 0x163d965  (/unbound-1.20-sanitizer+0x163d965) comm_point_udp_callback unbound/util/netevent.c:1145 
    #13 0x9dc58f  (/unbound-1.20-sanitizer+0x9dc58f) event_persist_closure contrib/libs/libevent/event.c:1623 
    #14 0x9d9e1a  (/unbound-1.20-sanitizer+0x9d9e1a) event_process_active_single_queue contrib/libs/libevent/event.c:1682 
    #15 0x9ca2b2  (/unbound-1.20-sanitizer+0x9ca2b2) event_process_active contrib/libs/libevent/event.c:1783 
    #16 0x9c6d9e  (/unbound-1.20-sanitizer+0x9c6d9e) event_base_loop contrib/libs/libevent/event.c:2006 
    #17 0x9c6176  (/unbound-1.20-sanitizer+0x9c6176) event_base_dispatch contrib/libs/libevent/event.c:1817 
    #18 0x811094  (/unbound-1.20-sanitizer+0x811094) ub_event_base_dispatch unbound/util/ub_event.c:280 
    #19 0x163655e  (/unbound-1.20-sanitizer+0x163655e) comm_base_dispatch unbound/util/netevent.c:282 
    #20 0x8091dd  (/unbound-1.20-sanitizer+0x8091dd) worker_work unbound/daemon/worker.c:2357 
    #21 0x7c079e  (/unbound-1.20-sanitizer+0x7c079e) thread_start unbound/daemon/daemon.c:638 
    #22 0x90b833  (/unbound-1.20-sanitizer+0x90b833) _ZN6__asan10AsanThread11ThreadStartEy /-S/contrib/libs/clang16-rt/lib/asan/asan_thread.cpp:277 

previously allocated by thread T0 here:
    #0 0x8fc3c3  (/unbound-1.20-sanitizer+0x8fc3c3) malloc /-S/contrib/libs/clang16-rt/lib/asan/asan_malloc_linux.cpp:69 
    #1 0x166cb1c  (/unbound-1.20-sanitizer+0x166cb1c) regional_create_custom_large_object unbound/util/regional.c:94 
    #2 0x166cadc  (/unbound-1.20-sanitizer+0x166cadc) regional_create_custom unbound/util/regional.c:108 
    #3 0x15c331f  (/unbound-1.20-sanitizer+0x15c331f) prealloc_blocks unbound/util/alloc.c:91 
    #4 0x15c3258  (/unbound-1.20-sanitizer+0x15c3258) alloc_init unbound/util/alloc.c:122 
    #5 0x7bd153  (/unbound-1.20-sanitizer+0x7bd153) daemon_create_workers unbound/daemon/daemon.c:581 
    #6 0x7bbb92  (/unbound-1.20-sanitizer+0x7bbb92) daemon_fork unbound/daemon/daemon.c:798 
    #7 0x7eb5b9  (/unbound-1.20-sanitizer+0x7eb5b9) run_daemon unbound/daemon/unbound.c:738 
    #8 0x7eaa26  (/unbound-1.20-sanitizer+0x7eaa26) main unbound/daemon/unbound.c:851 
    #9 0x7ffbc2d5bd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

Thread T14 created by T0 here:
    #0 0x8c9f44  (/unbound-1.20-sanitizer+0x8c9f44) pthread_create /-S/contrib/libs/clang16-rt/lib/asan/asan_interceptors.cpp:208 
    #1 0x7bd81c  (/unbound-1.20-sanitizer+0x7bd81c) daemon_start_others unbound/daemon/daemon.c:654 
    #2 0x7bbb9e  (/unbound-1.20-sanitizer+0x7bbb9e) daemon_fork unbound/daemon/daemon.c:809 
    #3 0x7eb5b9  (/unbound-1.20-sanitizer+0x7eb5b9) run_daemon unbound/daemon/unbound.c:738 
    #4 0x7eaa26  (/unbound-1.20-sanitizer+0x7eaa26) main unbound/daemon/unbound.c:851 
    #5 0x7ffbc2d5bd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

SUMMARY: AddressSanitizer: heap-use-after-free (/unbound-1.20-sanitizer+0x1532e26) (BuildId: 536b8443a1d84c8ba2dfd3bcef518ba8fce2555b) 
Shadow bytes around the buggy address:
  0x629001ced200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced300: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced380: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced400: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x629001ced480: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
  0x629001ced500: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced680: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001ced700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==826107==ABORTING
wcawijngaards commented 2 months ago

I get the impression from the trace that not both patches are included; specifically, the second one, https://github.com/NLnetLabs/unbound/commit/d52f501d903909096b9d971cbda9e5b65eba6777, does not seem to be in the patched unbound. Were both patches applied to the patched unbound?

dukeartem commented 2 months ago

I re-checked.

my diff

```
--- contrib/tools/unbound/services/mesh.c (index)
+++ contrib/tools/unbound/services/mesh.c (working tree)
@@ -966,6 +966,9 @@ mesh_state_cleanup(struct mesh_state* mstate)
 		for(; rep; rep=rep->next) {
 			infra_wait_limit_dec(mesh->env->infra_cache,
 				&rep->query_reply, mesh->env->cfg);
+			if(rep->query_reply.c->use_h2)
+				http2_stream_remove_mesh_state(
+					rep->query_reply.c->h2_stream);
 			comm_point_drop_reply(&rep->query_reply);
 			log_assert(mesh->num_reply_addrs > 0);
 			mesh->num_reply_addrs--;
@@ -1579,6 +1582,10 @@ void mesh_query_done(struct mesh_state* mstate)
 				tcp_req_info_remove_mesh_state(r->query_reply.c->tcp_req_info, mstate);
 				r_buffer = NULL;
 			}
+			if(r->query_reply.c->use_h2) {
+				http2_stream_remove_mesh_state(
+					r->query_reply.c->h2_stream);
+			}
 			prev = r;
 			prev_buffer = r_buffer;
 		}
@@ -2287,6 +2294,9 @@ mesh_serve_expired_callback(void* arg)
 			r, r_buffer, prev, prev_buffer);
 		if(r->query_reply.c->tcp_req_info)
 			tcp_req_info_remove_mesh_state(r->query_reply.c->tcp_req_info, mstate);
+		if(r->query_reply.c->use_h2)
+			http2_stream_remove_mesh_state(
+				r->query_reply.c->h2_stream);
 		infra_wait_limit_dec(mstate->s.env->infra_cache,
 			&r->query_reply, mstate->s.env->cfg);
 		prev = r;
--- contrib/tools/unbound/util/netevent.c (index)
+++ contrib/tools/unbound/util/netevent.c (working tree)
@@ -3306,6 +3306,13 @@ void http2_stream_add_meshstate(struct http2_stream* h2_stream,
 	h2_stream->mesh_state = m;
 }
 
+void http2_stream_remove_mesh_state(struct http2_stream* h2_stream)
+{
+	if(!h2_stream)
+		return;
+	h2_stream->mesh_state = NULL;
+}
+
 /** delete http2 session server. After closing connection. */
 static void http2_session_server_delete(struct http2_session* h2_session)
 {
--- contrib/tools/unbound/util/netevent.h (index)
+++ contrib/tools/unbound/util/netevent.h (working tree)
@@ -959,6 +959,9 @@ void http2_session_add_stream(struct http2_session* h2_session,
 void http2_stream_add_meshstate(struct http2_stream* h2_stream,
 	struct mesh_area* mesh, struct mesh_state* m);
 
+/** Remove mesh state from stream. When the mesh state has been removed. */
+void http2_stream_remove_mesh_state(struct http2_stream* h2_stream);
+
 /**
  * This routine is published for checks and tests, and is only used internally.
  * handle libevent callback for timer comm.
```

and

readelf -Ws ./unbound-1.20-sanitizer  | grep http2_stream_remove_mesh_state
  1997: 0000000001647300    83 FUNC    GLOBAL DEFAULT   16 http2_stream_remove_mesh_state
 65876: 0000000001647300    83 FUNC    GLOBAL DEFAULT   16 http2_stream_remove_mesh_state

Maybe I can gather some other debug info? Or test some ugly hack, just to catch the problem.

wcawijngaards commented 2 months ago

The commit https://github.com/NLnetLabs/unbound/commit/8947c2c7646c2f8646b3e10efe25552f5e789068 adds some more fixes. I found several code paths where a connection is dropped, but then the mesh state reference would still be wrong. The commit passes unit tests. Thank you for testing this!
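Since several code paths can end a reply, one way to make a missed path less likely is to route every exit through a single detach helper. The sketch below is only an illustration under that assumption; the types, the `toy_exit` labels, and the helper names are hypothetical, and it is not a description of what the commit does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; NOT unbound's real types. */
struct toy_state { int id; };
struct toy_stream { struct toy_state *state; };

/* Hypothetical labels for the ways a reply can end. */
enum toy_exit { TOY_REPLY_SENT, TOY_REPLY_DROPPED, TOY_STATE_CLEANUP };

/* The one place that clears the back-reference. */
static void toy_detach(struct toy_stream *s)
{
    if (s != NULL)
        s->state = NULL;
}

/* Every exit path ends in toy_detach, whatever else it does first. */
static void toy_finish(struct toy_stream *s, enum toy_exit why)
{
    switch (why) {
    case TOY_REPLY_SENT:    /* answer was written to the client */
    case TOY_REPLY_DROPPED: /* answer was discarded */
    case TOY_STATE_CLEANUP: /* state is being torn down */
        break;
    }
    toy_detach(s);
}

/* Returns 1 if no path leaves a stale pointer behind, else 0. */
static int toy_all_paths_detach(void)
{
    static const enum toy_exit paths[] = {
        TOY_REPLY_SENT, TOY_REPLY_DROPPED, TOY_STATE_CLEANUP
    };
    struct toy_state st = { 1 };
    size_t i;

    for (i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        struct toy_stream s = { &st };
        toy_finish(&s, paths[i]);
        if (s.state != NULL)
            return 0;
    }
    return 1;
}
```

The trade-off is that path-specific work still has to happen before the shared detach step, so the helper reduces the risk of a forgotten path rather than removing it.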

dukeartem commented 2 months ago

nope,

another trace

```
==917967==ERROR: AddressSanitizer: heap-use-after-free on address 0x629001d604c0 at pc 0x000001533227 bp 0x7f1750f8c810 sp 0x7f1750f8c7f0
READ of size 8 at 0x629001d604c0 thread T14
    #0 0x1533226 (/unbound-1.20-sanitizer+0x1533226) mesh_state_remove_reply unbound/services/mesh.c:2103
    #1 0x1647c1f (/unbound-1.20-sanitizer+0x1647c1f) http2_stream_delete unbound/util/netevent.c:3293
    #2 0x164792d (/unbound-1.20-sanitizer+0x164792d) http2_stream_close_cb unbound/util/netevent.c:3370
    #3 0x14c4db3 (/unbound-1.20-sanitizer+0x14c4db3) nghttp2_session_close_stream contrib/libs/nghttp2/lib/nghttp2_session.c:1496
    #4 0x14cbc6f (/unbound-1.20-sanitizer+0x14cbc6f) nghttp2_session_on_rst_stream_received contrib/libs/nghttp2/lib/nghttp2_session.c:4554
    #5 0x14dcd90 (/unbound-1.20-sanitizer+0x14dcd90) session_process_rst_stream_frame contrib/libs/nghttp2/lib/nghttp2_session.c:4569
    #6 0x14d6f1a (/unbound-1.20-sanitizer+0x14d6f1a) nghttp2_session_mem_recv2 contrib/libs/nghttp2/lib/nghttp2_session.c:6548
    #7 0x14e03c7 (/unbound-1.20-sanitizer+0x14e03c7) nghttp2_session_recv contrib/libs/nghttp2/lib/nghttp2_session.c:7366
    #8 0x165d5d7 (/unbound-1.20-sanitizer+0x165d5d7) comm_point_http2_handle_read unbound/util/netevent.c:3459
    #9 0x164980c (/unbound-1.20-sanitizer+0x164980c) comm_point_http_handle_read unbound/util/netevent.c:3510
    #10 0x164096b (/unbound-1.20-sanitizer+0x164096b) comm_point_http_handle_callback unbound/util/netevent.c:3897
    #11 0x9dc59f (/unbound-1.20-sanitizer+0x9dc59f) event_persist_closure contrib/libs/libevent/event.c:1623
    #12 0x9d9e2a (/unbound-1.20-sanitizer+0x9d9e2a) event_process_active_single_queue contrib/libs/libevent/event.c:1682
    #13 0x9ca2c2 (/unbound-1.20-sanitizer+0x9ca2c2) event_process_active contrib/libs/libevent/event.c:1783
    #14 0x9c6dae (/unbound-1.20-sanitizer+0x9c6dae) event_base_loop contrib/libs/libevent/event.c:2006
    #15 0x9c6186 (/unbound-1.20-sanitizer+0x9c6186) event_base_dispatch contrib/libs/libevent/event.c:1817
    #16 0x8110a4 (/unbound-1.20-sanitizer+0x8110a4) ub_event_base_dispatch unbound/util/ub_event.c:280
    #17 0x163695e (/unbound-1.20-sanitizer+0x163695e) comm_base_dispatch unbound/util/netevent.c:282
    #18 0x8091ed (/unbound-1.20-sanitizer+0x8091ed) worker_work unbound/daemon/worker.c:2357
    #19 0x7c07ae (/unbound-1.20-sanitizer+0x7c07ae) thread_start unbound/daemon/daemon.c:638
    #20 0x90b843 (/unbound-1.20-sanitizer+0x90b843) _ZN6__asan10AsanThread11ThreadStartEy /-S/contrib/libs/clang16-rt/lib/asan/asan_thread.cpp:277
    #21 0x8ca086 (/unbound-1.20-sanitizer+0x8ca086) _ZL17asan_thread_startPv /-S/contrib/libs/clang16-rt/lib/asan/asan_interceptors.cpp:199
    #22 0x7f17621a8b42 (/lib/x86_64-linux-gnu/libc.so.6+0x94b42) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
    #23 0x7f176223a9ff (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

0x629001d604c0 is located 704 bytes inside of 16384-byte region [0x629001d60200,0x629001d64200)
freed by thread T14 here:
    #0 0x8fc0a0 (/unbound-1.20-sanitizer+0x8fc0a0) free /-S/contrib/libs/clang16-rt/lib/asan/asan_malloc_linux.cpp:52
    #1 0x166d29d (/unbound-1.20-sanitizer+0x166d29d) regional_destroy unbound/util/regional.c:141
    #2 0x15c55b0 (/unbound-1.20-sanitizer+0x15c55b0) alloc_reg_release unbound/util/alloc.c:345
    #3 0x15253c4 (/unbound-1.20-sanitizer+0x15253c4) mesh_state_cleanup unbound/services/mesh.c:995
    #4 0x151cbc8 (/unbound-1.20-sanitizer+0x151cbc8) mesh_state_delete unbound/services/mesh.c:1033
    #5 0x15320a7 (/unbound-1.20-sanitizer+0x15320a7) mesh_continue unbound/services/mesh.c:1938
    #6 0x1522967 (/unbound-1.20-sanitizer+0x1522967) mesh_run unbound/services/mesh.c:1969
    #7 0x15248f8 (/unbound-1.20-sanitizer+0x15248f8) mesh_report_reply unbound/services/mesh.c:858
    #8 0x7f0cb7 (/unbound-1.20-sanitizer+0x7f0cb7) worker_handle_service_reply unbound/daemon/worker.c:269
    #9 0x156289a (/unbound-1.20-sanitizer+0x156289a) serviced_callbacks unbound/services/outside_network.c:3051
    #10 0x1567608 (/unbound-1.20-sanitizer+0x1567608) serviced_udp_callback unbound/services/outside_network.c:3392
    #11 0x1555372 (/unbound-1.20-sanitizer+0x1555372) outnet_udp_cb unbound/services/outside_network.c:1537
    #12 0x163dd65 (/unbound-1.20-sanitizer+0x163dd65) comm_point_udp_callback unbound/util/netevent.c:1145
    #13 0x9dc59f (/unbound-1.20-sanitizer+0x9dc59f) event_persist_closure contrib/libs/libevent/event.c:1623
    #14 0x9d9e2a (/unbound-1.20-sanitizer+0x9d9e2a) event_process_active_single_queue contrib/libs/libevent/event.c:1682
    #15 0x9ca2c2 (/unbound-1.20-sanitizer+0x9ca2c2) event_process_active contrib/libs/libevent/event.c:1783
    #16 0x9c6dae (/unbound-1.20-sanitizer+0x9c6dae) event_base_loop contrib/libs/libevent/event.c:2006
    #17 0x9c6186 (/unbound-1.20-sanitizer+0x9c6186) event_base_dispatch contrib/libs/libevent/event.c:1817
    #18 0x8110a4 (/unbound-1.20-sanitizer+0x8110a4) ub_event_base_dispatch unbound/util/ub_event.c:280
    #19 0x163695e (/unbound-1.20-sanitizer+0x163695e) comm_base_dispatch unbound/util/netevent.c:282
    #20 0x8091ed (/unbound-1.20-sanitizer+0x8091ed) worker_work unbound/daemon/worker.c:2357
    #21 0x7c07ae (/unbound-1.20-sanitizer+0x7c07ae) thread_start unbound/daemon/daemon.c:638
    #22 0x90b843 (/unbound-1.20-sanitizer+0x90b843) _ZN6__asan10AsanThread11ThreadStartEy /-S/contrib/libs/clang16-rt/lib/asan/asan_thread.cpp:277

previously allocated by thread T0 here:
    #0 0x8fc3d3 (/unbound-1.20-sanitizer+0x8fc3d3) malloc /-S/contrib/libs/clang16-rt/lib/asan/asan_malloc_linux.cpp:69
    #1 0x166cf1c (/unbound-1.20-sanitizer+0x166cf1c) regional_create_custom_large_object unbound/util/regional.c:94
    #2 0x166cedc (/unbound-1.20-sanitizer+0x166cedc) regional_create_custom unbound/util/regional.c:108
    #3 0x15c371f (/unbound-1.20-sanitizer+0x15c371f) prealloc_blocks unbound/util/alloc.c:91
    #4 0x15c3658 (/unbound-1.20-sanitizer+0x15c3658) alloc_init unbound/util/alloc.c:122
    #5 0x7bd163 (/unbound-1.20-sanitizer+0x7bd163) daemon_create_workers unbound/daemon/daemon.c:581
    #6 0x7bbba2 (/unbound-1.20-sanitizer+0x7bbba2) daemon_fork unbound/daemon/daemon.c:798
    #7 0x7eb5c9 (/unbound-1.20-sanitizer+0x7eb5c9) run_daemon unbound/daemon/unbound.c:738
    #8 0x7eaa36 (/unbound-1.20-sanitizer+0x7eaa36) main unbound/daemon/unbound.c:851
    #9 0x7f176213dd8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

Thread T14 created by T0 here:
    #0 0x8c9f54 (/unbound-1.20-sanitizer+0x8c9f54) pthread_create /-S/contrib/libs/clang16-rt/lib/asan/asan_interceptors.cpp:208
    #1 0x7bd82c (/unbound-1.20-sanitizer+0x7bd82c) daemon_start_others unbound/daemon/daemon.c:654
    #2 0x7bbbae (/unbound-1.20-sanitizer+0x7bbbae) daemon_fork unbound/daemon/daemon.c:809
    #3 0x7eb5c9 (/unbound-1.20-sanitizer+0x7eb5c9) run_daemon unbound/daemon/unbound.c:738
    #4 0x7eaa36 (/unbound-1.20-sanitizer+0x7eaa36) main unbound/daemon/unbound.c:851
    #5 0x7f176213dd8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)

SUMMARY: AddressSanitizer: heap-use-after-free (/unbound-1.20-sanitizer+0x1533226) (BuildId: 0183975b3363ca4d430f7f4f8ebc219056175a0d)
Shadow bytes around the buggy address:
  0x629001d60200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60300: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60380: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60400: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x629001d60480: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
  0x629001d60500: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60680: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x629001d60700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==917967==ABORTING
```

but only one in 12 hours; maybe there are other code paths where a connection is dropped

wcawijngaards commented 2 months ago

The commit https://github.com/NLnetLabs/unbound/commit/8fca3e7c5b0acdee9a2b687cddc41e903c0788d5 is another attempt. It looks like this fixes more. For dropped connections, it now removes the mesh state for the stream that is associated with the reply, rather than for the current, most recent http2 stream, which is what it did before. I guess that in some cases multiple http2 streams were in use and one was closed; the mesh state was then removed from the wrong entry, causing a later crash when the first stream was closed. In addition, it no longer removes the mesh state twice in two code paths, and it initializes the http2 stream variable more thoroughly, in an attempt at more correct administration of the mesh state and http2 stream.

I think the commit therefore fixes some additional cases, where multiple http2 streams in one http2 connection were in use and one of the streams was closed. The fix passes the unit tests, although those test other things.
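A minimal sketch of the failure mode described above, under stated assumptions: `toy_stream`, `toy_session`, and every function here are hypothetical stand-ins and are not Unbound's actual structures or code. The flag `has_mesh_reply` models a mesh reply entry that still refers to a stream. The "buggy" cleanup drops the entry of the most recent stream instead of the one that was closed, so the closed stream's entry is left behind.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STREAMS 4

/* Hypothetical stand-in for an http2 stream; not Unbound's real struct. */
struct toy_stream {
    int id;
    int has_mesh_reply; /* 1 while a mesh reply entry refers to this stream */
};

/* Hypothetical stand-in for an http2 session with several streams. */
struct toy_session {
    struct toy_stream streams[MAX_STREAMS];
    size_t count;
    struct toy_stream *current; /* most recently opened stream */
};

/* Open a stream and register a (modelled) mesh reply entry for it. */
static struct toy_stream *toy_open(struct toy_session *s, int id)
{
    struct toy_stream *st;
    assert(s->count < MAX_STREAMS);
    st = &s->streams[s->count++];
    st->id = id;
    st->has_mesh_reply = 1;
    s->current = st;
    return st;
}

/* Old behaviour (as described): clean up for the current stream,
 * regardless of which stream was actually closed. */
static void toy_close_buggy(struct toy_session *s, struct toy_stream *closed)
{
    (void)closed;
    if (s->current)
        s->current->has_mesh_reply = 0;
}

/* Fixed behaviour (as described): clean up for the closed stream itself. */
static void toy_close_fixed(struct toy_session *s, struct toy_stream *closed)
{
    (void)s;
    closed->has_mesh_reply = 0;
}

/* Open streams 1 and 3, close stream 1, and report whether a stale
 * entry still refers to the closed stream (1 = stale, 0 = clean). */
int toy_stale_after_close(int use_fix)
{
    struct toy_session s = { .count = 0, .current = NULL };
    struct toy_stream *first = toy_open(&s, 1);
    toy_open(&s, 3);
    if (use_fix)
        toy_close_fixed(&s, first);
    else
        toy_close_buggy(&s, first);
    return first->has_mesh_reply;
}
```

In this model, `toy_stale_after_close(0)` returns 1, because the entry for the closed stream survives, while `toy_stale_after_close(1)` returns 0. In the real resolver a surviving entry of this kind would point at memory that is freed later, which is consistent with the heap-use-after-free in the trace, though the sketch does not reproduce that part.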

dukeartem commented 2 months ago

It has been running for 24 hours without any fault. Many thanks for all the attempts and for solving the problem in the end. Thank you!