getdnsapi / stubby

Stubby is the name given to a mode of using getdns which enables it to act as a local DNS Privacy stub resolver (using DNS-over-TLS).
https://dnsprivacy.org/dns_privacy_daemon_-_stubby/
BSD 3-Clause "New" or "Revised" License
1.2k stars 99 forks source link

segfault stubby.c:1021 #189

Open appliedprivacy opened 5 years ago

appliedprivacy commented 5 years ago

Setup

Short intro about our setup:

We run a public DoH service and use dnsdist, today we added stubby to our chain since dnsdist does not support DoT to communicate with backends yet.

nginx -> dnsdist -> stubby -> unbound

(unbound does not run on the same server and so we encrypt using stubby/DoT)

Stubby runs on 127.0.0.1:5353 and dnsdist detected frequent issues:

Jun 30 17:30:32  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:30:33  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:30:37  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:30:48  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:30:50  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:30:55  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:31:34  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:31:43  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:31:51  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:32:02  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:32:13  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:32:18  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:32:21  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:32:38  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:32:41  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:32:42  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'up'
Jun 30 17:33:32  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down' 

by default stubby apparently does not log anything so we edited the systemd service file to start stubby with -v 2 to get some logging:

Log events showing crashes

They are not necessarily about a single bug but maybe multiple distinct bugs.

Jun 30 17:45:45 dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 17:45:49 stubby[30343]: corrupted size vs. prev_size while consolidating
Jun 30 17:45:49 systemd[1]: stubby.service: Main process exited, code=killed, status=6/ABRT
Jun 30 17:45:49 systemd[1]: stubby.service: Failed with result 'signal'.
Jun 30 18:05:47  kernel: [5249485.420516] stubby[4155]: segfault at 3 ip 00007fdd70df8a4e sp 00007ffc02312b78 error 4 in libc-2.28.so[7fdd70d8a000+148000]
Jun 30 18:05:47  kernel: [5249485.430228] Code: b6 07 29 c8 c3 0f 1f 80 00 00 00 00 89 f8 31 d2 66 0f ef ff 09 f0 25 ff 0f 00 00 3d c0 0f 00 00 0f 8f 78 02 00 00 f3 0f 6f 0f <f3> 0f 6f 06 66 0f 74 c1 66 0f da c1 66 0f ef c9 66 0f 74 c1 66 0f
Jun 30 18:05:47  systemd[1]: stubby.service: Main process exited, code=killed, status=11/SEGV
Jun 30 18:05:47  systemd[1]: stubby.service: Failed with result 'signal'.
Jun 30 18:05:48  systemd[1]: stubby.service: Service RestartSec=100ms expired, scheduling restart.
Jun 30 18:06:36  stubby[5703]: double free or corruption (!prev)
Jun 30 18:06:36  systemd[1]: stubby.service: Main process exited, code=killed, status=6/ABRT
Jun 30 18:06:36  systemd[1]: stubby.service: Failed with result 'signal'.
Jun 30 18:06:36  systemd[1]: stubby.service: Service RestartSec=100ms expired, scheduling restart.
Jun 30 18:07:57  kernel: [5249614.912363] traps: stubby[5724] general protection ip:7fe7a42cb00e sp:7fff8674a5e0 error:0 in libgetdns.so.10.1.1[7fe7a42b9000+4b000]
Jun 30 18:07:57  systemd[1]: stubby.service: Main process exited, code=killed, status=11/SEGV
Jun 30 18:07:57  systemd[1]: stubby.service: Failed with result 'signal'.
Jun 30 18:07:57  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'
Jun 30 18:08:49  kernel: [5249666.813443] traps: stubby[5752] general protection ip:7fa9a23c05ad sp:7fff7b0329e0 error:0 in libgetdns.so.10.1.1[7fa9a23af000+4b000]
Jun 30 18:08:49  systemd[1]: stubby.service: Main process exited, code=killed, status=11/SEGV
Jun 30 18:08:49  systemd[1]: stubby.service: Failed with result 'signal'.
Jun 30 18:08:49  dnsdist[18150]: Marking downstream 127.0.0.1:5353 as 'down'

gdb backtrace

There are no debug symbols in the installed version of stubby but maybe it is useful nontheless:

(gdb) run
Starting program: /usr/bin/stubby -v4
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[16:23:50.393479] STUBBY: Read config from file /etc/stubby/stubby.yml
[16:23:50.393889] STUBBY: DNSSEC Validation is ON
[16:23:50.394030] STUBBY: Transport list is:
[16:23:50.394102] STUBBY:   - TLS
[16:23:50.394169] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[16:23:50.394230] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[16:23:50.394291] STUBBY: Starting DAEMON....

Program received signal SIGSEGV, Segmentation fault.
0x00005555555ae6f0 in ?? ()
(gdb) backtrace
#0  0x00005555555ae6f0 in ?? ()
#1  0x00007ffff7d7f056 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#2  0x00007ffff7d7f0ac in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#3  0x00007ffff7d3f39b in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#4  0x00007ffff7d3f512 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#5  0x00007ffff7d3f855 in getdns_dict_get_int () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#6  0x0000555555558195 in ?? ()
#7  0x00007ffff7d4a543 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#8  0x00007ffff7d496bd in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#9  0x00007ffff7d499c5 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#10 0x00007ffff7d4aa76 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#11 0x00007ffff7d598f5 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#12 0x00007ffff7d70c83 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#13 0x00007ffff7d70e1d in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#14 0x0000555555556ab7 in ?? ()
#15 0x00007ffff7b8a09b in __libc_start_main (main=0x555555556520, argc=2, argv=0x7fffffffe5f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe5e8) at ../csu/libc-start.c:308
#16 0x0000555555556fca in ?? ()

OS: Debian Buster

dpkg -l stubby
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-===========================================
ii  stubby         1.5.1-1      amd64        modern asynchronous DNS API (stub resolver)

Configuration

stubby -i
[16:28:46.696821] STUBBY: Read config from file /etc/stubby/stubby.yml
{
  "all_context":
  {
    "add_warning_for_bad_dns": GETDNS_EXTENSION_FALSE,
    "appdata_dir": <bindata of "/home/i/.getdns/">,
    "append_name": GETDNS_APPEND_NAME_TO_SINGLE_LABEL_FIRST,
    "dns_transport_list":
    [
      GETDNS_TRANSPORT_TLS
    ],
    "dnssec": GETDNS_EXTENSION_TRUE,
    "dnssec_allowed_skew": 0,
    "dnssec_return_all_statuses": GETDNS_EXTENSION_FALSE,
    "dnssec_return_full_validation_chain": GETDNS_EXTENSION_FALSE,
    "dnssec_return_only_secure": GETDNS_EXTENSION_FALSE,
    "dnssec_return_status": GETDNS_EXTENSION_FALSE,
    "dnssec_return_validation_chain": GETDNS_EXTENSION_FALSE,
    "dnssec_roadblock_avoidance": GETDNS_EXTENSION_FALSE,
    "dnssec_trust_anchors":
    [
      {
        "class": GETDNS_RRCLASS_IN,
        "name": <bindata for .>,
        "rdata":
        {
          "algorithm": 8,
          "flags": 257,
          "protocol": 3,
          "public_key": <bindata of 0x03010001acffb409bcc939f831f7a1e5...>,
          "rdata_raw": <bindata of 0x0101030803010001acffb409bcc939f8...>
        },
        "ttl": 86400,
        "type": GETDNS_RRTYPE_DNSKEY
      }
    ],
    "edns_client_subnet_private": 1,
    "edns_cookies": GETDNS_EXTENSION_FALSE,
    "edns_do_bit": 0,
    "edns_extended_rcode": 0,
    "edns_version": 0,
    "follow_redirects": GETDNS_REDIRECTS_FOLLOW,
    "hosts": <bindata of "/etc/hosts">,
    "idle_timeout": 10000,
    "limit_outstanding_queries": 1000,
    "max_backoff_value": 1000,
    "namespaces":
    [
      GETDNS_NAMESPACE_LOCALNAMES,
      GETDNS_NAMESPACE_DNS
    ],
    "resolution_type": GETDNS_RESOLUTION_STUB,
    "resolvconf": <bindata of "/etc/resolv.conf">,
    "return_both_v4_and_v6": GETDNS_EXTENSION_FALSE,
    "return_call_reporting": GETDNS_EXTENSION_FALSE,
    "round_robin_upstreams": 1,
    "specify_class": 1,
    "suffix": [],
    "timeout": 5000,
    "tls_authentication": GETDNS_AUTHENTICATION_REQUIRED,
    "tls_backoff_time": 2,
    "tls_ca_path": <bindata of "/etc/ssl/certs/">,
    "tls_cipher_list": <bindata of "EECDH+AESGCM:EECDH+CHACHA20">,
    "tls_ciphersuites": <bindata of "TLS_AES_256_GCM_SHA384:TLS_CHACH"...>,
    "tls_connection_retries": 3,
    "tls_min_version": GETDNS_TLS1_2,
    "tls_query_padding_blocksize": 128,
    "trust_anchors_backoff_time": 2500,
    "trust_anchors_url": <bindata of "http://data.iana.org/root-anchor"...>,
    "trust_anchors_verify_CA": <bindata of 0x2d2d2d2d2d424547494e204345525449...>,
    "trust_anchors_verify_email": <bindata of "dnssec@iana.org">,
    "upstream_recursive_servers":
    [
      {
        "address_data": <bindata for 109.70.100.100>,
        "address_type": <bindata of "IPv4">,
        "tls_auth_name": <bindata of "dot2.appliedprivacy.net">
      }
    ]
  },
  "api_version_number": 132058112,
  "api_version_string": <bindata of "December 2015">,
  "compilation_comment": <bindata of "getdns 1.5.1 configured on  for "...>,
  "default_hosts_location": <bindata of "/etc/hosts">,
  "default_resolvconf_location": <bindata of "/etc/resolv.conf">,
  "default_trust_anchor_location": <bindata of "/usr/share/dns/root.key">,
  "implementation_string": <bindata of "https://getdnsapi.net">,
  "listen_addresses":
  [
    {
      "address_data": <bindata for 127.0.0.1>,
      "address_type": <bindata of "IPv4">,
      "port": 5353
    }
  ],
  "openssl_build_version_number": 269488159,
  "openssl_built_on": <bindata of "built on: Thu May 30 15:27:48 20"...>,
  "openssl_cflags": <bindata of "compiler: gcc -fPIC -pthread -m6"...>,
  "openssl_dir": <bindata of "OPENSSLDIR: "/usr/lib/ssl"">,
  "openssl_engines_dir": <bindata of "ENGINESDIR: "/usr/lib/x86_64-lin"...>,
  "openssl_platform": <bindata of "platform: debian-amd64">,
  "openssl_version_number": 269488191,
  "openssl_version_string": <bindata of "OpenSSL 1.1.1c  28 May 2019">,
  "resolution_type": GETDNS_RESOLUTION_STUB,
  "version_number": 17105152,
  "version_string": <bindata of "1.5.1">
}
Result: Config file syntax is valid.
wtoorop commented 5 years ago

Thanks for reporting. Did you compile getdns/stubby yourself? Would it be possible for you to compile latest (so version 1.5.2) of getdns/stubby with debugging symbols?

appliedprivacy commented 5 years ago

Did you compile getdns/stubby yourself?

stubby has been installed using apt https://packages.debian.org/buster/stubby

Would it be possible for you to compile latest (so version 1.5.2) of getdns/stubby with debugging symbols?

Do you sign releases? I didn't find *.asc files. https://github.com/getdnsapi/stubby/releases

wtoorop commented 5 years ago

Releases are announced and published here: https://getdnsapi.net/releases/getdns-1-5-2/ I also synchronize this with github releases here: https://github.com/getdnsapi/getdns/releases asc file is below the star button ;)

appliedprivacy commented 5 years ago
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7d5f980 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
(gdb) backtrace
#0  0x00007ffff7d5f980 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#1  0x00007ffff7d7ed56 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#2  0x00007ffff7d4a6cc in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#3  0x00007ffff7d4b66b in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#4  0x00007ffff7d4b89a in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#5  0x00007ffff7d4beb2 in getdns_general () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#6  0x0000555555557fce in incoming_request_handler (context=0x555555560260, callback_type=GETDNS_CALLBACK_COMPLETE, request=0x5555555c94d0, userarg=0x0, request_id=93824992630912)
    at ./../stubby/src/stubby.c:684
#7  0x00007ffff7d54c7c in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#8  0x00007ffff7d70c83 in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#9  0x00007ffff7d70e1d in ?? () from /usr/lib/x86_64-linux-gnu/libgetdns.so.10
#10 0x0000555555558e6b in main (argc=4, argv=0x7fffffffe568) at ./../stubby/src/stubby.c:1021
wtoorop commented 5 years ago

Hmmm... it doesn't look like the getdns was compiled with debugging symbols... Did you configure like this?:

$ ./configure CFLAGS=-g --with-stubby

Also, could you do a

$ ls -l /usr/lib/x86_64-linux-gnu/libgetdns.so.10

? For getdns-1.5.2, it should point to libgetdns.so.10.1.2

appliedprivacy commented 5 years ago

sorry that was my fault (LD_LIBRARY_PATH was not set to include the correct folder)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7d7a393 in _getdns_rbtree_first (rbtree=0x555555560bd8) at ./util/rbtree.c:553
553     for (node = rbtree->root; node->left != RBTREE_NULL; node = node->left);
(gdb) backtrace
#0  0x00007ffff7d7a393 in _getdns_rbtree_first (rbtree=0x555555560bd8) at ./util/rbtree.c:553
#1  0x00007ffff7d3e4ad in _getdns_netreq_change_state (netreq=0x5555555d8e40, new_state=NET_REQ_FINISHED) at ./general.c:380
#2  0x00007ffff7d4efa7 in upstream_read_cb (userarg=0x55555558d358) at ./stub.c:1561
#3  0x00007ffff7d6c3d3 in poll_read_cb (fd=5, event=0x55555558d438) at ./extension/poll_eventloop.c:295
#4  0x00007ffff7d6cb66 in poll_eventloop_run_once (loop=0x555555560c60, blocking=1) at ./extension/poll_eventloop.c:445
#5  0x00007ffff7d6ce80 in poll_eventloop_run (loop=0x555555560c60) at ./extension/poll_eventloop.c:499
#6  0x00007ffff7d5f758 in getdns_context_run (context=0x555555560260) at ./context.c:3742
#7  0x0000555555558e6b in main (argc=4, argv=0x7fffffffe528) at ./../stubby/src/stubby.c:1021
wtoorop commented 5 years ago

Thanks. Does it crash at the same point every time?

appliedprivacy commented 5 years ago

I didn't collect many backtraces, but here is another:

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7b79535 in __GI_abort () at abort.c:79
#2  0x00007ffff7bd0508 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7cdb28d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff7bd6c1a in malloc_printerr (str=str@entry=0x7ffff7cdd018 "double free or corruption (!prev)") at malloc.c:5341
#4  0x00007ffff7bd873c in _int_free (av=0x7ffff7d12c40 <main_arena>, p=0x5555555c02c0, have_lock=<optimized out>) at malloc.c:4309
#5  0x00007ffff7d4370b in _getdns_dns_req_free (req=0x5555555c02d0) at ./request-internal.c:648
#6  0x00007ffff7d5e9d8 in _getdns_context_cancel_request (dnsreq=0x5555555c02d0) at ./context.c:3255
#7  0x00007ffff7d5eb76 in _getdns_context_request_timed_out (dnsreq=0x5555555c02d0) at ./context.c:3305
#8  0x00007ffff7d3d8fc in _getdns_check_dns_req_complete (dns_req=0x5555555c02d0) at ./general.c:113
#9  0x00007ffff7d3e25f in _getdns_check_expired_pending_netreqs (context=0x555555560260, now_ms=0x7fffffffdea8) at ./general.c:324
#10 0x00007ffff7d5a920 in _getdns_check_expired_pending_netreqs_cb (arg=0x555555560260) at ./context.c:1305
#11 0x00007ffff7d6c444 in poll_timeout_cb (event=0x555555560bf8) at ./extension/poll_eventloop.c:314
#12 0x00007ffff7d6ce28 in poll_eventloop_run_once (loop=0x555555560c60, blocking=1) at ./extension/poll_eventloop.c:485
#13 0x00007ffff7d6ce80 in poll_eventloop_run (loop=0x555555560c60) at ./extension/poll_eventloop.c:499
#14 0x00007ffff7d5f758 in getdns_context_run (context=0x555555560260) at ./context.c:3742
#15 0x0000555555558e6b in main (argc=4, argv=0x7fffffffe528) at ./../stubby/src/stubby.c:1021
wtoorop commented 5 years ago

Ok thanks... stubby is talking to unbound-1.9.2?

appliedprivacy commented 5 years ago

yes, stubby is talking to unbound 1.9.2 (on FreeBSD)

wtoorop commented 5 years ago

Could you also try disabling DNSSEC validation within stubby?

appliedprivacy commented 5 years ago

It did run for 40 minutes without a crash with DNSSEC validation disabled. It crashed within a minute after enabling DNSSEC validation again.

wtoorop commented 5 years ago

Ok, so... since you control and also authenticate the upstream DoT speaking unbound. Perhaps, for the moment it is okay to let that unbound do the dnssec validation and let stubby just trust the answers... while I try to reproduce and solve this issue...

appliedprivacy commented 5 years ago

Is there any update on this issue?

appliedprivacy commented 5 years ago

What approx. amount of funding would you require to provide a fix for this issue and release a new version including the fix?

wtoorop commented 5 years ago

Sorry for not responding earlier. This codepath is also subject to change with a planned refactor of upstream scheduling (add DoH upstreams in the mix). Also, I'm willing to have another look to see if I can get a quick workaround of some sort, but this will be possible only after 19th of august.

appliedprivacy commented 5 years ago

Thanks for the heads up, we would still be interested in the required funding since we are applying for funding ourselves and it would help us to have more accurate estimations for future fixes like this.

wtoorop commented 4 years ago

Maybe we can discuss coordination of fixes (and possible funding) in person? Are you going to ietf107?