getdnsapi / stubby

Stubby is the name given to a mode of using getdns which enables it to act as a local DNS Privacy stub resolver (using DNS-over-TLS).
https://dnsprivacy.org/dns_privacy_daemon_-_stubby/
BSD 3-Clause "New" or "Revised" License

stubby started crashing randomly #295

Closed froschmett closed 2 years ago

froschmett commented 3 years ago

Hi, I am running stubby on Ubuntu.

dpkg -l | grep stubby
stubby 1.4.0-1 amd64 modern asynchronous DNS API (stub resolver)

lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

uname -a
4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Since today stubby has started to crash randomly and I can't imagine why. It ran without any trouble for the last 8 months or so. Because it crashes quite a lot, I added some lines to the systemd config so that it restarts on failure.

The failure it throws when it crashes is:

stubby[2557]: stubby: ./gldns/gbuffer.h:461: gldns_buffer_write_at: Assertion `gldns_buffer_available_at(buffer, at, count)' failed.
Aug 5 01:55:46 u41 systemd[1]: stubby.service: Main process exited, code=dumped, status=6/ABRT
Aug 5 01:55:46 u41 systemd[1]: stubby.service: Failed with result 'core-dump'.

Please let me know if you need further information. I would be really happy if someone can shed some light on this, help or fix this :)

Many thanks and Best Regards, froschmett

saradickinson commented 3 years ago

@froschmett Thanks for the report - does seem strange! Are you able to provide a core-dump file and your stubby.yml config to help with debugging? (Can be done offline)

@wtoorop does this look familiar at all to you?

froschmett commented 3 years ago

@saradickinson Hi,

I had some trouble getting the core dump, but I hope it worked. Here are the requested files. I had to rename them to upload:

stubby.txt -> stubby.yml
core....txt -> core....lz4

Thanks and Best Regards, froschmett

stubby.txt
core.stubby.64707.8906aa4b60b3496382f789b7408c834a.938.1628171207000000.txt

froschmett commented 3 years ago

@saradickinson Hi,

I installed my setup from scratch on a completely fresh install of Ubuntu Server 20.04.2 LTS.

To my surprise the same error still persists. I really have no clue what happened or what is going on. Just wanted to let you know.

Best, froschmett

pitpompej commented 3 years ago

Hi, I have the exact same error on my stubby instance running on a Raspberry Pi, also for about 4-5 days now.

stubby 1.5.1-1 armhf

Best regards, pit

froschmett commented 3 years ago

Hi,

it seems that this issue is Debian/Ubuntu related.

I installed my setup from scratch on the latest Fedora Server and it seems to run stable. Fedora ships 0.3.0 in its repository, which has run stable so far. I also compiled 0.4.0 and it seems to run stable too. I will keep monitoring this and report back.

Best, froschmett

sn1987 commented 3 years ago

Hi,

I have had the same issue for 3-4 days. I'm running stubby 0.4.0 on CentOS 8.4.

Version: CentOS Linux release 8.4.2105

Kernel: 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Tue Jun 29 21:55:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Stack trace:

Process 139661 (stubby) of user 64707 dumped core.

Stack trace of thread 139661:
#0  0x00007fdffffd337f raise (libc.so.6)
#1  0x00007fdffffbddb5 abort (libc.so.6)
#2  0x00007fdffffbdc89 __assert_fail_base.cold.0 (libc.so.6)
#3  0x00007fdffffcba76 __assert_fail (libc.so.6)
#4  0x0000000000421c05 gldns_buffer_write_at (stubby)
#5  0x0000000000421cc7 gldns_buffer_write (stubby)
#6  0x000000000042569f _getdns_verify_rrsig (stubby)
#7  0x0000000000426229 dnskey_signed_rrset (stubby)
#8  0x0000000000426435 a_key_signed_rrset (stubby)
#9  0x00000000004287a0 chain_head_validate_with_ta (stubby)
#10 0x0000000000428a94 chain_head_validate (stubby)
#11 0x0000000000428cb7 chain_set_netreq_dnssec_status (stubby)
#12 0x0000000000429a9f check_chain_complete (stubby)
#13 0x000000000042465a val_chain_node_cb (stubby)
#14 0x000000000042ba05 _getdns_check_dns_req_complete (stubby)
#15 0x000000000043dcfe upstream_read_cb (stubby)
#16 0x000000000044573c poll_read_cb (stubby)
#17 0x0000000000445ecf poll_eventloop_run_once (stubby)
#18 0x00000000004461e9 poll_eventloop_run (stubby)
#19 0x00000000004172fa getdns_context_run (stubby)
#20 0x0000000000405bc6 main (stubby)
#21 0x00007fdffffbf493 __libc_start_main (libc.so.6)
#22 0x000000000040519e _start (stubby)

Best regards,

sn1987

saradickinson commented 3 years ago

Thanks everyone for gathering all the info - this looks to me like a problem with DNSSEC validation in getdns. Since it started happening out of the blue, I'm wondering if a lookup suddenly started returning an RRSIG that getdns chokes on.

One option is to try disabling local DNSSEC validation (which is OK if you are using a validating resolver) and see if the crashes stop.

If any of you are able to figure out what lookup triggers this, that would be very helpful. There is no logging in stubby that can help with that, but if you happen to be able to grab a tcpdump/wireshark capture on your local interface it will show the names being looked up so we can try to reproduce this....

@wtoorop Any thoughts?

pitpompej commented 2 years ago

Hi, it has been a while, but I might have found the lookup that triggers the crash:

dig -t NAPTR @127.0.0.1 -p 10053 rbm.mavenir.com

This leads to a reproducible crash on my stubby, with the following log:

Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[10:28:44.510246] STUBBY: Read config from file /etc/stubby/stubby.yml
[10:28:44.690510] STUBBY: DNSSEC Validation is ON
[10:28:44.694196] STUBBY: Transport list is:
[10:28:44.697517] STUBBY:   - TLS
[10:28:44.698833] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[10:28:44.699143] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[10:28:44.699427] STUBBY: Starting DAEMON....
stubby: ./gldns/gbuffer.h:461: gldns_buffer_write_at: Assertion `gldns_buffer_available_at(buffer, at, count)' failed.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0xb6de7230 in __GI_abort () at abort.c:79
#2  0xb6df4ba8 in __assert_fail_base (fmt=0xb6efb6a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0xb6f74870 "gldns_buffer_available_at(buffer, at, count)",
    assertion@entry=0xb6fee040 "", file=0xb6f7458c "./gldns/gbuffer.h", file@entry=0xb6f75034 "gldns_buffer_write_at", line=461, line@entry=3069163176,
    function=function@entry=0xb6f75034 "gldns_buffer_write_at") at assert.c:92
#3  0xb6df4c5c in __GI___assert_fail (assertion=0xb6fee040 "", file=0xb6f75034 "gldns_buffer_write_at", line=3069163176, function=0xb6f75034 "gldns_buffer_write_at")
    at assert.c:101
Backtrace stopped: Cannot access memory at address 0x7370696a


Calling dig -t A @127.0.0.1 -p 10053 rbm.mavenir.com returns a valid answer, but the NAPTR request crashes stubby. Maybe that helps. Regards

wtoorop commented 2 years ago

Thank you @pitpompej !! I will take a look shortly. Can you ping me if I haven't replied in 7 days? Thanks!

pitpompej commented 2 years ago

Hi, as requested by you @wtoorop, a ping on this topic. A little late, I know, shame on me ;-) Season's greetings

wtoorop commented 2 years ago

FYI I can reproduce!! Thank you @pitpompej for providing a reliable way to invoke this bug. The calculation of the space needed for the validation buffer (lines 1492 to 1522 of dnssec.c) doesn't match the space actually needed for RRs that may have compressed dnames (i.e. lines 1545 to 1572 of dnssec.c).
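
To make the size mismatch concrete, here is a minimal sketch of why a compressed dname is a trap when sizing a buffer. It is not the getdns code: wire_name_len, expanded_name_len and demo_msg are hypothetical names made up for illustration, and only the RFC 1035 compression-pointer format is assumed. A name stored as a few labels plus a 2-byte pointer occupies less space on the wire than the same name once the pointer has been followed, so a buffer sized from the wire length can be too small for the expanded form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical demo message: "example.com." at offset 0, then at offset 13
 * the name "www" followed by a compression pointer back to offset 0. */
const uint8_t demo_msg[] = {
    7, 'e', 'x', 'a', 'm', 'p', 'l', 'e', 3, 'c', 'o', 'm', 0,
    3, 'w', 'w', 'w', 0xC0, 0x00
};

/* Bytes the name occupies in the message, WITHOUT following pointers.
 * Returns 0 on malformed input. */
size_t wire_name_len(const uint8_t *msg, size_t msg_len, size_t off)
{
    size_t start = off;

    assert(msg != NULL);
    while (off < msg_len) {
        uint8_t c = msg[off];
        if ((c & 0xC0) == 0xC0)        /* pointer: 2 bytes, ends the name */
            return off + 2 <= msg_len ? off + 2 - start : 0;
        if (c & 0xC0)                  /* reserved label types */
            return 0;
        off += 1 + (size_t)c;          /* length byte plus label */
        if (c == 0)                    /* root label ends the name */
            return off - start;
    }
    return 0;
}

/* Bytes the name needs once every pointer has been followed (uncompressed).
 * Returns 0 on malformed input or suspiciously many jumps. */
size_t expanded_name_len(const uint8_t *msg, size_t msg_len, size_t off)
{
    size_t len = 0;
    int jumps = 0;

    assert(msg != NULL);
    while (off < msg_len) {
        uint8_t c = msg[off];
        if ((c & 0xC0) == 0xC0) {
            if (off + 1 >= msg_len || ++jumps > 16)
                return 0;
            off = ((size_t)(c & 0x3F) << 8) | msg[off + 1];
            continue;
        }
        if (c & 0xC0)
            return 0;
        len += 1 + (size_t)c;
        if (len > 255)                 /* names are at most 255 octets */
            return 0;
        if (c == 0)
            return len;
        off += 1 + (size_t)c;
    }
    return 0;
}
```

For the name at offset 13 of demo_msg, wire_name_len reports 6 bytes while expanded_name_len reports 17. A buffer sized with the first number and filled with the expanded name would overflow, which is the kind of gap the assertion in gldns_buffer_write_at catches.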

wtoorop commented 2 years ago

Let me quickly drop some notes here for the release which will follow shortly (in 2 or 3 weeks). Commit getdnsapi/getdns@45683d3 fixes the issue, but asserts should not have exited Stubby, and certainly not the getdns library, in the first place. They need to be compiled with NDEBUG defined when compiling for production (i.e. not debugging). With that, the issues above would have resulted in a failure to DNSSEC validate certain RR types (like NAPTR) instead of exiting the program. Still TODO:
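
On the NDEBUG remark above, a minimal sketch of the behaviour it describes: a bounded write that reports failure to its caller, with the assert kept only as a debug-time invariant. The names sketch_buf and sketch_buf_write are hypothetical and this is not how gldns implements its buffer. The only thing assumed is standard C, where defining NDEBUG before including assert.h (for example by compiling with -DNDEBUG) turns assert into a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer; invariant: position <= capacity. */
struct sketch_buf {
    uint8_t *data;
    size_t   capacity;
    size_t   position;
};

/* Append count bytes. Returns 0 on success, -1 if they do not fit.
 * The explicit check is what protects a production build; the assert only
 * documents the invariant and disappears when NDEBUG is defined. */
int sketch_buf_write(struct sketch_buf *b, const void *src, size_t count)
{
    if (b == NULL || src == NULL || count > b->capacity - b->position)
        return -1;                     /* caller can fail validation instead */

    assert(b->position + count <= b->capacity);
    memcpy(b->data + b->position, src, count);
    b->position += count;
    return 0;
}
```

With a check like this, an undersized buffer turns into an error the caller can map to "validation failed" for that RRset, rather than an abort of the whole daemon.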

Foritus commented 2 years ago

Hello :) Has this made it into a release yet? I'm using the version that ships with Raspberry Pi OS (0.2.5, which is admittedly already pretty ancient) and have this crash once or twice a day. At the moment I've mitigated it with a high quality systemd unit file hack:

In /etc/systemd/system/multi-user.target.wants/stubby.service under [Service] add these lines:

Restart=on-failure
RestartSec=5s