NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

Segmentation fault happening nondeterministically under high load #582

Open martin-beran opened 2 years ago

martin-beran commented 2 years ago

**Describe the bug**
After some time, usually under a minute, Unbound crashes with SIGSEGV. It occurs if both msg-cache-size: 0m and rrset-cache-size: 0m are set in the configuration.

**To reproduce**
Steps to reproduce the behavior:

  1. Run unbound.
  2. Run a test that generates high load. I use resperf from dnsperf-2.8.0, run in a loop with these parameters:
     while true; do resperf -s 127.0.0.1 -p 5354 -d /home/beran/tmp/dnsperf.domains -t 10 -m 400000 -R; done
  3. After some time, Unbound crashes.

**Expected behavior**
Unbound should continue running and servicing requests.

**System:**

Configure line: --prefix=/home/beran/avast/local --with-dynlibmodule --with-ssl --with-libnghttp2 --with-libevent
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 1.1.1j 16 Feb 2021
Linked modules: dns64 dynlib respip validator iterator

BSD licensed, see LICENSE in source package for details. Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues


**Additional information**
Unbound configuration:

server:
  verbosity: 1
  extended-statistics: yes
  num-threads: 3
  interface: 0.0.0.0
  interface: 0.0.0.0@5853
  interface: 0.0.0.0@5443
  port: 5354
  so-reuseport: no
  msg-cache-size: 0m
  rrset-cache-size: 0m
  cache-max-ttl: 0
  cache-max-negative-ttl: 0
  do-daemonize: no
  access-control: 0.0.0.0/0 allow
  chroot: ""
  username: ""
  use-syslog: no
  minimal-responses: no
  module-config: "validator dynlib iterator"
  tls-service-key: "cert.key"
  tls-service-pem: "cert.pem"
  tls-port: 5853
  https-port: 5443
  tls-session-ticket-keys: "ticket.dat"
remote-control:
  control-enable: yes
  control-use-cert: "no"
forward-zone:
  name: "."
  forward-addr: 192.168.2.1
  forward-first: yes


Stack of the crashed thread:

(gdb) bt
#0  packed_rrset_sizeof (d=d@entry=0x0) at util/data/packed_rrset.c:83
#1  0x000055937f661b3a in packed_rrset_copy_region (key=, region=0x7efca4ecb090,
    now=1638785514) at util/data/packed_rrset.c:351
#2  0x000055937f65cd61 in store_rrsets (region=0x7efca4ecb090, qrep=0x7efca4ecc788,
    pside=<optimized out>, leeway=<optimized out>, now=1638785514, rep=0x7efca4df1f90,
    env=0x5593813edc80) at services/cache/dns.c:95
#3  dns_cache_store_msg (env=0x5593813edc80, qinfo=0x7efcb3ffe690, hash=701959496,
    rep=0x7efca4df1f90, leeway=<optimized out>, pside=<optimized out>, qrep=0x7efca4ecc788,
    flags=256, region=0x7efca4ecb090) at services/cache/dns.c:173
#4  0x000055937f65d108 in dns_cache_store (env=0x5593813edc80, msgqinf=0x7efca4ecb120,
    msgrep=0x7efca4ecc788, is_referral=<optimized out>, leeway=0, pside=0, region=0x7efca4ecb090,
    flags=256) at services/cache/dns.c:1017
#5  0x000055937f66aa2a in iter_dns_store (flags=, region=,
    pside=<optimized out>, leeway=<optimized out>, is_referral=0, msgrep=<optimized out>,
    msgqinf=0x7efca4ecb120, env=<optimized out>) at iterator/iter_utils.c:661
#6  processFinished (qstate=, iq=, id=)
    at iterator/iterator.c:3626
#7  0x000055937f66e6c7 in iter_handle (qstate=0x7efca4ecb120, iq=, ie=0x5593813bae70,
    id=2) at iterator/iterator.c:3709
#8  0x000055937f6713b0 in process_response (event=module_event_reply, outbound=0x7efca4ecb9a0, id=2,
    ie=<optimized out>, iq=0x7efca4ecb460, qstate=0x7efca4ecb120) at iterator/iterator.c:3928
#9  iter_operate (qstate=0x7efca4ecb120, event=module_event_reply, id=2, outbound=0x7efca4ecb9a0)
    at iterator/iterator.c:3962
#10 0x000055937f685507 in mesh_run (mesh=0x7efca47736b0, mstate=0x7efca4ecb0d0, ev=,
    e=0x7efca4ecb9a0) at services/mesh.c:1710
#11 0x000055937f6545c4 in mesh_report_reply (what=0, reply=0x7efcb3ffeba0, e=0x7efca4ecb9a0,
    mesh=<optimized out>) at services/mesh.c:775
#12 worker_handle_service_reply (c=0x7efca44f6640, arg=0x7efca4ecb9a0, error=0,
    reply_info=0x7efcb3ffeba0) at daemon/worker.c:266
#13 0x000055937f6f922f in serviced_callbacks (sq=sq@entry=0x7efca4aa28a0, error=error@entry=0,
    c=c@entry=0x7efca44f6640, rep=rep@entry=0x7efcb3ffeba0) at services/outside_network.c:2909
#14 0x000055937f6fa290 in serviced_udp_callback (c=0x7efca44f6640, arg=0x7efca4aa28a0,
    error=error@entry=0, rep=rep@entry=0x7efcb3ffeba0) at services/outside_network.c:3244
#15 0x000055937f6f8b2c in outnet_udp_cb (c=, arg=0x7efca41f9360,
    error=<optimized out>, reply_info=0x7efcb3ffeba0) at services/outside_network.c:1424
#16 0x000055937f6e2f06 in comm_point_udp_callback (fd=1318, event=,
    arg=<optimized out>) at util/netevent.c:783
#17 0x00007efcba2ae19f in ?? () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#18 0x00007efcba2ae8df in event_base_loop () from /lib/x86_64-linux-gnu/libevent-2.1.so.7
#19 0x000055937f6e2050 in ub_event_base_dispatch (base=) at util/ub_event.c:280
#20 comm_base_dispatch (b=) at util/netevent.c:256
#21 0x000055937f646831 in worker_work (worker=0x5593813ec750) at daemon/worker.c:1940
#22 thread_start (arg=0x5593813ec750) at daemon/daemon.c:541
#23 0x00007efcb9f66450 in start_thread (arg=0x7efcb3fff640) at pthread_create.c:473
#24 0x00007efcb9e86d53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
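
Frame 0 of the trace shows packed_rrset_sizeof being entered with d=0x0, that is, a NULL data pointer handed down from the copy routine in frame 1. Why the pointer is NULL is not established here. As a rough illustration of that failure mode only, the sketch below uses hypothetical stand-in types and functions (every `demo_` name is invented and does not reflect Unbound's real structures or its eventual fix): a size helper that assumes a non-NULL argument, and a copy routine that checks the pointer before calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packed rrset payload; NOT Unbound's layout. */
struct demo_rrset_data {
    size_t count;        /* number of records */
    size_t payload_len;  /* bytes of record data after the header */
    uint8_t payload[];   /* flexible array member (C99) */
};

/* Hypothetical cache key holding a pointer to the payload. */
struct demo_rrset_key {
    struct demo_rrset_data *data; /* assumed to be NULL when no data is attached */
};

/* Total allocation size. Precondition: d != NULL.
 * Reading d->payload_len with d == NULL is the kind of fault frame 0 shows. */
size_t demo_rrset_sizeof(const struct demo_rrset_data *d)
{
    assert(d != NULL);
    return sizeof(*d) + d->payload_len;
}

/* Copy the payload, refusing to proceed when the key carries no data.
 * Returns a malloc'ed copy, or NULL on missing data or allocation failure. */
struct demo_rrset_data *demo_rrset_copy(const struct demo_rrset_key *key)
{
    if (key == NULL || key->data == NULL)
        return NULL; /* guard: never pass NULL to demo_rrset_sizeof() */

    size_t len = demo_rrset_sizeof(key->data);
    struct demo_rrset_data *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, key->data, len);
    return copy;
}
```

With a key whose data pointer is NULL, demo_rrset_copy returns NULL instead of dereferencing it. In a multi-threaded server a check like this only narrows the window unless the pointer is also protected by whatever lock guards the entry, so it should be read as a picture of the symptom, not a proposed patch.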

gthess commented 2 years ago

Hi, is this (packed_rrset_sizeof()) always the place where you see the segfaults? Do you have other configuration options with non-default values?

martin-beran commented 2 years ago

In the crashes I have seen, it is always at this place, with the same call stack. The only non-default configuration values are those shown in the configuration file in my original post.