NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

unbound 1.19 causes FreeBSD 14.0-RELEASE panic #977

Closed hshh closed 10 months ago

hshh commented 11 months ago

unbound 1.19, FreeBSD 14.0-RELEASE

Because I am running multiple unbound instances on the same host with different config files, I am not sure which one causes the panic. I will keep updating this thread.

Here is the panic info:

Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 00
fault virtual address   = 0xb8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d4f743
stack pointer           = 0x28:0xfffffe01add20c50
frame pointer           = 0x28:0xfffffe01add20ce0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2863 (unbound)
rdi: ffffffff818ca280 rsi: ffffffff818ca280 rdx: 0000000000010200
rcx: 0000000000000000  r8: fffff8063843b900  r9: 00000000c1df0200
rax: fffff80b9a4ab000 rbx: 000000000000c1df rbp: fffffe01add20ce0
r10: 0000000000000000 r11: fffffe006c36b700 r12: fffff8033d9d68c0
r13: fffff802c2961000 r14: fffff8033d9d6970 r15: 0000000000000000
trap number             = 12
panic: page fault
cpuid = 8
time = 1702272950
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01add20930
vpanic() at vpanic+0x132/frame 0xfffffe01add20a60
panic() at panic+0x43/frame 0xfffffe01add20ac0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe01add20b20
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe01add20b80
calltrap() at calltrap+0x8/frame 0xfffffe01add20b80
--- trap 0xc, rip = 0xffffffff80d4f743, rsp = 0xfffffe01add20c50, rbp = 0xfffffe01add20ce0 ---
in_pcbbind_setup() at in_pcbbind_setup+0x233/frame 0xfffffe01add20ce0
in_pcbbind() at in_pcbbind+0x58/frame 0xfffffe01add20d10
udp_bind() at udp_bind+0xc8/frame 0xfffffe01add20d50
sobind() at sobind+0x32/frame 0xfffffe01add20d70
kern_bindat() at kern_bindat+0xc5/frame 0xfffffe01add20dc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe01add20e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe01add20f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01add20f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x82632312a, rsp = 0x829110bb8, rbp = 0x829110c10 ---

hshh commented 11 months ago

Two unbound.conf files are attached. Multiple unbound instances using unbound.conf.1 are running under setfib, as I mentioned in issue #957.

unbound.conf.1

server:
        auto-trust-anchor-file: "root.key"
        root-hints: "root.hints"
        num-threads: 16
        interface: 127.0.0.1
        interface: ::1
        outgoing-range: 32768
        num-queries-per-thread: 4096
        so-reuseport: yes
        msg-cache-slabs: 16
        rrset-cache-slabs: 16
        infra-cache-slabs: 16
        key-cache-slabs: 16
        ratelimit-slabs: 16
        cache-max-negative-ttl: 0
        infra-host-ttl: 300
        module-config: "iterator"
        hide-identity: yes
        hide-version: yes
        rrset-roundrobin: yes
        do-not-query-localhost: no
        log-servfail: yes
        port: 53000
        pidfile: "unbound.pid"

remote-control:
        control-port: 53100
        control-enable: yes
        control-use-cert: no
        control-interface: 127.0.0.1
        control-interface: ::1

# root
auth-zone:
        name: "."
        primary: 199.9.14.201           # b.root-servers.net
        primary: 192.33.4.12            # c.root-servers.net
        primary: 199.7.91.13            # d.root-servers.net
        primary: 192.5.5.241            # f.root-servers.net
        primary: 192.112.36.4           # g.root-servers.net
        primary: 193.0.14.129           # k.root-servers.net
        primary: 192.0.47.132           # xfr.cjr.dns.icann.org
        primary: 192.0.32.132           # xfr.lax.dns.icann.org
        primary: 2001:500:200::b        # b.root-servers.net
        primary: 2001:500:2::c          # c.root-servers.net
        primary: 2001:500:2d::d         # d.root-servers.net
        primary: 2001:500:2f::f         # f.root-servers.net
        primary: 2001:500:12::d0d       # g.root-servers.net
        primary: 2001:7fd::1            # k.root-servers.net
        primary: 2620:0:2830:202::132   # xfr.cjr.dns.icann.org
        primary: 2620:0:2d0:202::132    # xfr.lax.dns.icann.org
        fallback-enabled: yes
        for-downstream: no
        for-upstream: yes

unbound.conf.2

server:
        port: 5300
        chroot: "/usr/local/etc/unbound"
        directory: "/usr/local/etc/unbound"
        #verbosity: 1
        num-threads: 16
        interface: 0.0.0.0
        interface: ::0
        outgoing-range: 32768
        num-queries-per-thread: 4096
        so-reuseport: yes
        msg-cache-slabs: 16
        rrset-cache-slabs: 16
        infra-cache-slabs: 16
        key-cache-slabs: 16
        ratelimit-slabs: 16
        rrset-cache-size: 100k
        msg-cache-size: 100k
        key-cache-size: 100k
        cache-max-negative-ttl: 0
        infra-keep-probing: yes
        access-control: 0.0.0.0/0 allow
        auto-trust-anchor-file: "root.key"
        pidfile: "unbound.pid"
        module-config: "iterator"
        hide-identity: yes
        hide-version: yes
        rrset-roundrobin: yes
        do-not-query-localhost: no
        log-servfail: yes
        private-address: ::/0

python:
remote-control:
        control-enable: yes
        control-use-cert: no
        control-interface: 127.0.0.1
        control-interface: ::1
        control-port: 5303

forward-zone:
        name: "."
        forward-addr: 127.0.0.1
        forward-no-cache: yes

gthess commented 11 months ago

This seems like a FreeBSD kernel error. From Unbound's side I see that this may happen during a UDP bind() call. Do I understand correctly that unbound.conf.1 has multiple instances with setfib, while unbound.conf.2 has only one? Do you see these often? And am I correct to assume that this happens during startup/reload? Does the same setup work reliably with older versions?

Things you could try:

hshh commented 11 months ago

Hmm, I tested unbound-1.17.1, and it caused a kernel panic too. I think it is an OS bug. The system does not crash during startup/reload of unbound; the panic happens irregularly. I am now running unbound 1.19 with "so-reuseport: no" for testing, and I have saved the PID information of all unbound processes.
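
For reference, the change under test is a single line in the server: clause of each instance's config file. A minimal fragment, assuming every other option stays as in the attached files:

```
server:
        # Workaround under test: do not request SO_REUSEPORT on the listening sockets.
        so-reuseport: no
```

Each running instance has to be restarted to pick up the change, since the option affects how the listening sockets are opened.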

glebius commented 10 months ago

This is not an unbound bug; it is a FreeBSD kernel bug, and it looks very similar to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273890. Please see the details there and try the patch.

hshh commented 10 months ago

"so-reuseport: no" only reduces the occurrence of the problem; it can still lead to a kernel panic. I am testing the kernel patch now.

wravoc commented 10 months ago

Interested if this was the fix for you @hshh? Did you get it done?

hshh commented 10 months ago

> Interested if this was the fix for you @hshh? Did you get it done?

Yes, the system has become stable after applying the kernel patch. The uptime is 16 days now.

I hope it can be merged into 14.0-RELEASE, @glebius.

glebius commented 10 months ago

It can't be merged into 14.0-RELEASE, since that release has already shipped. We might issue an Errata Notice and a patch level for the releng branch.

gthess commented 10 months ago

Closing this as a non-issue for Unbound then :)