NLnetLabs / nsd

The NLnet Labs Name Server Daemon (NSD) is an authoritative, RFC compliant DNS nameserver.
https://nlnetlabs.nl/nsd
BSD 3-Clause "New" or "Revised" License
462 stars 103 forks source link

nsd 4.3.7 crash answer_delegation: Assertion `query->delegation_rrset' failed #189

Closed jaredmauch closed 3 years ago

jaredmauch commented 3 years ago

Starting today I've seen this crash occuring on my system where it appears that q->qname is NULL

here's some information, I have several core files.

Aug 20 12:00:49 puck nsd[1983194]: nsd: query.c:894: answer_delegation: Assertion query->delegation_rrset' failed. Aug 20 12:00:49 puck nsd[1983197]: nsd: query.c:894: answer_delegation: Assertionquery->delegation_rrset' failed.

[New LWP 1979159]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/nsd -d -c /etc/nsd/nsd.conf'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f66dfc222a2 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.33-20.fc34.x86_64 libcap-2.48-2.fc34.x86_64 libevent-2.1.12-3.fc34.x86_64 libffi-3.1-28.fc34.x86_64 libgcc-11.2.1-1.fc34.x86_64 libgcrypt-1.9.3-3.fc34.x86_64 libgpg-error-1.42-1.fc34.x86_64 libselinux-3.2-1.fc34.x86_64 libxcrypt-4.4.25-1.fc34.x86_64 libzstd-1.5.0-1.fc34.x86_64 lz4-libs-1.9.3-2.fc34.x86_64 openssl-libs-1.1.1k-1.fc34.x86_64 p11-kit-0.23.22-3.fc34.x86_64 pcre2-10.36-4.fc34.x86_64 systemd-libs-248.7-1.fc34.x86_64 xz-libs-5.2.5-5.fc34.x86_64 zlib-1.2.11-26.fc34.x86_64
(gdb) up
#1  0x00007f66dfc0b8a4 in abort () from /lib64/libc.so.6
(gdb) up
#2  0x00007f66dfc0b789 in __assert_fail_base.cold () from /lib64/libc.so.6
(gdb) up
#3  0x00007f66dfc1aa16 in __assert_fail () from /lib64/libc.so.6
(gdb) up
#4  0x000055ba51266b43 in answer_delegation (answer=0x7ffcc1ed1e70, query=0x55ba651619e0) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:894
u894        assert(query->delegation_rrset);
(gdb) print query
$1 = (query_type *) 0x55ba651619e0
(gdb) print *query
$2 = {region = 0x0, addr = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, addrlen = 0, maxlen = 0, reserved_space = 0, edns = {
    status = EDNS_NOT_PRESENT, position = 0, maxlen = 0, opt_reserved_space = 0, dnssec_ok = 0, nsid = 0, cookie_status = COOKIE_NOT_PRESENT, cookie_len = 0, 
    cookie = '\000' <repeats 39 times>, ede = 0, ede_text = 0x0, ede_text_len = 0}, tsig = {status = TSIG_NOT_PRESENT, position = 0, response_count = 0, 
    updates_since_last_prepare = 0, context = 0x0, algorithm = 0x0, key = 0x0, prior_mac_size = 0, prior_mac_data = 0x0, rr_region = 0x0, context_region = 0x0, key_name = 0x0, 
    algorithm_name = 0x0, signed_time_high = 0, signed_time_low = 0, signed_time_fudge = 0, mac_size = 0, mac_data = 0x0, original_query_id = 0, error_code = 0, other_size = 0, 
    other_data = 0x0}, tsig_prepare_it = 0, tsig_update_it = 0, tsig_sign_it = 0, tcp = 0, tcplen = 0, packet = 0x0, qname = 0x0, qtype = 0, qclass = 0, zone = 0x0, 
  delegation_domain = 0x0, delegation_rrset = 0x0, opcode = 0 '\000', cname_count = 0, compressed_dname_count = 0, compressed_dnames = 0x0, compressed_dname_offsets = 0x0, 
  compressed_dname_offsets_size = 0, number_temporary_domains = 0, axfr_is_done = 0, axfr_zone = 0x0, axfr_current_domain = 0x0, axfr_current_rrset = 0x0, axfr_current_rr = 0, 
  wildcard_domain = 0x0}
(gdb) up
#5  answer_lookup_zone (nsd=0x55ba51301340 <nsd>, q=0x55ba651619e0, answer=0x7ffcc1ed1e70, domain_number=0, exact=<optimized out>, closest_match=0x55ba544eb563, 
    closest_encloser=<optimized out>, qname=0x0) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:1438
1438                answer_delegation(q, answer);
(gdb) print q
$3 = (struct query *) 0x55ba651619e0
(gdb) print *q
$4 = {region = 0x0, addr = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, addrlen = 0, maxlen = 0, reserved_space = 0, edns = {
    status = EDNS_NOT_PRESENT, position = 0, maxlen = 0, opt_reserved_space = 0, dnssec_ok = 0, nsid = 0, cookie_status = COOKIE_NOT_PRESENT, cookie_len = 0, 
    cookie = '\000' <repeats 39 times>, ede = 0, ede_text = 0x0, ede_text_len = 0}, tsig = {status = TSIG_NOT_PRESENT, position = 0, response_count = 0, 
    updates_since_last_prepare = 0, context = 0x0, algorithm = 0x0, key = 0x0, prior_mac_size = 0, prior_mac_data = 0x0, rr_region = 0x0, context_region = 0x0, key_name = 0x0, 
    algorithm_name = 0x0, signed_time_high = 0, signed_time_low = 0, signed_time_fudge = 0, mac_size = 0, mac_data = 0x0, original_query_id = 0, error_code = 0, other_size = 0, 
    other_data = 0x0}, tsig_prepare_it = 0, tsig_update_it = 0, tsig_sign_it = 0, tcp = 0, tcplen = 0, packet = 0x0, qname = 0x0, qtype = 0, qclass = 0, zone = 0x0, 
  delegation_domain = 0x0, delegation_rrset = 0x0, opcode = 0 '\000', cname_count = 0, compressed_dname_count = 0, compressed_dnames = 0x0, compressed_dname_offsets = 0x0, 
  compressed_dname_offsets_size = 0, number_temporary_domains = 0, axfr_is_done = 0, axfr_zone = 0x0, axfr_current_domain = 0x0, axfr_current_rrset = 0x0, axfr_current_rr = 0, 
  wildcard_domain = 0x0}
(gdb) print *answer
value of type `answer_type' requires 204808 bytes, which is more than max-value-size
(gdb) up
#6  0x000055ba51266cbc in answer_query (nsd=nsd@entry=0x55ba51301340 <nsd>, q=q@entry=0x55ba651619e0) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:1456
1456        answer_lookup_zone(nsd, q, &answer, 0, exact, closest_match,
(gdb) print nsd
$5 = (struct nsd *) 0x55ba51301340 <nsd>
(gdb) print *nsd
$6 = {region = 0x0, pid = 0, mode = 0, signal_hint_reload_hup = 0, signal_hint_reload = 0, signal_hint_child = 0, signal_hint_quit = 0, signal_hint_shutdown = 0, 
  signal_hint_stats = 0, signal_hint_statsusr = 0, quit_sync_done = 0, server_kind = 0, db = 0x0, debug = 0, child_count = 0, children = 0x0, restart_children = 0, 
  reload_failed = 0, this_child = 0x0, task = {0x0, 0x0}, mytask = 0, event_base = 0x0, server_region = 0x0, xfrd_listener = 0x0, rc = 0x0, dbfile = 0x0, pidfile = 0x0, 
  log_filename = 0x0, username = 0x0, uid = 0, gid = 0, chrootdir = 0x0, version = 0x0, identity = 0x0, nsid_len = 0, nsid = 0x0, file_rotation_ok = 0 '\000', 
  use_cpu_affinity = 0, cpuset = 0x0, xfrd_cpuset = 0x0, ifs = 0, reuseport = 0, tcp = 0x0, udp = 0x0, edns_ipv4 = {ok = "\000\000\000\000\000\000\000\000", 
    error = "\000\000\000\000\000\000\000\000", rdata_none = "\000", nsid = "\000\000\000", cookie = "\000\000\000"}, edns_ipv6 = {ok = "\000\000\000\000\000\000\000\000", 
    error = "\000\000\000\000\000\000\000\000", rdata_none = "\000", nsid = "\000\000\000", cookie = "\000\000\000"}, maximum_tcp_count = 0, current_tcp_count = 0, 
  tcp_query_count = 0, tcp_timeout = 0, tcp_mss = 0, outgoing_tcp_mss = 0, ipv4_edns_size = 0, ipv6_edns_size = 0, st = {boot = 0, period = 0, qtype = {0 <repeats 257 times>}, 
    qclass = {0, 0, 0, 0}, qudp = 0, qudp6 = 0, ctcp = 0, ctcp6 = 0, ctls = 0, ctls6 = 0, rcode = {0 <repeats 17 times>}, opcode = {0, 0, 0, 0, 0, 0}, dropped = 0, truncated = 0, 
    wrongzone = 0, txerr = 0, rxerr = 0, edns = 0, ednserr = 0, raxfr = 0, nona = 0, db_disk = 0, db_mem = 0}, zonestat = {0x0, 0x0}, zonestatfd = {0, 0}, zonestatfname = {0x0, 
    0x0}, zonestatsize = {0, 0}, zonestatdesired = 0, zonestatsizenow = 0, zonestatnow = 0x0, err_limit_time = 0, err_limit_count = 0, do_answer_cookie = 0, cookie_count = 0, 
  cookie_secrets = {{cookie_secret = '\000' <repeats 15 times>}, {cookie_secret = '\000' <repeats 15 times>}}, options = 0x0, tls_ctx = 0x0}
(gdb) print exact
$7 = <optimized out>
(gdb) print closest_match
$8 = (domain_type *) 0x0
(gdb) list
1451    
1452        answer_init(&answer);
1453    
1454        exact = namedb_lookup(nsd->db, q->qname, &closest_match, &closest_encloser);
1455    
1456        answer_lookup_zone(nsd, q, &answer, 0, exact, closest_match,
1457            closest_encloser, q->qname);
1458        ZTATUP2(nsd, q->zone, opcode, q->opcode);
1459        ZTATUP2(nsd, q->zone, qtype, q->qtype);
1460        ZTATUP2(nsd, q->zone, qclass, q->qclass);
(gdb) print q->name
There is no member named name.
(gdb) print *q
$9 = {region = 0x0, addr = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, addrlen = 0, maxlen = 0, reserved_space = 0, edns = {
    status = EDNS_NOT_PRESENT, position = 0, maxlen = 0, opt_reserved_space = 0, dnssec_ok = 0, nsid = 0, cookie_status = COOKIE_NOT_PRESENT, cookie_len = 0, 
    cookie = '\000' <repeats 39 times>, ede = 0, ede_text = 0x0, ede_text_len = 0}, tsig = {status = TSIG_NOT_PRESENT, position = 0, response_count = 0, 
    updates_since_last_prepare = 0, context = 0x0, algorithm = 0x0, key = 0x0, prior_mac_size = 0, prior_mac_data = 0x0, rr_region = 0x0, context_region = 0x0, key_name = 0x0, 
    algorithm_name = 0x0, signed_time_high = 0, signed_time_low = 0, signed_time_fudge = 0, mac_size = 0, mac_data = 0x0, original_query_id = 0, error_code = 0, other_size = 0, 
    other_data = 0x0}, tsig_prepare_it = 0, tsig_update_it = 0, tsig_sign_it = 0, tcp = 0, tcplen = 0, packet = 0x0, qname = 0x0, qtype = 0, qclass = 0, zone = 0x0, 
  delegation_domain = 0x0, delegation_rrset = 0x0, opcode = 0 '\000', cname_count = 0, compressed_dname_count = 0, compressed_dnames = 0x0, compressed_dname_offsets = 0x0, 
  compressed_dname_offsets_size = 0, number_temporary_domains = 0, axfr_is_done = 0, axfr_zone = 0x0, axfr_current_domain = 0x0, axfr_current_rrset = 0x0, axfr_current_rr = 0, 
  wildcard_domain = 0x0}
(gdb) print *q->name
There is no member named name.
(gdb) print *q->qname
Cannot access memory at address 0x0
(gdb) print *answer
Structure has no component named operator*.
(gdb) print answer
value of type `answer_type' requires 204808 bytes, which is more than max-value-size
(gdb) print closest_encloser
$10 = (domain_type *) 0x0
(gdb) up
#7  0x000055ba5126986f in query_process (q=0x55ba651619e0, nsd=0x55ba51301340 <nsd>, now_p=0x7ffcc1f040cc) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:1679
1679        answer_query(nsd, q);
(gdb) print *nsd
$11 = {region = 0x0, pid = 0, mode = 0, signal_hint_reload_hup = 0, signal_hint_reload = 0, signal_hint_child = 0, signal_hint_quit = 0, signal_hint_shutdown = 0, 
  signal_hint_stats = 0, signal_hint_statsusr = 0, quit_sync_done = 0, server_kind = 0, db = 0x0, debug = 0, child_count = 0, children = 0x0, restart_children = 0, 
  reload_failed = 0, this_child = 0x0, task = {0x0, 0x0}, mytask = 0, event_base = 0x0, server_region = 0x0, xfrd_listener = 0x0, rc = 0x0, dbfile = 0x0, pidfile = 0x0, 
  log_filename = 0x0, username = 0x0, uid = 0, gid = 0, chrootdir = 0x0, version = 0x0, identity = 0x0, nsid_len = 0, nsid = 0x0, file_rotation_ok = 0 '\000', 
  use_cpu_affinity = 0, cpuset = 0x0, xfrd_cpuset = 0x0, ifs = 0, reuseport = 0, tcp = 0x0, udp = 0x0, edns_ipv4 = {ok = "\000\000\000\000\000\000\000\000", 
    error = "\000\000\000\000\000\000\000\000", rdata_none = "\000", nsid = "\000\000\000", cookie = "\000\000\000"}, edns_ipv6 = {ok = "\000\000\000\000\000\000\000\000", 
    error = "\000\000\000\000\000\000\000\000", rdata_none = "\000", nsid = "\000\000\000", cookie = "\000\000\000"}, maximum_tcp_count = 0, current_tcp_count = 0, 
  tcp_query_count = 0, tcp_timeout = 0, tcp_mss = 0, outgoing_tcp_mss = 0, ipv4_edns_size = 0, ipv6_edns_size = 0, st = {boot = 0, period = 0, qtype = {0 <repeats 257 times>}, 
    qclass = {0, 0, 0, 0}, qudp = 0, qudp6 = 0, ctcp = 0, ctcp6 = 0, ctls = 0, ctls6 = 0, rcode = {0 <repeats 17 times>}, opcode = {0, 0, 0, 0, 0, 0}, dropped = 0, truncated = 0, 
    wrongzone = 0, txerr = 0, rxerr = 0, edns = 0, ednserr = 0, raxfr = 0, nona = 0, db_disk = 0, db_mem = 0}, zonestat = {0x0, 0x0}, zonestatfd = {0, 0}, zonestatfname = {0x0, 
    0x0}, zonestatsize = {0, 0}, zonestatdesired = 0, zonestatsizenow = 0, zonestatnow = 0x0, err_limit_time = 0, err_limit_count = 0, do_answer_cookie = 0, cookie_count = 0, 
  cookie_secrets = {{cookie_secret = '\000' <repeats 15 times>}, {cookie_secret = '\000' <repeats 15 times>}}, options = 0x0, tls_ctx = 0x0}
(gdb) up
#8  0x0000000000000000 in ?? ()
(gdb) up
Initial frame selected; you cannot go up.
(gdb) quit
[jared@puck coredump]# rpm -q nsd
nsd-4.3.7-2.fc34.x86_64
wcawijngaards commented 3 years ago

Something must be wrong. qname NULL is not possible, it should have returned a formerr. Also your debug shows that delegation_domain is NULL, and this is asserted the line above the delegation_rrset. So it should have failed one line above. These inconsistencies make me believe that there is a miscompile of some sort. Like the debug information does not match the executable. Or the executable is compiled with mismatched .o files, for example from different NSD versions; make clean and make it again.

jaredmauch commented 3 years ago

Yeah, let me make sure running version matches package installed version, but it should be, it's quite odd and started in the past hour.

wcawijngaards commented 3 years ago

I also checked for q->delegation_rrset NULL as condition, and from reading the code it should be impossible to have delegation_rrset NULL at that point, because in an earlier if statement the delegation_domain was checked before the routine is called, and is thus certain to be nonNULL, and if the delegation_domain is not NULL, then in the lookup routine for the delegation_rrset we are sure that the delegation_rrset is not NULL.

This trail is code is namedb.c:587 domain_find_ns_rrsets assigns to delegation_rrset and only then is delegation_domain returned, to query.c:1420 where the domain_find_ns_rrsets is called. Then a couple lines lower it checks if delegation_domain is NULL and then does not call answer_delegation() at all.

jaredmauch commented 3 years ago

Looks to be related to v6 RDNS zone?

(gdb) up
#1  0x00007fb6a44c88a4 in abort () from /lib64/libc.so.6
(gdb) up
#2  0x00007fb6a44c8789 in __assert_fail_base.cold () from /lib64/libc.so.6
(gdb) up
#3  0x00007fb6a44d7a16 in __assert_fail () from /lib64/libc.so.6
(gdb) up
#4  0x000055e2adb12b43 in answer_delegation (answer=0x7ffc9ab81d20, query=0x55e2c36010a0) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:894
894     assert(query->delegation_rrset);
(gdb) up
#5  answer_lookup_zone (nsd=0x55e2adbad340 <nsd>, q=0x55e2c36010a0, answer=0x7ffc9ab81d20, domain_number=0, exact=<optimized out>, closest_match=0x55e2b1c86573, 
    closest_encloser=<optimized out>, qname=0x55e2c3601300) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:1438
1438                answer_delegation(q, answer);
(gdb) print q
$1 = (struct query *) 0x55e2c36010a0
(gdb) print q->qname
$2 = (const dname_type *) 0x55e2c3601300
(gdb) print *q->qname
$3 = {name_size = 74 'J', label_count = 35 '#'}
(gdb) print*q->qname
$4 = {name_size = 74 'J', label_count = 35 '#'}
(gdb) print answer
$5 = (answer_type *) 0x7ffc9ab81d20
(gdb) print *answer
value of type `answer_type' requires 204808 bytes, which is more than max-value-size
(gdb) down
#4  0x000055e2adb12b43 in answer_delegation (answer=0x7ffc9ab81d20, query=0x55e2c36010a0) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:894
894     assert(query->delegation_rrset);
(gdb) up
#5  answer_lookup_zone (nsd=0x55e2adbad340 <nsd>, q=0x55e2c36010a0, answer=0x7ffc9ab81d20, domain_number=0, exact=<optimized out>, closest_match=0x55e2b1c86573, 
    closest_encloser=<optimized out>, qname=0x55e2c3601300) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:1438
1438                answer_delegation(q, answer);
(gdb) up
#6  0x000055e2adb12cbc in answer_query (nsd=nsd@entry=0x55e2adbad340 <nsd>, q=q@entry=0x55e2c36010a0) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:1456
1456        answer_lookup_zone(nsd, q, &answer, 0, exact, closest_match,
(gdb) print q
$6 = (struct query *) 0x55e2c36010a0
(gdb) print *q
$7 = {region = 0x55e2c158b410, addr = {ss_family = 2, __ss_padding = "\242\346\003\357\233\354", '\000' <repeats 111 times>, __ss_align = 0}, addrlen = 16, maxlen = 1232, 
  reserved_space = 11, edns = {status = EDNS_OK, position = 90, maxlen = 1232, opt_reserved_space = 0, dnssec_ok = 32768, nsid = 0, cookie_status = COOKIE_NOT_PRESENT, 
    cookie_len = 0, cookie = '\000' <repeats 39 times>, ede = -1, ede_text = 0x0, ede_text_len = 0}, tsig = {status = TSIG_NOT_PRESENT, position = 0, response_count = 0, 
    updates_since_last_prepare = 0, context = 0x0, algorithm = 0x0, key = 0x0, prior_mac_size = 0, prior_mac_data = 0x0, rr_region = 0x55e2c1679e30, 
    context_region = 0x55e2c17b7d00, key_name = 0x0, algorithm_name = 0x0, signed_time_high = 0, signed_time_low = 0, signed_time_fudge = 0, mac_size = 0, mac_data = 0x0, 
    original_query_id = 0, error_code = 0, other_size = 0, other_data = 0x0}, tsig_prepare_it = 1, tsig_update_it = 1, tsig_sign_it = 1, tcp = 0, tcplen = 0, 
  packet = 0x55e2c293cdf8, qname = 0x55e2c3601300, qtype = 12, qclass = 1, zone = 0x55e2b1c78806, delegation_domain = 0x55e2b1c86573, delegation_rrset = 0x0, opcode = 0 '\000', 
  cname_count = 0, compressed_dname_count = 0, compressed_dnames = 0x55e2adbaf040 <compressed_dnames.lto_priv>, compressed_dname_offsets = 0x55e2c29a8ad0, 
  compressed_dname_offsets_size = 415137, number_temporary_domains = 0, axfr_is_done = 0, axfr_zone = 0x0, axfr_current_domain = 0x0, axfr_current_rrset = 0x0, 
  axfr_current_rr = 0, wildcard_domain = 0x0}
(gdb) print *q->qname
$8 = {name_size = 74 'J', label_count = 35 '#'}
(gdb) print *q->packet
$9 = {_position = 90, _limit = 131337, _capacity = 131337, _data = 0x55e2c3605310 "s4\200", _fixed = 0}
(gdb) print *q->packet._data
$10 = 115 's'
(gdb) print q->packet._data
$11 = (uint8_t *) 0x55e2c3605310 "s4\200"
(gdb) print q->packet
$12 = (buffer_type *) 0x55e2c293cdf8
(gdb) print *q->packet
$13 = {_position = 90, _limit = 131337, _capacity = 131337, _data = 0x55e2c3605310 "s4\200", _fixed = 0}
(gdb) up
#7  0x000055e2adb1586f in query_process (q=0x55e2c36010a0, nsd=0x55e2adbad340 <nsd>, now_p=0x7ffc9abb3f7c) at /usr/src/debug/nsd-4.3.7-2.fc34.x86_64/query.c:1679
1679        answer_query(nsd, q);
(gdb) print q
$14 = (query_type *) 0x55e2c36010a0
(gdb) print *q
$15 = {region = 0x55e2c158b410, addr = {ss_family = 2, __ss_padding = "\242\346\003\357\233\354", '\000' <repeats 111 times>, __ss_align = 0}, addrlen = 16, maxlen = 1232, 
  reserved_space = 11, edns = {status = EDNS_OK, position = 90, maxlen = 1232, opt_reserved_space = 0, dnssec_ok = 32768, nsid = 0, cookie_status = COOKIE_NOT_PRESENT, 
    cookie_len = 0, cookie = '\000' <repeats 39 times>, ede = -1, ede_text = 0x0, ede_text_len = 0}, tsig = {status = TSIG_NOT_PRESENT, position = 0, response_count = 0, 
    updates_since_last_prepare = 0, context = 0x0, algorithm = 0x0, key = 0x0, prior_mac_size = 0, prior_mac_data = 0x0, rr_region = 0x55e2c1679e30, 
    context_region = 0x55e2c17b7d00, key_name = 0x0, algorithm_name = 0x0, signed_time_high = 0, signed_time_low = 0, signed_time_fudge = 0, mac_size = 0, mac_data = 0x0, 
    original_query_id = 0, error_code = 0, other_size = 0, other_data = 0x0}, tsig_prepare_it = 1, tsig_update_it = 1, tsig_sign_it = 1, tcp = 0, tcplen = 0, 
  packet = 0x55e2c293cdf8, qname = 0x55e2c3601300, qtype = 12, qclass = 1, zone = 0x55e2b1c78806, delegation_domain = 0x55e2b1c86573, delegation_rrset = 0x0, opcode = 0 '\000', 
  cname_count = 0, compressed_dname_count = 0, compressed_dnames = 0x55e2adbaf040 <compressed_dnames.lto_priv>, compressed_dname_offsets = 0x55e2c29a8ad0, 
  compressed_dname_offsets_size = 415137, number_temporary_domains = 0, axfr_is_done = 0, axfr_zone = 0x0, axfr_current_domain = 0x0, axfr_current_rrset = 0x0, 
  axfr_current_rr = 0, wildcard_domain = 0x0}
(gdb) print *q->zone
$16 = {node = 0x55e2b1c7888b, apex = 0x55e2b1c78afe, soa_rrset = 0x55e2b1c78c06, soa_nx_rrset = 0x55e2b1c78c3a, ns_rrset = 0x55e2b1c78c76, nsec3_param = 0x0, nsec3_last = 0x0, 
  nsec3tree = 0x0, hashtree = 0x0, wchashtree = 0x0, dshashtree = 0x0, opts = 0x55e2afbfa85b, filename = 0x55e2b1c05922 "/etc/nsd/secondaries/0.8.c.7.7.0.6.2.ip6.arpa", 
  logstr = 0x0, mtime = {tv_sec = 1615943727, tv_nsec = 346618416}, zonestatid = 729, is_secure = 0, is_ok = 1, is_changed = 0}
(gdb) print *q->zone->node
$17 = {elem = 0x55e2b1c78806, parent = 0x55e2b1c86bc2, pidx = 0 '\000', offset = 0 '\000', len = 0, capacity = 0, array = 0x0}
(gdb) print *q->zone->node->elem
Attempt to dereference a generic pointer.
(gdb) print *q->zone->node->parent
$18 = {elem = 0x0, parent = 0x55e2b1c4bf84, pidx = 6 '\006', offset = 56 '8', len = 47, capacity = 47, array = 0x55e2b1c6ed40}
wcawijngaards commented 3 years ago

So this trace has qname and also delegation_domain not null. It is a query of type PTR.

It shows that delegation_rrset is NULL. But from the code that looks impossible. namedb.c:587, there the rrset is assigned nonNULL, and only then, if that is nonNULL, the result is notNULL and that is the delegation_domain. Later processing can make delegation_domain NULL. So it is not possible, from the code read to have delegation_domain != NULL and delegation_rrset == NULL. And this is what your debug trace shows.

If you had a manual compile I would suggest lowering optimisation options in the compiler. Like, not use -flto and drop -O. But you have a package. flto can be disabled with the configure switch --disable-flto and the -O2 option by passing CFLAGS="-g" to configure. I have heard of compilers that fail with high -O settings and failures (at compile time) for -flto, but have not heard of this kind of trouble, but I have no other idea how the code can make domain && !rrset happen here.

I could add a check in line query.c:1426 to check for delegation_rrset, but it is already guaranteed to be nonNULL by the previous check, really.

index 44b6690a..d0fba064 100644
--- a/query.c
+++ b/query.c
@@ -1424,6 +1424,7 @@ answer_lookup_zone(struct nsd *nsd, struct query *q, answer_type *answer,
                }

                if (!q->delegation_domain
+                   || !q->delegation_rrset
                    || (exact && q->qtype == TYPE_DS && closest_encloser == q->delegation_domain))
                {
                        if (q->qclass == CLASS_ANY) {
jaredmauch commented 3 years ago

some feedback:

[2021-08-21 00:51:29.888] nsd[2188979]: notice: setup TCP for TLS service on interface 204.42.254.5@853
[2021-08-21 00:51:29.889] nsd[2188979]: notice: setup TCP for TLS service on interface 2001:418:3f4::5@853
nsd: query.c:894: answer_delegation: Assertion `query->delegation_rrset' failed.
[2021-08-21 00:51:31.358] nsd[2188873]: warning: server 2188942 died unexpectedly, restarting
[2021-08-21 00:51:31.366] nsd[2188873]: warning: process 2188942 terminated with status 134
[2021-08-21 00:51:31.383] nsd[2188990]: notice: setup TCP for TLS service on interface 204.42.254.5@853
[2021-08-21 00:51:31.384] nsd[2188990]: notice: setup TCP for TLS service on interface 2001:418:3f4::5@853
nsd: query.c:894: answer_delegation: Assertion `query->delegation_rrset' failed.
[2021-08-21 00:51:37.438] nsd[2188873]: warning: server 2188979 died unexpectedly, restarting

I went and pulled from git and applied your change to query.c as well and I no longer see the issue when the query comes inbound, perhaps via TCP/TLS as noted above.

After:

[2021-08-21 00:57:29.961] nsd[2190098]: info: service remaining TCP connections[2021-08-21 00:57:29.961] nsd[2190099]: info: NSTATS 1629521849 1629521829 A=2452 NS=241 CNAME=8 SOA=21 PTR=267 MX=185 TXT=55 AAAA=1162 SRV=279 TYPE38=1 DS=1 DNSKEY=77 TLSA=4 HTTPS=23 SPF=1 TYPE255=2

[2021-08-21 00:57:29.961] nsd[2190099]: info: XSTATS 1629521849 1629521829 RR=0 RNXD=0 RFwdR=0 RDupR=0 RFail=0 RFErr=0 RErr=0 RAXFR=0 RLame=0 ROpts=0 SSysQ=0 SAns=4677 SFwdQ=0 SDupQ=0 SErr=0 RQ=4780 RIQ=0 RFwdQ=0 RDupQ=0 RTCP=104 SFwdR=0 SFail=219 SFErr=0 SNaAns=29 SNXD=373 RUQ=0 RURQ=0 RUXFR=0 RUUpd=0
[2021-08-21 00:57:29.964] nsd[2190154]: notice: setup TCP for TLS service on interface 204.42.254.5@853
[2021-08-21 00:57:29.964] nsd[2190154]: notice: setup TCP for TLS service on interface 2001:418:3f4::5@853
[2021-08-21 00:57:29.968] nsd[2190155]: notice: setup TCP for TLS service on interface 204.42.254.5@853
[2021-08-21 00:57:29.969] nsd[2190155]: notice: setup TCP for TLS service on interface 2001:418:3f4::5@853
[2021-08-21 00:57:30.063] nsd[2190098]: info: NSTATS 1629521850 1629521829 A=2566 NS=213 CNAME=8 SOA=19 PTR=278 MX=179 TXT=52 AAAA=1229 SRV=282 TYPE38=3 DS=8 DNSKEY=79 TLSA=7 HTTPS=17 TYPE255=2
[2021-08-21 00:57:30.063] nsd[2190098]: info: XSTATS 1629521850 1629521829 RR=0 RNXD=0 RFwdR=0 RDupR=0 RFail=0 RFErr=0 RErr=0 RAXFR=0 RLame=0 ROpts=0 SSysQ=0 SAns=4858 SFwdQ=0 SDupQ=0 SErr=0 RQ=4942 RIQ=0 RFwdQ=0 RDupQ=0 RTCP=84 SFwdR=0 SFail=196 SFErr=0 SNaAns=27 SNXD=437 RUQ=0 RURQ=0 RUXFR=0 RUUpd=0
wcawijngaards commented 3 years ago

Okay I have committed the fix, because it stops the problem from happening. Still think something must be wrong for this to happen, but the fix stops the crash so added that to the code repository.

fobser commented 3 years ago

712296f only hides the problem, it doesn't fix anything. The real fix is ba0002e

See #195 for an explanation why I think that is.