Open Habbie opened 6 years ago
figuring out why ^C 'hangs on shutdown' is only possible with --debug
△ src/PowerDNS/pdns ./pdns/ixfrdist --config pdns/ixfrdist.yml
[NOTICE] IXFR distributor version 0.0.ixfrdistfixes.g8f0ca614a.dirty starting up!
[NOTICE] ACL set to 127.0.0.0/8, ::1/128.
[NOTICE] Update Thread started
[WARNING] Unable to get SOA serial update for 'example.com' from master 127.0.0.1: Reading from a socket: Connection refused
^C[NOTICE] Got Interrupt signal, stopping
[NOTICE] Shutting down!
[NOTICE] UpdateThread stopped
[NOTICE] IXFR distributor stopped
Should this report "stopping all threads"?
> Should this report "stopping all threads"?
I don't follow, the best response I have is 'perhaps the ^C response can mention what threads are supposed to die so the user can at least see which threads have not in fact died yet'.
There is no signalling between threads, other than a global var telling them to stop. I can elaborate on the message, telling the user "will quit when current transfers are complete". Additionally, I could ensure that a second ^C will kill regardless
> I could ensure that a second ^C will kill regardless
This appears to work already, to be clear!
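The shutdown behaviour discussed above (a global stop flag, a message naming the threads still being waited on, and a second ^C that kills regardless) could be sketched roughly as follows. This is an illustration in Python, not ixfrdist's C++ code; `ShutdownController`, `on_interrupt`, `install` and the message wording are all hypothetical.

```python
import os
import signal
import sys
import threading


class ShutdownController:
    """First interrupt asks threads to stop; a second one forces an exit."""

    def __init__(self, thread_names):
        # The "global var" that worker threads poll between units of work.
        self.stop_requested = threading.Event()
        self.thread_names = list(thread_names)
        self.interrupts = 0

    def on_interrupt(self):
        """Return (action, message), where action is "graceful" or "force"."""
        self.interrupts += 1
        if self.interrupts == 1:
            self.stop_requested.set()
            waiting = ", ".join(self.thread_names)
            return ("graceful",
                    "Got Interrupt signal, will quit when current transfers "
                    "are complete; waiting for: " + waiting)
        return ("force", "Got second Interrupt signal, exiting immediately")


def install(controller):
    """Hook the controller up to SIGINT (^C)."""
    def handler(signum, frame):
        action, message = controller.on_interrupt()
        print(message, file=sys.stderr)
        if action == "force":
            os._exit(1)  # skip cleanup: the user asked twice
    signal.signal(signal.SIGINT, handler)
```

Naming the threads in the first message lets the user see which ones have not died yet, without needing `--debug`.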
[ERROR] Unable to read 'listen' value: yaml-cpp: error at line 0, column 0: bad conversion when ':53' (without IP) is in the listen array. Presumably the line/column could be better. (Update: somehow I get a nice line number now.)
this is yaml-cpp not knowing that info :(
12345.partial files may stick around (after a kill -9 from the kernel OOM)
ports in domains->masters do not default to 53 (and, I suspect, neither do ports on listen addresses)
I do confirm your suspicion :)
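For illustration, address parsing that defaults the port to 53 and gives a specific error for a value like ':53' might look like the sketch below. It is a Python approximation under stated assumptions, not the parser ixfrdist uses; `parse_address` and `DEFAULT_DNS_PORT` are hypothetical names.

```python
import ipaddress

DEFAULT_DNS_PORT = 53  # assumed default for both masters and listen addresses


def parse_address(text, default_port=DEFAULT_DNS_PORT):
    """Parse 'ip', 'ip:port', '[v6]' or '[v6]:port' into (ip, port)."""
    host, port = text, None
    if text.startswith("["):
        # Bracketed IPv6, optionally followed by ':port'.
        end = text.find("]")
        if end == -1:
            raise ValueError(f"missing ']' in address {text!r}")
        host, rest = text[1:end], text[end + 1:]
        if rest:
            if not rest.startswith(":"):
                raise ValueError(f"unexpected text after ']' in {text!r}")
            port = rest[1:]
    elif text.count(":") == 1:
        # Exactly one colon: IPv4 with a port (bare IPv6 has at least two).
        host, port = text.split(":")
    if not host:
        raise ValueError(f"address {text!r} has a port but no IP")
    ip = ipaddress.ip_address(host)  # raises ValueError for a malformed IP
    port_number = default_port if port is None else int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range in {text!r}")
    return str(ip), port_number
```

With this shape, `parse_address("127.0.0.1")` falls back to port 53, while `parse_address(":53")` fails with a message that names the offending value instead of a generic "bad conversion".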
if ixfrdist has versions 1,2,3,4,5 of a zone, and I dig ixfr=1, I only get updated to version 2, not all the way to 5
Did you check this with anything other than dig? Because I have noticed that dig doesn't seem to handle non-condensed IXFR responses spanning multiple DNS messages.
It turns out ixfrdist does send a full set of deltas, it just does it wrong!
If ixfrdist has versions 1535301301 1535301901 1535302501, the ixfr=1535301301 response roughly looks as follows:
SOA 1535301901
SOA 1535301301
SOA 1535301901
SOA 1535301901
SOA 1535302501
SOA 1535301901
[removed record here]
SOA 1535302501
SOA 1535302501
All IXFR consumers I can find (ixplore, dig, dnspython) will stop after the fourth SOA, because it matches the first SOA. The first SOA in the response should have the highest serial available.
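To see why consumers stop at the fourth SOA, here is a minimal Python sketch of the termination rule: the first SOA sets the target serial, and an SOA with that serial appearing where a new deletion section would start ends the transfer. It is a simplified model, not the code of ixplore, dig or dnspython, and it ignores single-SOA and AXFR-style responses; `soas_until_done` is a hypothetical name.

```python
def soas_until_done(records):
    """Count the SOA records a simplified IXFR client reads before stopping.

    `records` is a list of (rtype, data) tuples; for SOA records, `data`
    is the serial.
    """
    if not records or records[0][0] != "SOA":
        raise ValueError("an IXFR response must start with an SOA record")
    target = records[0][1]   # serial the client believes it is updating to
    soa_count = 1
    in_deletions = False     # False: the next SOA opens a deletion section
    for rtype, data in records[1:]:
        if rtype != "SOA":
            continue         # ordinary record inside the current section
        soa_count += 1
        if in_deletions:
            in_deletions = False   # this SOA opens the additions section
        elif data == target:
            return soa_count       # closing SOA: transfer considered complete
        else:
            in_deletions = True    # this SOA opens a deletion section
    raise ValueError("response ended before the closing SOA")


# The response shown above, with the removed record as a placeholder.
observed = [
    ("SOA", 1535301901), ("SOA", 1535301301), ("SOA", 1535301901),
    ("SOA", 1535301901), ("SOA", 1535302501), ("SOA", 1535301901),
    ("A", "removed record"), ("SOA", 1535302501), ("SOA", 1535302501),
]
```

Fed the `observed` list, this model returns after four SOA records, so the client only ever applies the delta from 1535301301 to 1535301901.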
Here is a crude hack for sdig to debug this with: https://gist.github.com/Habbie/5d0556b75933bc06c1893519831f2fae
Looking at the output again, the first SOA is not all that is wrong. The sequence of numbers for a non-condensed update from A to C via B should be C A B B C C, while I got B A B B C B C C - looks like too much SOA wrapping is happening.
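The expected SOA order for a non-condensed response can be generated mechanically: the newest serial first, then an (old, new) pair for each delta, then the newest serial again. The Python sketch below only models the serial sequence, not the records between the SOAs; `expected_soa_sequence` is a hypothetical helper name.

```python
def expected_soa_sequence(serials):
    """SOA serials of a non-condensed IXFR from serials[0] to serials[-1].

    `serials` lists every stored version, oldest first.
    """
    if len(serials) < 2:
        raise ValueError("need at least an old and a new version")
    newest = serials[-1]
    sequence = [newest]              # leading SOA carries the newest serial
    for old, new in zip(serials, serials[1:]):
        sequence += [old, new]       # old SOA opens deletions, new SOA opens additions
    sequence.append(newest)          # closing SOA repeats the newest serial
    return sequence
```

For versions A, B, C this yields C A B B C C, which matches the sequence described above and differs from the B A B B C B C C that ixfrdist sent.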
If, for example.com., ixfrdist instead receives an AXFR response for example., it will crash with "Attempt to print an unset dnsname" (fixed in #7011):
* thread #2, stop reason = breakpoint 1.1
* frame #0: 0x00007fffcc26e745 libc++abi.dylib`__cxa_throw
frame #1: 0x000000010004a995 ixfrdist`DNSName::toString(this=0x0000000100b046e0, separator=".", trailing=false) const at dnsname.cc:165
frame #2: 0x000000010018d54f ixfrdist`DNSName::toStringNoDot(this=0x0000000100b046e0) const at dnsname.hh:72
frame #3: 0x000000010018c3d5 ixfrdist`writeRecords(fp=0x00007fffd65630b0, records=0x0000700008e91098)>, boost::multi_index::member<DNSRecord, unsigned short, &(DNSRecord::d_type)>, boost::multi_index::member<DNSRecord, unsigned short, &(DNSRecord::d_class)>, boost::multi_index::member<DNSRecord, std::__1::shared_ptr<DNSRecordContent>, &(DNSRecord::d_content)>, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>, boost::multi_index::composite_key_compare<CanonDNSNameCompare, std::__1::less<unsigned short>, std::__1::less<unsigned short>, CIContentCompareStruct, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::__1::allocator<DNSRecord> > const&) at ixfrutils.cc:112
frame #4: 0x000000010018bf5f ixfrdist`writeZoneToDisk(records=0x0000700008e91f40, zone=0x0000700008e92188, directory="/Users/peter/projects/powerdns/pdns/regression-tests.ixfrdist/example.com.")>, boost::multi_index::member<DNSRecord, unsigned short, &(DNSRecord::d_type)>, boost::multi_index::member<DNSRecord, unsigned short, &(DNSRecord::d_class)>, boost::multi_index::member<DNSRecord, std::__1::shared_ptr<DNSRecordContent>, &(DNSRecord::d_content)>, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>, boost::multi_index::composite_key_compare<CanonDNSNameCompare, std::__1::less<unsigned short>, std::__1::less<unsigned short>, CIContentCompareStruct, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::__1::allocator<DNSRecord> > const&, DNSName const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) at ixfrutils.cc:132
frame #5: 0x00000001001394b3 ixfrdist`updateThread(workdir="/Users/peter/projects/powerdns/pdns/regression-tests.ixfrdist", keep=0x0000000100a05018, axfrTimeout=0x0000000100a0501a) at ixfrdist.cc:372
frame #6: 0x00000001001861b2 ixfrdist`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned short, unsigned short> >(void*) [inlined] decltype(__f=0x0000000100a04ff8, __args="/Users/peter/projects/powerdns/pdns/regression-tests.ixfrdist", __args=0x0000000100a05018, __args=0x0000000100a0501a)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&)>(fp)(std::__1::forward<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned short, unsigned short>(fp0))) std::__1::__invoke<void (*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned short, unsigned short>(void (*&&)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&, unsigned short&&, unsigned short&&) at type_traits:4291
frame #7: 0x0000000100186177 ixfrdist`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned short, unsigned short> >(void*) [inlined] void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned short, unsigned short, 2ul, 3ul, 4ul>(__t=0x0000000100a04ff0)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned short, unsigned short>&, std::__1::__tuple_indices<2ul, 3ul, 4ul>) at thread:336
frame #8: 0x00000001001860c9 ixfrdist`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short const&, unsigned short const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned short, unsigned short> >(__vp=0x0000000100a04ff0) at thread:346
frame #9: 0x00007fffcd87d93b libsystem_pthread.dylib`_pthread_body + 180
frame #10: 0x00007fffcd87d887 libsystem_pthread.dylib`_pthread_start + 286
frame #11: 0x00007fffcd87d08d libsystem_pthread.dylib`thread_start + 13
ixfrdist.yml is all the way at the bottom on https://doc.powerdns.com/authoritative/manpages/index.html and adding man_pages.sort() to conf.py does not appear to fix it (even if inspecting man_pages suggests it should)
manpages are sorted by section and name
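A small Python sketch of the difference between the two orderings: Sphinx `man_pages` entries are tuples of (startdocname, name, description, authors, section), so a plain `man_pages.sort()` orders by startdocname, while sorting by section and then name puts a section 5 page after every section 1 page. The entries below are hypothetical stand-ins, not the real conf.py contents, and `by_section_and_name` is an illustrative helper.

```python
# Hypothetical entries in the Sphinx man_pages tuple format:
# (startdocname, name, description, authors, section)
man_pages = [
    ("manpages/zone2sql.1", "zone2sql", "example description", ["example author"], 1),
    ("manpages/ixfrdist.yml.5", "ixfrdist.yml", "example description", ["example author"], 5),
    ("manpages/pdnsutil.1", "pdnsutil", "example description", ["example author"], 1),
    ("manpages/ixfrdist.1", "ixfrdist", "example description", ["example author"], 1),
]


def by_section_and_name(entry):
    """Sort key: manual section first, then page name."""
    return (entry[4], entry[1])


plain_order = [entry[1] for entry in sorted(man_pages)]
section_order = [entry[1] for entry in sorted(man_pages, key=by_section_and_name)]
```

Here `plain_order` places ixfrdist.yml right after ixfrdist, which is what inspecting `man_pages` would suggest, while `section_order` places it last.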
std::sort(zoneVersions.begin(), zoneVersions.end(), sortSOA);
looks like it could get very confused, because SOA serials do not have a strict ordering
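To make the ordering concern concrete, here is a Python sketch of the RFC 1982 serial number comparison for 32-bit SOA serials. It is not what `sortSOA` does (its implementation is not shown here); `serial_lt` is a hypothetical name. It shows that "less than" wraps around and is not transitive, which a comparison sort requires.

```python
SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)     # 2^31
MASK = (1 << SERIAL_BITS) - 1     # 2^32 - 1


def serial_lt(a, b):
    """RFC 1982 ordering: a < b iff 0 < (b - a) mod 2^32 < 2^31."""
    return 0 < ((b - a) & MASK) < HALF


# Three serials that form a cycle: a < b, b < c, and yet c < a.
a, b, c = 0, 0x50000000, 0xA0000000
cycle = serial_lt(a, b) and serial_lt(b, c) and serial_lt(c, a)

# Wraparound: the numerically largest serial is "older" than 0.
wraps = serial_lt(0xFFFFFFFF, 0)
```

Both `cycle` and `wraps` evaluate to True. A sort over serials that are spread widely enough has no consistent answer, although serials that all lie within a 2^31 window of each other still order sensibly.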
Updating this while dogfooding. We can split into multiple tickets later.
getSerialFromMaster() does not have a timeout (looks fixed in #6885)
Typo: fixed in #6885