buffrr / hsd-axfr

HSD plugin that implements DNS zone transfer protocol (AXFR)

Got this error when `hsd` gets an AXFR request from ISC's `bind` #1

Closed james-stevens closed 2 years ago

james-stevens commented 2 years ago
[info] (net) Received 16 addrs (hosts=205, peers=8) (40.113.229.250:12038).
[error] (ns) Out of bounds read (offset=32).
    at BufferReader.readU32BE (/opt/hsd-orig/node_modules/bufio/lib/reader.js:256:10)
    at EXPIREOption.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:5825:22)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at readOption (/opt/hsd-orig/node_modules/bns/lib/wire.js:6620:27)
    at Option.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:5441:19)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at OPTRecord.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:3788:32)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at readData (/opt/hsd-orig/node_modules/bns/lib/wire.js:6513:24)
    at Record.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:1885:17)
[info] (axfr) [127.0.0.1:47597]  Starting zone transfer

My guess would be bind is sending some extra records at the end of the AXFR request, like an EDNS0 or DNS Cookie or something. The transfer then seems to start fine - but it's slow as heck, so still waiting for it to complete.
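For what it's worth, EDNS0 options inside an OPT record's RDATA are just a TLV list (RFC 6891). The trace above dies in `EXPIREOption.read` doing a `readU32BE`, and bind by default adds an *empty* EDNS EXPIRE option (RFC 7314, option code 9) to refresh/transfer queries for slave zones - which would also explain why plain dig (which doesn't send it) never triggers the error. A minimal length-honouring option walker would look something like this (my own illustration, not bns code):

```javascript
// Walk the EDNS0 options in an OPT record's RDATA (RFC 6891):
// each option is { code: u16, length: u16, data: <length> bytes }.
// A parser that reads a fixed-size body without checking the advertised
// length runs off the end when the option is sent empty.
function readOptions(rdata) {
  const opts = [];
  let off = 0;
  while (off + 4 <= rdata.length) {
    const code = rdata.readUInt16BE(off);     // e.g. 9 = EXPIRE, 10 = COOKIE
    const len = rdata.readUInt16BE(off + 2);  // advertised data length
    off += 4;
    if (off + len > rdata.length)
      throw new Error(`Out of bounds option (offset=${off}).`);
    opts.push({ code, data: rdata.slice(off, off + len) });
    off += len;
  }
  return opts;
}
```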

BTW: would be REALLY nice if the SOA Serial reflected the version of the data, instead of just giving the current date & time - that way I could run two copies of hsd and pull the zone file from either & if one had been down and was still catching up, bind wouldn't end up pulling stale data from it.

If you are pulling the AXFR from the frozen data, instead of the live data (this is the preferred choice but sorry, I don't know your terminology for these two data sets), the SOA Serial should reflect the "version" of the frozen data, e.g. the unix timestamp of the last packet that got included or something.
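The suggestion above could be sketched like this (purely illustrative, not hsd code - the function name and input are hypothetical):

```javascript
// Use the unix timestamp of the last block whose names made it into the
// served (frozen) data as the SOA serial. It fits the unsigned 32-bit
// serial field (until 2106) and, unlike wall-clock time, is identical on
// every hsd instance that has processed the same blocks.
function soaSerial(lastIncludedBlockTime) {
  return lastIncludedBlockTime >>> 0;  // SOA serial is a u32 (RFC 1035)
}
```

A bind slave comparing serials would then see a lagging hsd instance as "older" and keep serving the data it already has, instead of pulling stale data.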

james-stevens commented 2 years ago

BTW: dig does not trigger the above error

[info] (axfr) [127.0.0.1:47597] Records sent 4428024
[warning] (ns) Root server middleware resolution failed for name: .
[info] (chain) Block 000000000000000329a223f34a167e7716b8c434d39208b7d58191501e07d1e3 (90310) added to chain (size=345 txs=1 time=28.259

Looks like the transfer failed - the TCP connection between bind & hsd is still open, but nothing seems to be happening & bind does not respond to a query. Left it for a while, now killed it.

I'll try again with dig - I'll also try setting up a local slave of the ICANN ROOT and specify it with --axfr-icann-servers= (I assume this will take IP Addresses)

...

dig completed & hsd issued the expected collision warnings, but there are NO ICANN domains in the ROOT zone provided. The dig output ends:

.           86400   IN  SOA . . 2021101916 1800 900 604800 21600
; Transfer failed.

When I specify --axfr-icann-servers='192.33.4.12 199.9.14.201 192.5.5.241' (b,c,f) I still get this from hsd at the end of the transfer

[warning] (ns) Root server middleware resolution failed for name: .

NOTE: I was running with options --ns-host 127.0.0.9 --ns-port 53 - will try again with no port or IP options, as per instructions, but I would REALLY prefer to be able to use them.

james-stevens commented 2 years ago

Run with ./bin/hsd --prefix=/opt/hsd-data/ --plugins=$(pwd)/axfr --axfr-icann-servers='192.33.4.12 199.9.14.201 192.5.5.241'

Now I can see debug messages. The AXFR ends:

[warning] (ns) Root server middleware resolution failed for name: .
[debug] (ns) Error: Unknown serialization version: 2.
    at Resource.read (/opt/hsd-orig/lib/dns/resource.js:117:13)
    at Resource.decode (/opt/hsd-orig/node_modules/bufio/lib/struct.js:94:10)
    at Function.decode (/opt/hsd-orig/node_modules/bufio/lib/struct.js:147:23)
    at Plugin.sendAXFR (/opt/hsd-orig/axfr/lib/axfr.js:162:33)
    at async RootServer.Plugin.ns.middle (/opt/hsd-orig/axfr/lib/axfr.js:74:16)
    at async RootServer.resolve (/opt/hsd-orig/lib/dns/server.js:486:15)
    at async RootServer.answer (/opt/hsd-orig/node_modules/bns/lib/server/dns.js:249:17)
    at async RootServer.handle (/opt/hsd-orig/node_modules/bns/lib/server/dns.js:316:13)
    at async Server.<anonymous> (/opt/hsd-orig/node_modules/bns/lib/server/dns.js:72:9)
[debug] (net) Requesting 1/1 txs from peer with getdata (159.69.46.23:12038).

Interestingly, this serialization error was actually what stopped my zone-dump from working.

    if (version !== 0)
      throw new Error(`Unknown serialization version: ${version}.`);

Not exactly sure why an error is only showing up in debug

Now the AXFR data ends

fastestshipper.         21600   IN      NS      ns1.fastestshipper.
fastestshipper.         21600   IN      NS      ns2.fastestshipper.
.                       86400   IN      SOA     . . 2021101916 1800 900 604800 21600
.                       0       ANY     SIG     0 253 0 0 20211019223904 20211019103904 41555 . 3ekJC/2vq261JARHDxHMQn4I88JsivgweN/K4rLzvZZowyDw1GBWqxA/ V74iHPORosQik6TcbwgGPhOBUblPLg==
; Transfer failed.

Still no ICANN domains in the data

buffrr commented 2 years ago

My guess would be bind is sending some extra records at the end of the AXFR request, like an EDNS0 or DNS Cookie or something.

Ugh, I'm not a big fan of bns. You're probably right - I will look into this, but the last time I tested with bind it worked fine.

I was able to reproduce this issue. I wrapped Resource.decode in a try/catch block and the name roivantsciences is what's causing the serialization error.

Trying to dig this name (dig @127.0.0.1 -p 5349 roivantsciences) I can see the serialization error/SERVFAIL. I will submit a fix for this, but I was able to export it after catching the error:

root zone is much larger now :) it took approx 6 minutes to export.

$ tail root.zone
ns1zim.telone.co.zw.    172800  IN      AAAA    2c0f:f758:0:a::81
ns1zim.telone.co.zw.    172800  IN      A       41.220.30.81
ns2zim.telone.co.zw.    172800  IN      AAAA    2c0f:f758:0:a::82
ns2zim.telone.co.zw.    172800  IN      A       41.220.30.82
.                       86400   IN      SOA     . . 2021101923 1800 900 604800 21600
;; Query time: 351320 msec
;; SERVER: 127.0.0.1#5349(127.0.0.1)
;; WHEN: Tue Oct 19 16:06:00 MST 2021
;; XFR size: 4615594 records (messages 2396, bytes 132724457)

BTW: would be REALLY nice if the SOA Serial reflected the version of the data, instead of just giving the current date & time - that way I could run two copies of hsd and pull the zone file from either & if one had been down and was still catching up, bind wouldn't end up pulling stale data from it.

Yup, I'd like to add this. Also, it would be nice if the user could specify the NS records/glue serving the root zone (if you're running a public resolver). For example, I have the name root-servers/ and would like to add . NS a.root-servers, . NS b.root-servers ... etc. to the exported zone (obviously depending on the number of root servers you're running), instead of . NS . or synth records, so things like dig . +trace work. The SOA should have the correct primary server too.

If you are pulling the AXFR from the frozen data, instead of the live data (this is the preferred choice but sorry, I don't know your terminology for these two data sets), the SOA Serial should reflect the "version" of the frozen data, e.g. the unix timestamp of the last packet that got included or something.

I believe it's using the "frozen" data but unfortunately there's a chance it could change during chain re-organization. The older the data the less likely it's affected by a reorg so we may want to wait a few blocks to make the serial more reliable.
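The "wait a few blocks" idea amounts to keying the serial to a block a fixed number of confirmations below the tip, so a shallow reorg can't rewind it. A tiny sketch (SAFE_DEPTH is an assumption, not a number from hsd):

```javascript
// Derive the SOA serial from a block SAFE_DEPTH confirmations below the
// chain tip: only a reorg deeper than SAFE_DEPTH blocks could change it.
const SAFE_DEPTH = 6;  // assumed depth; deep reorgs are rare

function serialHeight(tipHeight) {
  return Math.max(0, tipHeight - SAFE_DEPTH);
}
```

With the tip at 90310 (as in the log above), the serial would track block 90304.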

NOTE: I was running with options --ns-host 127.0.0.9 --ns-port 53 - will try again with no port or IP options, as per instructions, but I would REALLY prefer to be able to use them.

you should be able to use this option.

Still no ICANN domains in the data

Any names with collisions should be replaced during the transfer, but the remaining ICANN TLDs like .com, .net ... etc. are written at the end - because the transfer is failing, you're not seeing them yet.

BTW, you should use HSD v3 - it has a --no-sig0 option. I'm currently running it this way, which disables the recursive resolver, the wallet plugin and sig0, and runs the root server only:

./bin/hsd --no-wallet --no-rs --no-sig0 --plugins=$(pwd)/hsd-axfr 
james-stevens commented 2 years ago

Thanks for such a fast response - I really appreciate it

Sorry it's taken me so long to look at this, I was moving the registry system onto a new server - you know I run the backend registry system for namebase, right?

Trying to dig this name (dig @127.0.0.1 -p 5349 roivantsciences) I can see the serialization error/SERVFAIL

Nice one !

TBH: I am not a fan of using either exceptions or abort cos, in a live environment, you really want errors handled in a more sophisticated way than just crashing. Interestingly golang does away with exceptions.

i was able to export it after catching the error

I commented those two lines out & dig on the name now just returns a NODATA response, still waiting for the outcome of the AXFR :)

[warning] (ns) Root server middleware resolution failed for name: .

Any idea if this means anything?

I believe it's using the "frozen" data but unfortunately there's a chance it could change during chain re-organization.

Right, but bind would normally do an AXFR shortly after the SOA Serial changes, so (if it changes only when the stored data changes) you get ~6 hrs to get it done, which is fine now, but could be a problem in the future - not kidding.

ICANN's standard testing is to 500M names - they have tested their ROOT infrastructure to this level - clearly nobody thought about that here.

It's such a shame, cos the code is really pretty decent, but the use cases envisaged are so narrow. Even the database design itself isn't really fit for more than a few 1000 names, esp if you want to do AXFR.

Devs are always full of "it works on my laptop", then poor SysEng have to scale it to 10M users & still get some sleep at night.

root zone is much larger now :) it took approx 6 minutes to export

yeah, at least - and I'm on an i7 extreme - it's an older one (32nm with 22nm chipset), but still

You can't properly sign the zone because of the database design, & it takes ages to AXFR the zone - IXFR is out of the question - and the zone is only going to get bigger. IMHO the design is not really fit for purpose.

BTW, you should use HSD v3 it has --no-sig0 option

I think I am using v3, cos I cloned the latest code but, as I was running into problems, I wanted to use exactly the same cmdline options you had in the documentation.

TBH more than slightly shocked that 2 years after I was insisting no-sig0 was necessary, & everybody was arguing against me, it's actually made it into the main code base.

Patching it in myself was OK, but it's not really practical in the long term - if a project won't accept a change, it's a super pain in the butt to keep it going. Hence when my ROOT dump & sign broke I just gave up on it, until I had time to look at your AXFR.

Also, would be nice if the user can specify the NS records/glues serving the root zone

Kind of - personally I slave the merged (& signed) ROOT zone into every resolver, so the NS are irrelevant.

ICANN have actually discovered that running ROOT servers has now hit a bottleneck & they're looking to move over to slaving the ROOT zone into (all) resolvers, with tiers of XFR servers, like the NTP infrastructure. Don't know how far that discussion has gone. They're kinda government, so no doubt it will take years - clearly putting it in a blockchain is a FAR better solution - maybe they'll work that out one day.

The ICANN ROOT zone changes by a max of about 5 records per day - compared to about 6T of traffic in queries, per ROOT NS - slaving is kind of a no-brainer, really.

What would also be nice would be to be able to add in NS records for eth if you're running servers that can serve that data, although they'd have to be authoritative, i.e. set the AA bit, not just be resolvers, or a bind resolver wouldn't accept their answers.

SOA should have correct primary server too

True, but this is generally documentation only - some checking s/w uses it to get the "correct" SOA Serial to check against the other NS, but other than that it's just text, really.

james-stevens commented 2 years ago

Yeah, looks like just commenting out those two lines means the AXFR now works :+1: :1st_place_medal:

    /*
    if (version !== 0)
      throw new Error(`Unknown serialization version: ${version}.`);
    */

No data for roivantsciences in the XFR file, so same result as for the dig, really

Probably ought to work out what to do with buggers like this as well .. :wink:

forever.                21600   IN      NS      0x0001af047E9fb5dCD99E6823C900f3D8f5b2c5f4._eth.
badass.                 21600   IN      NS      0x36fc69f0983E536D1787cC83f481581f22CCA2A1._eth.
forhlre.                21600   IN      NS      0x0._eth.

Although that third one looks like garbage to me :laughing:

Shame SLDs can't live in the Handshake blockchain - or at least it's a shame you have to use a 3rd party blockchain to get this functionality.

BTW: the Out of bounds read at the start doesn't seem to be a problem - it's just that I never like that sort of thing

...

Working fine with bind now - for comparison, AXFR from bind takes 11 seconds on the same machine :wink: It's coming from RAM, so it's also CPU bound

Looks like this happens every time bind polls the SOA Serial - I doubt it's a problem, cos the AXFR has already worked once, but I'll keep an eye on it

[error] (ns) Out of bounds read (offset=32).
    at BufferReader.readU32BE (/opt/hsd-orig/node_modules/bufio/lib/reader.js:256:10)
    at EXPIREOption.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:5825:22)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at readOption (/opt/hsd-orig/node_modules/bns/lib/wire.js:6620:27)
    at Option.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:5441:19)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at OPTRecord.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:3788:32)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at readData (/opt/hsd-orig/node_modules/bns/lib/wire.js:6513:24)
    at Record.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:1885:17)

I suspect it's caused by bind using DNS Cookies. If that is the case, it's possible I can configure it away, but really hsd should support DNS Cookies, esp if people are going to use it as a resolver.

james-stevens commented 2 years ago

So ... I run bind as a local slave of the ICANN ROOT zone & it listens on 127.1.0.1, but hsd will not get the ROOT zone from it, for no apparent reason - they connect & then terminate the connection. It also listens on 192.168.1.161 & that doesn't work either. BIND 9.16.20

hsd then just aborts with [debug] (ns) Error: no zone data - what else can it do?

If I remove --axfr-icann-servers='127.1.0.1', then it works fine - kinda odd cos I'm pretty sure ROOT B & C both run bind anyway. When I dig from the local copy & B the data given back is identical. I've been using the local copy for ages for my python merge script.

I'll have to remove use of the local copy for now, but the advantages of using it are obvious.

from hsd

[warning] (axfr) Attempt [127.1.0.1:53] failed: Error: Closed unexpectedly
[warning] (axfr) Attempt [127.1.0.1:53] failed: Error: Closed unexpectedly
[warning] (axfr) Attempt [127.1.0.1:53] failed: Error: Closed unexpectedly
[warning] (axfr) Attempt [127.1.0.1:53] failed: Error: Closed unexpectedly
[warning] (axfr) Attempt [127.1.0.1:53] failed: Error: Closed unexpectedly
[warning] (axfr) Attempt [127.1.0.1:53] failed: Error: Closed unexpectedly

from bind (log level = debug - which adds pretty much nothing !!)

Oct 20 15:12:33 hasroot local0.info named-icann-slave[7828]: client @0x7f07db7821a0 127.0.0.1#49034 (.): transfer of './IN': AXFR started (serial 2021102000)
Oct 20 15:12:33 hasroot local0.err named-icann-slave[7828]: client @0x7f07db7821a0 127.0.0.1#49034 (.): transfer of './IN': send: operation canceled
Oct 20 15:12:33 hasroot local0.info named-icann-slave[7828]: client @0x7f07db782200 127.0.0.1#49036 (.): transfer of './IN': AXFR started (serial 2021102000)
Oct 20 15:12:33 hasroot local0.err named-icann-slave[7828]: client @0x7f07db782200 127.0.0.1#49036 (.): transfer of './IN': send: operation canceled
Oct 20 15:12:33 hasroot local0.info named-icann-slave[7828]: client @0x7f07db782260 127.0.0.1#49038 (.): transfer of './IN': AXFR started (serial 2021102000)
Oct 20 15:12:33 hasroot local0.err named-icann-slave[7828]: client @0x7f07db782260 127.0.0.1#49038 (.): transfer of './IN': send: operation canceled
Oct 20 15:12:33 hasroot local0.info named-icann-slave[7828]: client @0x7f07db7822c0 127.0.0.1#49040 (.): transfer of './IN': AXFR started (serial 2021102000)
Oct 20 15:12:33 hasroot local0.err named-icann-slave[7828]: client @0x7f07db7822c0 127.0.0.1#49040 (.): transfer of './IN': send: operation canceled
Oct 20 15:12:33 hasroot local0.info named-icann-slave[7828]: client @0x7f07db782320 127.0.0.1#49042 (.): transfer of './IN': AXFR started (serial 2021102000)
Oct 20 15:12:33 hasroot local0.err named-icann-slave[7828]: client @0x7f07db782320 127.0.0.1#49042 (.): transfer of './IN': send: operation canceled
Oct 20 15:12:33 hasroot local0.info named-icann-slave[7828]: client @0x7f07db782380 127.0.0.1#49044 (.): transfer of './IN': AXFR started (serial 2021102000)
Oct 20 15:12:33 hasroot local0.err named-icann-slave[7828]: client @0x7f07db782380 127.0.0.1#49044 (.): transfer of './IN': send: operation canceled

This seems to imply that hsd thinks bind aborted, but it's def far from conclusive.

adding these bind config options makes no difference

    answer-cookie no;
    send-cookie no;
james-stevens commented 2 years ago

I have to say, this highlights one of the things about the Handshake project that has always given me serious cause for concern from day-1

And that is what appears to be a complete lack of care about compatibility with existing DNS standards

You simply CAN'T expect to replace the ROOT zone without interoperability with existing DNS - people simply aren't going to throw out all their existing DNS s/w for something that is not even compatible with existing standards - and you are risking destabilizing something that is unbelievably stable & robust & underpins everything that happens on the internet.

A major disruption only needs to happen once for people to use that as an excuse to get rid of it.

Everything is written down in RFCs - it's just not that hard. It is harder than some other standards, because some of the older RFCs just aren't that well written (they are really old), but most have had "clarifications"

I know it's not your fault, but it's really really not good

buffrr commented 2 years ago

I was moving the registry system onto a new server - you know I run the backend registry system for namebase, right?

Oh nice! They really should fix hdns.io - their DNSSEC is broken. Maybe you can help them with that ;)

[warning] (ns) Root server middleware resolution failed for name: .

Any idea if this means anything?

If the transfer fails for some reason, the plugin will not answer the query for . AXFR so you get that error.

What would also be nice would be to be able to add in NS records for eth if you're running servers that can serve that data, although they'd have to be authoritative, i.e. set the AA bit, not just be resolvers, or a bind resolver wouldn't accept their answers.

yeah those could be replaced with NS & DS records. The auth server would have to sign the records coming from Ethereum. The nameserver should have access to the handshake root zone so it can get back the contract addresses to fetch the records.

Shame SLDs can't live in the Handshake blockchain - or at least it's a shame you have to use a 3rd party blockchain to get this functionality.

yeah Ethereum is also a general purpose blockchain so it's HUGE!

I suspect it's caused by bind using DNS Cookies. If that is the case, it's possible I can configure it away, but really hsd should support DNS Cookies, esp if people are going to use it as a resolver.

bns needs some clean up to properly handle edns0 things. Regarding the cookies, the hsd root server should sit behind a recursive resolver anyway and not be exposed publicly.

So ... I run bind as a local slave of the ICANN ROOT zone & it listens on 127.1.0.1 but hsd will not get the ROOT zone from it, with no apparent reason, they connect & then terminate the connection

I see ... the plugin was sending a TCP FIN after it's done writing the query. Technically bind should still write back the answer but it assumes the client cancelled. This is a good indication that root servers may not be running bind ;) but I could be wrong.

Can you apply this patch to the plugin to see if this resolves serialization/transfer issues?

diff --git a/lib/axfr.js b/lib/axfr.js
index b08663b..de6733f 100644
--- a/lib/axfr.js
+++ b/lib/axfr.js
@@ -159,7 +159,14 @@ class Plugin {
         continue;

       const fqdn = util.fqdn(ns.name.toString('ascii'));
-      const resource = Resource.decode(ns.data);
+      let resource;
+      try {
+         resource = Resource.decode(ns.data);
+      } catch (e) {
+        this.logger.warning('Name ' + fqdn + ' uses unsupported serialization format - skipping')
+        continue;
+      }
+
       let zone = resource.toZone(fqdn);

       zone = this.pickRRs(fqdn, zone, mergeDB);
diff --git a/lib/client.js b/lib/client.js
index 17afd51..795c2fb 100644
--- a/lib/client.js
+++ b/lib/client.js
@@ -129,7 +129,6 @@ class AXFRQuery {

       this.socket.write(len);
       this.socket.write(msg);
-      this.socket.end();
     });

     this.init();

And that is what appears to be a complete lack of care of compatibility with existing DNS standards

Yup working on that! DNSSEC PR finally made it to hsd v3 so hsd root server should be compatible with all major recursive resolvers (including bind). Still needs private KSK/ZSK or just a CSK.

I'm running Knot Resolver as a DoH server + hsd root to use personally. Only kdig supports +https.

kdig @hns.dnssec.dev proofofconcept A +dnssec +https

I did some benchmarks using resperf a while ago and it performed pretty well since recursive resolvers do most of the heavy work anyway (as long as hsd is used as an internal root server and isn't exposed to the outside world). Super easy to scale because you can run as many hsd servers as you like and plug them as root hints to recursive. This won't be viable once you get to 1.1.1.1 level of traffic but it will get you very far.

Under heavy load it could even avoid signing NXDOMAIN and reply with SERVFAIL for those, since they're a lower priority (a form of rate limiting). Records signed on demand/on the fly can be cached and even stored on disk. So you can get far with online/on-demand signing and support HIP-5 without too much headache.
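The prioritisation described above could look roughly like this (a hypothetical sketch - class, names and threshold are all illustrative, not hsd code):

```javascript
// Cache signed answers; when the signer is saturated, shed NXDOMAIN
// queries with SERVFAIL rather than paying for online denial proofs.
class OnlineSigner {
  constructor(maxPending = 100) {   // assumed saturation threshold
    this.cache = new Map();         // qname/qtype -> signed answer
    this.pending = 0;               // signings currently in flight
    this.maxPending = maxPending;
  }

  answer(key, exists, signFn) {
    const hit = this.cache.get(key);
    if (hit)
      return hit;                   // cached RRSIGs are reusable until expiry
    if (!exists && this.pending >= this.maxPending)
      return 'SERVFAIL';            // shed low-priority NXDOMAIN work
    this.pending++;
    const signed = signFn(key);     // expensive: produce RRSIG (+ NSEC if !exists)
    this.pending--;
    this.cache.set(key, signed);
    return signed;
  }
}
```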

hsd still needs to support offline signing better and fix any DNS compliance issues in bns. I think avoiding the urkel tree entirely by parsing blocks manually and storing those in a proper format will be much much faster for zone transfers and would enable IXFR, dynamic DNS updates ... etc. The urkel tree is just not designed to do this

james-stevens commented 2 years ago

they really should fix hdns.io their dnssec is broken

LOL - K - I'll see what I can do

bns needs some clean up to properly handle edns0 things. Regarding the cookies, the hsd root server should sit behind a recursive resolver anyway and not be exposed publicly.

So sig0 should never have been there :wink: - when I said that 2 yrs ago I got shouted down & so I left (telegram)

TBH, IMHO nearly all of the DNS in hsd should not be there - it's just reinventing a wheel that's been reinvented far too many times already. All hsd needs to do is provide an unsigned AXFR & everything else can be done in bind. This solves so many issues hsd has come up against.

Records signed on demand/on the fly can be cached and even stored on disk

And that's the second big disadvantage of signing on the fly - it doesn't scale. The first is that you have to give your private keys to all slaves.

PowerDNS caches them in memory using an MD5 hash as the key, which it assumes will be unique, which is not good. They'd be better off using a faster hash like FNV & allowing for collisions. What I did was use FNV + name length, which (for a 64 bit FNV) gave about 3 or 4 collisions in 50M names, so not too bad. Also when you do get a collision & have to drop down to a memcmp, you probably only have to compare the first byte.
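The cache-key scheme described above (FNV-1a 64-bit plus name length, with a fallback comparison for the rare collision) could be sketched like this - my own illustration, not PowerDNS or hsd code:

```javascript
// FNV-1a 64-bit: fast, non-cryptographic; collisions are possible,
// so the cache stores the name alongside the value and verifies on hit.
const FNV64_OFFSET = 0xcbf29ce484222325n;
const FNV64_PRIME = 0x100000001b3n;
const MASK64 = (1n << 64n) - 1n;

function fnv1a64(name) {
  let h = FNV64_OFFSET;
  for (const byte of Buffer.from(name, 'ascii')) {
    h ^= BigInt(byte);
    h = (h * FNV64_PRIME) & MASK64;
  }
  return h;
}

// Key = hash + length; on a collision the stored name differs, so we drop
// down to a direct comparison (which usually fails on the first byte).
class RRSIGCache {
  constructor() {
    this.map = new Map();
  }
  key(name) {
    return fnv1a64(name).toString(16) + ':' + name.length;
  }
  get(name) {
    const hit = this.map.get(this.key(name));
    return hit && hit.name === name ? hit.rrsig : null;
  }
  set(name, rrsig) {
    this.map.set(this.key(name), { name, rrsig });
  }
}
```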

First came static signing, but that was slow; then signing-on-the-fly, but that doesn't scale, risks your private keys & has a ton of edge cases it struggles with; so now large zones use dynamic signing - bind has it built-in & it's really easy to configure.

Dynamic signing basically starts with statically signing the entire zone; then, as changes come in, it works out which DNSSEC records need to be changed/replaced & only changes those ones. It also means you have to gradually replace all the RRSIGs as they expire - bind does this on a rolling basis, spreading it out over time.

I'm also using NSEC3+OptOut, so I only sign & create NSEC3 & RRSIG for those names that have a DS - right now that's just under 3000, which is quite a lot more than it used to be!!

Long term it will be interesting to see what happens there, cos AFAIK, DANE will require ppl to have a signed zone & DANE is the decentralised solution for HTTPS, isn't it? So if ppl want HTTPS, they will need to sign their TLD.

zone "." {
        type slave;
        masters { 127.0.0.9; };
        file "/data/zones/ROOT.merged";
        key-directory "/keys";
        auto-dnssec maintain;
        inline-signing yes;
        };

That is literally all you need for bind's dynamic signing (with your keys in /keys - I run bind in chroot) & hsd listening on 127.0.0.9:53 - as you can see it would also be easy to have multiple hsd backends for failover, but ONLY if the SOA Serial was linked to the data, as described before - so it would be consistently correct across multiple instances of hsd.
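With the serial tied to the data, the failover would just be listing both hsd instances as masters (illustrative addresses):

        masters { 127.0.0.9; 127.0.0.10; };

bind works down the list and transfers from the first master whose serial is newer than the one it already holds.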

You will also need these, because the hsd ROOT zone is full of "illegal" names

        check-names master ignore;
        check-names slave ignore;
        check-names response ignore;
        check-sibling no;
        check-integrity no;

If you're going to sign in hsd, you really should be doing dynamic signing.

Ideally hsd should be the one signing the data, because hsd is the one that validated the blockchain, but if you have hsd & bind running together in the same container, then the reality is it doesn't really make a ton of difference.

Super easy to scale because you can run as many hsd servers as you like and plug them as root hints to recursive

If you AXFR the merged ROOT into bind, get bind to dynamically sign it, then slave that into any resolver that wants it, it will scale even better. So that's what I'm doing.

the plugin was sending a TCP FIN after it's done writing the query

Interesting - yeah some modern protocols require that sort of thing

I'm running Knot Resolver

I've not used their resolver, but their auth server is fast. Unless they've changed recently, the main difference with bind is that bind always used to be the only DNS server that could be both - so I can slave the ROOT zone into my resolver.

Still needs private KSK/ZSK or just a CSK

Yes, signing with a known private key is 100% pointless. Although, you might be able to start with hard-coded keys then immediately roll over into a dynamically created (ephemeral) one. There is a specific protocol for rolling over the ROOT DS - needless to say, it didn't work the first time they tried it, so they had to roll the key back - but I think the bugs are fixed now.

I guess it must involve signing the new keys / DS with the old ones or some such. Could be an interesting way of doing it.

If you are using ECDSA384 (or better) it's hard to see the point of using two keys, except that rolling the CSK will mean a change of DS, whereas rolling the ZSK means you don't need to change the DS.

But with ECDSA384, the only reason I can think of for rolling a key at all is if the keys are compromised, but in that case you're going to have to roll the KSK as well anyway.

I guess rolling ZSK makes people think you're doing something. Some people roll the hash in the NSEC3PARAM which is 100% pointless, but I guess it makes them feel they're doing something.

james-stevens commented 2 years ago

they really should fix hdns.io their dnssec is broken

It's not hdns.io itself, as it isn't signed - all the NS are hosted in the same parent zone & its DNSSEC is broken - intermittently, & only when you ask for authoritative answers (RD=0) !!!

My-PC$ dig +norec +dnssec @35.226.188.118 ns.m-d.net DNSKEY

; <<>> DiG 9.16.1-Ubuntu <<>> +norec +dnssec @35.226.188.118 ns.m-d.net DNSKEY
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 35117
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

"Give me your DNS Keys" -> "NO"

K - that would do it :rofl:

I've given them a ping on Slack

james-stevens commented 2 years ago

Can you apply this patch to the plugin to see if this resolves serialization/transfer issues

Yes - it solved the XFR from bind, and also removed those errors when it initiated the connection :+1:

Thanks :smile_cat:

james-stevens commented 2 years ago

I'm sure it's not actually a problem, but I get these errors when bind does an SOA poll

[error] (ns) Out of bounds read (offset=32).
    at BufferReader.readU32BE (/opt/hsd-orig/node_modules/bufio/lib/reader.js:256:10)
    at EXPIREOption.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:5825:22)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at readOption (/opt/hsd-orig/node_modules/bns/lib/wire.js:6620:27)
    at Option.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:5441:19)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at OPTRecord.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:3788:32)
    at Function.read (/opt/hsd-orig/node_modules/bufio/lib/struct.js:143:23)
    at readData (/opt/hsd-orig/node_modules/bns/lib/wire.js:6513:24)
    at Record.read (/opt/hsd-orig/node_modules/bns/lib/wire.js:1885:17)
[error] (ns) EFORMERR: unexpected authority.
    at RootServer.answer (/opt/hsd-orig/node_modules/bns/lib/server/dns.js:242:13)
    at RootServer.handle (/opt/hsd-orig/node_modules/bns/lib/server/dns.js:316:24)
    at Server.<anonymous> (/opt/hsd-orig/node_modules/bns/lib/server/dns.js:72:20)
    at Server.emit (events.js:400:28)
    at TCPSocket.fire (/opt/hsd-orig/node_modules/bns/lib/internal/net.js:350:17)
    at Parser.<anonymous> (/opt/hsd-orig/node_modules/bns/lib/internal/net.js:365:12)
    at Parser.emit (events.js:400:28)
    at Parser.feed (/opt/hsd-orig/node_modules/bns/lib/internal/net.js:574:12)
    at Socket.<anonymous> (/opt/hsd-orig/node_modules/bns/lib/internal/net.js:396:19)
    at Socket.emit (events.js:400:28)

If I tell bind to stop using cookies, the second one disappears, but not the first

    answer-cookie no;
    send-cookie no;
james-stevens commented 2 years ago

Also, in an ideal world, if hsd supports AXFR it should also support NOTIFY :wink:

Seriously, it really doesn't matter to me - I can increase the SOA Checking interval, & SOA Checking is really cheap cos it's all local & UDP (1 packet each way)

But it is what a "normal" Auth server would do

Linking the SOA Serial to the data version is actually about the only thing left I'd really like to have.

buffrr commented 2 years ago

It's not hdns.io itself, as it isn't signed; all the NS are hosted in the same parent zone & its DNSSEC is broken - intermittently & only when you ask for authoritative answers (RD=0) !!!

Oh, I meant their current public resolver service. They strip RRSIGs from all ICANN TLDs, and there is no authenticated denial of existence for Handshake TLDs. For example, this query is bogus but they serve it anyway: `dig @103.196.38.38 proofofconcept aaaa`

So sig0 should never have been there πŸ˜‰ - when I said that 2 yrs ago I got shouted down & so I left (telegram)

I don't see why those would have to compete, although I don't like the custom SIG0 used. There's no question that resolvers need to validate DNSSEC correctly, but SIG0, DNSCurve, DoT and DoH solve a different problem: they're intended for securing a single hop, in this case between a trusted resolver and a client.

Sure, this isn't needed if clients validate DNSSEC directly, but would you embed a full recursive or stub DNSSEC resolver in curl, wget, git, mail clients, browsers and all apps so they could support DANE? (They'd start with an empty cache too.) Ideally, there is a secure API provided by the operating system that validates DNSSEC. Apple kind of supports this, but it's very limited currently: there is a kDNSServiceFlagsValidate flag that can be used with dnssd. Even Google Chrome uses the OS crypto API to validate certificates, so there should be something similar for DNSSEC. Embedding the DNSSEC chain in TLS is going to make this easier.

There are no easy-to-use DNSSEC libraries yet that apps can just plug in to support this either (I don't like getdns). The idea of securing the last mile with a trusted resolver is fine for now. Apps could just validate the last hop (because the recursive covers the whole chain) and check the AD bit for the security status of the response. Whether the resolver gives them its own KSK, a SIG0 key, or a cert public key (in the case of DoT/DoH), security-wise it's the same: in all of those cases, it can trick them into thinking a fake response is secure. This is a choice that apps will make on their own; if you don't like SIG0, don't use it 😅

A decentralized DNSSEC validator can start with a light client that verifies proof of work and gets the DS record for the given TLD from the blockchain, then verifies the chain starting from that DS record / trust island. The root KSK could end up becoming an obsolete concept in Handshake; it mainly exists for interoperability with existing software.

I'm sure it's not actually a problem, but I get these errors when bind does an SOA poll

[error] (ns) Out of bounds read (offset=32).

Bind is sending the EDNS EXPIRE option (RFC 7314). You can reproduce this error with `dig @127.0.0.1 -p 5349 . soa +ednsopt=9:00`. hsd gives a FORMERR answer, which is fine I guess, but the exception indicates that bns wasn't reading the option correctly.
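For illustration, here's a minimal stdlib sketch of such a query on the wire (query ID and UDP payload size are arbitrary choices of mine). RFC 7314 says the EXPIRE option carries no data in queries, and dig's `+ednsopt=9:00` sends a single zero byte; either way, a reader that unconditionally does a 32-bit read on the option data will run out of bounds:

```python
import struct

def soa_query_with_expire(qid=0x1234):
    """Build a root SOA query carrying an empty EDNS EXPIRE option (code 9)."""
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 1)  # RD set; 1 question, 1 additional
    question = b"\x00" + struct.pack(">HH", 6, 1)             # name ".", QTYPE=SOA(6), QCLASS=IN(1)
    option = struct.pack(">HH", 9, 0)                         # OPTION-CODE=EXPIRE(9), OPTION-LENGTH=0
    # OPT pseudo-RR: name ".", TYPE=OPT(41), CLASS=UDP payload size, TTL=extended flags
    opt_rr = b"\x00" + struct.pack(">HHIH", 41, 4096, 0, len(option)) + option
    return header + question + opt_rr
```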

[error] (ns) EFORMERR: unexpected authority.

I believe bind was trying IXFR here: `dig @127.0.0.1 -p 5349 . ixfr=2000`. IXFR adds an SOA record to the authority section of the query, so bns says "unexpected authority" because it doesn't understand IXFR.
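That authority-section SOA is how RFC 1995 conveys the client's current serial; a stdlib sketch of the query bind would have sent (query ID is arbitrary):

```python
import struct

def ixfr_query(serial, qid=0x1234):
    """Build a root IXFR query; the client's current serial travels in an
    SOA record placed in the authority section (RFC 1995)."""
    header = struct.pack(">HHHHHH", qid, 0, 1, 0, 1, 0)   # 1 question, 1 authority record
    question = b"\x00" + struct.pack(">HH", 251, 1)       # name ".", QTYPE=IXFR(251), QCLASS=IN(1)
    # SOA RDATA: MNAME ".", RNAME ".", then serial/refresh/retry/expire/minimum
    rdata = b"\x00\x00" + struct.pack(">IIIII", serial, 0, 0, 0, 0)
    soa_rr = b"\x00" + struct.pack(">HHIH", 6, 1, 0, len(rdata)) + rdata
    return header + question + soa_rr
```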

Also, in an ideal world, if hsd supports AXFR it should also support NOTIFY πŸ˜‰ Seriously, it really doesn't matter to me, I can increase the SOA Checking interval & SOA Checking is really cheap cos its all local & UDP (1 packet each way)

Supporting NOTIFY is trivial once the SOA serial is consistent. I created an issue here to track support for SOA.
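It really is a small message: per RFC 1996, a NOTIFY is just a query-shaped packet with opcode 4 and the AA bit set, asking SOA for the zone; the secondary replies and then runs its usual SOA check / transfer. A stdlib sketch (query ID arbitrary):

```python
import struct

def notify(qid=0x1234):
    """Build a NOTIFY message for the root zone (RFC 1996)."""
    flags = (4 << 11) | 0x0400                     # opcode NOTIFY(4), AA bit
    header = struct.pack(">HHHHHH", qid, flags, 1, 0, 0, 0)
    question = b"\x00" + struct.pack(">HH", 6, 1)  # name ".", QTYPE=SOA(6), QCLASS=IN(1)
    return header + question
```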

james-stevens commented 2 years ago

Thanks to your help & this plug-in, bridge.jrcs.net is working & updating again - it's a validating resolver & provides [A/I]XFR of the signed ROOT. People can run their own container of it from here -> https://github.com/james-stevens/handshake-resolver or here https://hub.docker.com/r/jamesstevens/handshake-resolver

This is not intended as any kind of final solution, but a useful sticking plaster for the time being that can quickly get you up & running resolving & validating Handshake DNS.

I'm much happier that it is now using an unpatched copy of hsd. I'm still getting a few JS errors (as documented previously), but they're not show-stoppers, so I care a lot less

Ideally, there is a secure API provided by the operating system that validates DNSSEC

Most Linux distributions use systemd now, & that has a validating stub resolver built in, with the ability to cache & hold multiple ROOT DS records. I've tested it with my ROOT DS & it works fine. Seems to me a validating & caching stub resolver is a good route to a solution.
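As a rough sketch of that route (file locations per systemd-resolved's docs; the DS key tag and digest below are placeholders for whatever root KSK your setup publishes):

```ini
# /etc/systemd/resolved.conf -- point the stub at the local Handshake resolver
[Resolve]
DNS=127.0.0.1
DNSSEC=yes
```

```
; /etc/dnssec-trust-anchors.d/handshake.positive -- additional root trust anchor
; (placeholder values -- substitute your own key tag & digest)
. IN DS 12345 8 2 0123456789abcdef...
```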

There's good evidence most people prefer to use a "trusted service provider" rather than having to do too much themselves. If somebody like CloudFlare provided a merged-signed ROOT using keys they held, similar to what I have done, I suspect a lot of people would be happy with that.

That still leaves the last mile as a problem - but that's no different from DNSSEC on ICANN's ROOT, which I suspect is why it's never really been used much in client applications. Plenty of resolvers validate, but then there's no security from the resolver to the client; DNS Cookies are really only very weak - apparently they're working on something better that adds privacy (so presumably encrypts the data???)

Root KSK could end up becoming an obsolete concept in Handshake.

Yes - and that's really neat, but it doesn't solve the last-mile problem, unless the light client & caching are part of the client's libresolv.

Also, I'd expect any standard DNSSEC validator would need changes to work with no ROOT KSK/DS

james-stevens commented 2 years ago

Root KSK could end up becoming an obsolete concept in Handshake

I really like this idea, as well as the idea of distributing the ROOT by blockchain instead of having centralised ROOT servers. Having the TLD's DS pre-validated from the blockchain should significantly improve the performance of DNSSEC validation, as neither the TLD nor the ROOT needs to be validated.

However, it would be even nicer if the ICANN TLDs were also in the blockchain. IMHO some will claim their TLD, but many, like Verisign, won't, out of loyalty to ICANN. So either you'd need some daemon process that inserts those TLDs for them, or a fall-back to ICANN's KSK, which is a shame, but perfectly reasonable.

Mike told me about the browser work you've done - that's really cool, nice one.

You may as well close this "issue". Thank you so much for all your help.

buffrr commented 2 years ago

it's a validating resolver & provides [A/I]XFR of the signed ROOT, people can run their own container of it from here -> https://github.com/james-stevens/handshake-resolver or here https://hub.docker.com/r/jamesstevens/handshake-resolver

Ha, cool! Sorry, I was quite busy and didn't get a chance to look into this repo. We'll see if I can get a HIP-5 server up and running to optionally replace `._eth` NS records in the exported zone.

Mike told me about the browser work you've done - that's really cool, nice one.

Thanks! Getting hnsd to work reliably on mobile while doing DNSSEC validation with very limited CPU time is very tricky. The desktop version will be significantly easier.

You may as well close this "issue". Thank you so much for all your help.

I updated the plugin to fix the problems in this issue, so you no longer need to patch it.