NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

NXDOMAIN instead of NOERROR rcode when asked for existing CNAME record #870

Closed appliedprivacy closed 1 year ago

appliedprivacy commented 1 year ago

Describe the bug

Unbound answers a CNAME query with NXDOMAIN instead of NOERROR, yet includes the existing CNAME record in the answer as well.

Expected rcode: NOERROR

Also: when asked for a CNAME, unbound asks the authoritative name server for an A record.

Expected qtype: CNAME

To reproduce

Steps to reproduce the behavior:

  1. start unbound so it has an empty cache when the query reaches unbound (the config is provided at the end of this bug report)
  2. ask unbound for this existing CNAME record: dig _acme-challenge.bender-doh.applied-privacy.net CNAME -> NXDOMAIN
  3. ask unbound again without flushing the cache first; you will get a NOERROR rcode

Others on the mailing list have confirmed seeing the same issue.

While looking into the PCAP files from stub -> unbound and unbound -> authoritative, I also noticed that the CNAME query sent to unbound results in unbound asking the authoritative server for an A record, which does not exist. This mismatch between the inbound and outbound qtype might be related to the root cause of the bug.

Expected behavior

Unbound should ask the authoritative name server for a CNAME record, not an A record. Unbound should answer with a NOERROR rcode for existing CNAMEs, like other resolvers do (for example PowerDNS Recursor).

System:

pkg info unbound
unbound-1.17.1_2
Name           : unbound
Version        : 1.17.1_2
Installed on   : Sat Feb 18 22:20:01 2023 CET
Origin         : dns/unbound
Architecture   : FreeBSD:13:amd64
Version 1.17.1

Configure line: --with-libexpat=/usr/local --with-ssl=/usr --enable-dnscrypt --disable-dnstap --with-libnghttp2 --with-dynlibmodule --enable-ecdsa --disable-event-api --enable-gost --with-libevent --disable-subnet --disable-tfo-client --disable-tfo-server --with-pthreads --prefix=/usr/local --localstatedir=/var --mandir=/usr/local/man --infodir=/usr/local/share/info/ --build=amd64-portbld-freebsd13.1
Linked libs: libevent 2.1.12-stable (it uses kqueue), OpenSSL 1.1.1o-freebsd  3 May 2022
Linked modules: dns64 dynlib respip validator iterator
DNSCrypt feature available

Additional information

Mailing list discussions:

unbound.conf

server:
    verbosity: 0
    access-control: 109.70.100.0/24 allow
    access-control: ::1/128 allow
    access-control: 127.0.0.1/24 allow
    edns-tcp-keepalive: yes
    incoming-num-tcp: 200

    # plain UDP
    interface: 127.0.0.1@53
    interface: ::1@53
    interface: 109.70.100.133@53

    num-threads: 2
    msg-cache-size: 100m
    rrset-cache-size: 200m
    key-cache-size: 10m
    neg-cache-size: 10m

    harden-below-nxdomain: yes
    minimal-responses: yes

    prefetch: yes
    prefetch-key: yes
    aggressive-nsec: yes

    use-caps-for-id: yes
    hide-identity: yes
    hide-version: yes
    hide-trustanchor: yes

    qname-minimisation: yes

    # The following line will configure unbound to perform cryptographic
    # DNSSEC validation using the root trust anchor.
    auto-trust-anchor-file: "/usr/local/etc/unbound/root.key"

    extended-statistics: yes
    statistics-cumulative: no
    statistics-interval: 0

remote-control:
    control-enable: yes

# root on loopback
auth-zone:
    name: "."
    master: "k.root-servers.net"
    fallback-enabled: yes
    for-downstream: no
    for-upstream: yes
    zonefile: "root.zone"

wcawijngaards commented 1 year ago

The A query is made because qname-minimisation is turned on: Unbound first tries to locate the data with query type A, to hide the real query type. With qname minimisation turned off (qname-minimisation: no), it likely works and asks for the CNAME directly.
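
A simplified model of the query sequence described here, assuming (from this comment, not from Unbound's source) that every minimised probe below a known zone cut, up to and including the full name, is sent with type A, and that the real qtype is only asked afterwards. The function name minimised_queries is hypothetical; a real resolver also stops early when a probe already answers the question.

```python
def minimised_queries(qname: str, qtype: str, zone_cut: str) -> list[tuple[str, str]]:
    """Hypothetical model: the (name, qtype) queries a minimising resolver
    might send below an already known zone cut."""
    labels = qname.rstrip(".").split(".")
    # Labels already covered by the known zone cut (the root "." has none).
    known = len(zone_cut.rstrip(".").split(".")) if zone_cut.strip(".") else 0
    queries = []
    # Reveal one more label per step, hiding the real qtype behind type A.
    for n in range(known + 1, len(labels) + 1):
        queries.append((".".join(labels[-n:]) + ".", "A"))
    # In this model the original question is only asked after the probes.
    queries.append((qname.rstrip(".") + ".", qtype))
    return queries


# For the name from the top post, below the applied-privacy.net. cut:
# [("bender-doh.applied-privacy.net.", "A"),
#  ("_acme-challenge.bender-doh.applied-privacy.net.", "A"),
#  ("_acme-challenge.bender-doh.applied-privacy.net.", "CNAME")]
print(minimised_queries("_acme-challenge.bender-doh.applied-privacy.net.",
                        "CNAME", "applied-privacy.net."))
```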

The upstream server is problematic: it does not implement the DNS standard correctly. If the domain name has records of other types, it is not NXDOMAIN. The query for type A should then be answered with what is called a NOERROR/NODATA reply: rcode NOERROR and no records in the answer section. The authority section carries a SOA record, which gives the message a TTL, that of the SOA record.
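
The NXDOMAIN versus NOERROR/NODATA distinction can be sketched as a small classifier. This is a simplified illustration under stated assumptions: rcodes are plain strings, records are reduced to a hypothetical RR type, referrals are lumped into one bucket, and the negative-caching TTL follows RFC 2308, section 5 (the minimum of the SOA record's own TTL and its MINIMUM field). The SOA values in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RR:
    name: str
    rtype: str
    ttl: int
    soa_minimum: Optional[int] = None  # hypothetical field: SOA MINIMUM, SOA records only


def classify(rcode: str, answer: List[RR], authority: List[RR]) -> str:
    """Classify a response (simplified, using RFC 2308 terminology)."""
    if rcode == "NXDOMAIN":
        return "NXDOMAIN"  # the name does not exist at all
    if rcode != "NOERROR":
        return "ERROR"
    if answer:
        return "ANSWER"
    if any(rr.rtype == "SOA" for rr in authority):
        return "NODATA"  # the name exists, but has no records of this type
    return "OTHER"  # e.g. a referral; out of scope for this sketch


def negative_ttl(authority: List[RR]) -> Optional[int]:
    """RFC 2308, section 5: min(TTL of the SOA record, its MINIMUM field)."""
    for rr in authority:
        if rr.rtype == "SOA" and rr.soa_minimum is not None:
            return min(rr.ttl, rr.soa_minimum)
    return None


soa = RR("applied-privacy.net.", "SOA", 3600, soa_minimum=300)
print(classify("NOERROR", [], [soa]), negative_ttl([soa]))  # NODATA 300
```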

bleve commented 1 year ago

You are wrong: the server is not giving a wrong answer. NXDOMAIN is the correct answer when the CNAME destination is missing. So unbound should not return that to the client; when the CNAME itself is being queried, it should return the CNAME record pointing to its destination.

So this is a special case where unbound currently gives a wrong answer, because it expects the CNAME destination to exist when it shouldn't need to.

wcawijngaards commented 1 year ago

So, I tested it some more; thank you for the details.

When I test this with the config, I get NXDOMAIN and then NXDOMAIN again, so the second query also gets NXDOMAIN for me. Did I miss copying some part of the config that apparently causes the problem?

This is for a query with qtype CNAME, where there is uncertainty about which CNAME is referred to: the first one or the last one in the chain. Unbound currently responds with NXDOMAIN, it seems. That means it gives information about the last element in the chain: the CNAME that does not exist at the target name of the first CNAME.

The first query, of type A, is still caused by qname minimisation.

he32 commented 1 year ago

I agree with @bleve: this is not an RFC 8020 violation by the publishing name server (if that was ever the suggestion).

If the original query were for _acme-challenge.bender-doh.applied-privacy.net. A, it would be correct to return NXDOMAIN, because the target of the CNAME record, bender-doh.acme-dns-challenge.applied-privacy.net., does not exist in the DNS. However, CNAME queries are somewhat "special": the recursor is not being asked to recurse through the CNAME record's target, but should just return the CNAME record at the queried-for name as is, irrespective of whether the target of the CNAME record exists or not. So unbound, in the recursor role, must take on the responsibility that comes with converting the original CNAME query type into something else, given the particular semantics of processing a CNAME query.

wcawijngaards commented 1 year ago

Is there a reference for this behaviour, i.e. whether the response for qtype CNAME should be NXDOMAIN or not? Unbound currently takes NXDOMAIN as applying to the final element of the CNAME chain, which is what RFC 8020 says too, in section 2. But it does not mention an exception for qtype CNAME.

This seems to suggest the issue is about the rcode in the response for qtype CNAME, not about the change in rcode for another query, or about qtype A queries, because that is what the top post talks about.

he32 commented 1 year ago

As for a reference for "don't recurse through the CNAME target when the query type is CNAME", I would have to dig for it; give me some time for that. Even though it's not the authoritative source, that's the way BIND does it. And to me that is really the only thing that makes sense: when you ask for a CNAME record and it exists, it should be returned, irrespective of whether the target of the CNAME record exists. For other record types, however, the presence of a CNAME record is "transparent", and recursion through the target of the CNAME record is implied.

wcawijngaards commented 1 year ago

Well, Unbound returns the CNAME chain, but for queries of type CNAME the rcode differs when the destination does not exist. I agree it is a good idea to produce similar output.

he32 commented 1 year ago

Well, we can go back to RFC 1034 which says:

CNAME RRs cause special action in DNS software.  When a name server
fails to find a desired RR in the resource set associated with the
domain name, it checks to see if the resource set consists of a CNAME
record with a matching class.  If so, the name server includes the CNAME
record in the response and restarts the query at the domain name
specified in the data field of the CNAME record.  The one exception to
this rule is that queries which match the CNAME type are not restarted.

This indicates that qtype=CNAME queries should not recurse through the target of the CNAME record.
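
The restart rule quoted above can be sketched as a toy lookup over an in-memory zone. The zone reuses the names from this thread; the data model, the function name lookup, and the chain limit are assumptions for illustration, not code from Unbound or BIND. Wildcards and empty non-terminals are ignored.

```python
from typing import Dict, List, Tuple

Zone = Dict[str, Dict[str, str]]  # owner name -> {rtype: rdata}
Record = Tuple[str, str, str]     # (owner, rtype, rdata)


def lookup(zone: Zone, qname: str, qtype: str, max_chain: int = 8) -> Tuple[str, List[Record]]:
    """Toy answer logic implementing the RFC 1034 CNAME restart rule."""
    answer: List[Record] = []
    name = qname
    for _ in range(max_chain):
        rrset = zone.get(name)
        if rrset is None:
            return "NXDOMAIN", answer  # the current name does not exist
        if qtype in rrset:
            # Also covers qtype CNAME: the CNAME itself is the answer,
            # so the query is not restarted at its target.
            answer.append((name, qtype, rrset[qtype]))
            return "NOERROR", answer
        if "CNAME" in rrset:
            target = rrset["CNAME"]
            answer.append((name, "CNAME", target))
            name = target  # restart the query at the CNAME target
            continue
        return "NOERROR", answer  # NODATA: the name exists, the type does not
    return "SERVFAIL", answer  # chain too long, or a loop


ZONE: Zone = {
    "_acme-challenge.bender-doh.applied-privacy.net.": {
        "CNAME": "bender-doh.acme-dns-challenge.applied-privacy.net.",
    },
}
q = "_acme-challenge.bender-doh.applied-privacy.net."
print(lookup(ZONE, q, "CNAME")[0])  # NOERROR: the CNAME is returned as is
print(lookup(ZONE, q, "A")[0])      # NXDOMAIN: the CNAME target is missing
```

In both cases the CNAME record appears in the answer; only the rcode differs, which is the behaviour this issue is about.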

he32 commented 1 year ago

So, I can perhaps suggest a conceptually simple change: with qtype=CNAME and when query minimization is turned on, do not change the qtype for the outgoing queries when doing the recursion.

wcawijngaards commented 1 year ago

The upstream server, 157.53.224.1 (ns1.desec.io.) for applied-privacy.net, sends a malformed response. To a query for bender-doh.applied-privacy.net. IN A, it returns the A record before the CNAME. It should be the CNAME followed by the record it points to, so CNAME then A in the answer section. Unbound deals with this by removing the A record and chasing the CNAME target itself, so it does not really hamper resolution, but I would consider it malformed. The query is made because qname minimisation passes through this intermediate label. The outcome does not affect this particular issue; it was visible in the logs.

The output of the query:

;; flags: qr aa ; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
;; bender-doh.applied-privacy.net.  IN  A

;; ANSWER SECTION:
bender-dpriv1.appliedprivacy.net.   86400   IN  A   146.255.56.101
bender-dpriv1.appliedprivacy.net.   86400   IN  RRSIG   A 13 3 86400 20230413000000 20230323000000 12467 appliedprivacy.net. YpiUm+ZDcpZTIotIUF7ec7AllUqtmo5qp7y9DHIzAhi5jI24tJ5U7/oRqfpZGXwAIpUWO/8eFJpDmLLqARA1Jw==
bender-doh.applied-privacy.net. 86400   IN  CNAME   bender-dpriv1.appliedprivacy.net.
bender-doh.applied-privacy.net. 86400   IN  RRSIG   CNAME 13 3 86400 20230413000000 20230323000000 38828 applied-privacy.net. Cbcs2iTPqBdZeu7/GVtcrwo9yhT99lGOauxCoxV81qvgevtQiQ41fkGlFEDuACmFuW3fyCy8Jw3FyZa5HLEkkw==

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 0 msec
;; EDNS: version 0; flags: do ; udp: 1232
;; MSG SIZE  rcvd: 347
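
Under the strict, in-order reading of the answer section applied here (which later comments in this thread dispute), the ordering check and the "drop the record and chase the target" repair can be sketched as follows. RRSIG records are left out for brevity, and the names in_chain_order and scrub_answer are hypothetical; this is not Unbound's scrubber.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (owner, rtype, rdata); RRSIGs omitted


def in_chain_order(qname: str, answer: List[Record]) -> bool:
    """True if every record is owned by the name the CNAME chain has reached."""
    expected = qname
    for owner, rtype, rdata in answer:
        if owner != expected:
            return False
        if rtype == "CNAME":
            expected = rdata
    return True


def scrub_answer(qname: str, answer: List[Record]) -> Tuple[List[Record], str]:
    """Keep records that extend the chain in order and drop the rest.

    Also returns the name the chain currently points at; if no final
    answer was kept, the resolver would have to chase that name itself.
    """
    kept: List[Record] = []
    expected = qname
    for owner, rtype, rdata in answer:
        if owner != expected:
            continue  # out-of-order record, e.g. the A before its CNAME
        kept.append((owner, rtype, rdata))
        if rtype == "CNAME":
            expected = rdata
    return kept, expected


q = "bender-doh.applied-privacy.net."
cname = (q, "CNAME", "bender-dpriv1.appliedprivacy.net.")
a = ("bender-dpriv1.appliedprivacy.net.", "A", "146.255.56.101")
print(in_chain_order(q, [a, cname]))  # False: A arrives before its CNAME
print(scrub_answer(q, [a, cname]))    # only the CNAME is kept; chase its target
```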

bleve commented 1 year ago

There is no such issue with my test record, cnametest.bleve.fi.

wcawijngaards commented 1 year ago

The issue was that qname minimisation took the NXDOMAIN returned by the upstream for the qtype A query as the answer to the qtype CNAME question, and stopped, instead of checking the response with qtype CNAME. The fix makes sure that only the NXDOMAIN pertinent to the question is used. In this case, the NXDOMAIN is passed over and the query is then made with type CNAME, which makes the response NOERROR. For the top post, that makes both the initial query and the second query after it return NOERROR, with the CNAME in the answer to the question.
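
The old and fixed behaviour can be contrasted with a toy minimising resolver. Everything here is an assumption for illustration: an in-memory zone stands in for the upstream, the probes use type A as described earlier in the thread, and the fixed flag only mirrors the prose description of the fix above. It is not the actual patch, and auth_lookup and minimised_resolve are hypothetical names.

```python
from typing import Dict, List, Tuple

Zone = Dict[str, Dict[str, str]]  # owner name -> {rtype: rdata}
Record = Tuple[str, str, str]     # (owner, rtype, rdata)


def auth_lookup(zone: Zone, qname: str, qtype: str) -> Tuple[str, List[Record]]:
    """Toy upstream: follows CNAMEs, except when qtype is CNAME (RFC 1034)."""
    answer: List[Record] = []
    name = qname
    for _ in range(8):  # bound the chain length
        rrset = zone.get(name)
        if rrset is None:
            return "NXDOMAIN", answer
        if qtype in rrset:
            answer.append((name, qtype, rrset[qtype]))
            return "NOERROR", answer
        if "CNAME" not in rrset:
            return "NOERROR", answer  # NODATA
        answer.append((name, "CNAME", rrset["CNAME"]))
        name = rrset["CNAME"]
    return "SERVFAIL", answer


def minimised_resolve(zone: Zone, qname: str, qtype: str, zone_cut: str,
                      fixed: bool = True) -> Tuple[str, List[Record]]:
    """Hypothetical minimising resolver: type A probes, then the real question."""
    labels = qname.rstrip(".").split(".")
    known = len(zone_cut.rstrip(".").split(".")) if zone_cut.strip(".") else 0
    for n in range(known + 1, len(labels) + 1):
        probe = ".".join(labels[-n:]) + "."
        rcode, answer = auth_lookup(zone, probe, "A")
        if rcode == "NXDOMAIN":
            # An NXDOMAIN reached through a CNAME chain describes the chain's
            # target, not the probe name. The buggy model stops anyway; the
            # fixed model only stops when the probe name itself is missing.
            if not fixed or not answer:
                return rcode, answer
    return auth_lookup(zone, qname, qtype)


ZONE: Zone = {
    "bender-doh.applied-privacy.net.": {"CNAME": "bender-dpriv1.appliedprivacy.net."},
    "bender-dpriv1.appliedprivacy.net.": {"A": "146.255.56.101"},
    "_acme-challenge.bender-doh.applied-privacy.net.": {
        "CNAME": "bender-doh.acme-dns-challenge.applied-privacy.net.",
    },
}
q = "_acme-challenge.bender-doh.applied-privacy.net."
print(minimised_resolve(ZONE, q, "CNAME", "applied-privacy.net.", fixed=False)[0])  # NXDOMAIN
print(minimised_resolve(ZONE, q, "CNAME", "applied-privacy.net.")[0])               # NOERROR
```

In both runs the answer holds the same CNAME record; only the rcode differs, matching the symptom in the top post.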

Thank you for the information about the details for this failure! The fix is committed to the code repo.

pspacek commented 1 year ago

The upstream server has a malformed response, it is server 157.53.224.1 ns1.desec.io. for applied-privacy.net. It returns a response to a query for bender-doh.applied-privacy.net. IN A with the A record before the CNAME. It should be the CNAME and then the item after the CNAME, so CNAME then A in the answer section. Unbound deals with this by removing the A record and chasing the CNAME target itself, so it would not really hamper resolution, but I would consider it malformed.

Hi @wcawijngaards. Do you have a reference that says RRs MUST be ordered in this way? Maybe my memory is failing, but I thought that sections are unordered sets of records...

wcawijngaards commented 1 year ago

No, CNAMEs and DNAMEs are in order, and the RRSIG follows the RRset that it signs. But I have no immediate reference for that; I guess RFC 4035 or so for RRSIGs, and RFC 1034 or so for CNAMEs. Do you mean the ordering of NSECs in the authority section? That seems to be unordered. For the additional section, there is a bit of discussion about ordering, and also about implementation, e.g. NSD would order the addresses in a delegation.

pspacek commented 1 year ago

No, CNAMEs and DNAMEs are in order, also the RRSIG follows the RRset that it signs. But I have no immediate reference for that. I guess the 4035 or so for RRSIGs, and 1034 or so for CNAME.

I remain doubtful that this is illegal.

I've briefly checked RFC 1034 and 1035 and cannot see that they impose a strict order. AFAIK the DNAME algorithm is the latest update to the canonical algorithm, and https://datatracker.ietf.org/doc/html/rfc6672#section-3.2 says "copy the CNAME RR into the answer section", not "append" or anything else about strict order.

As for RRSIGs, e.g. Knot DNS puts all RRSIGs at the very end of the section and, to my knowledge, nothing has complained (yet?). https://datatracker.ietf.org/doc/html/rfc4035#section-3.1.1 has an explicit SHOULD, not a MUST, in this regard.

So, I think this is at best under-specified but not outright illegal.

wcawijngaards commented 1 year ago

So RFC 1035, section 4.1, talks about a 'list of concatenated resource records (RRs)': not an unordered set, and it is also a list on the wire. Also, the query processing algorithm in section 4 of RFC 2672 creates the CNAME records in order and puts them in the answer section one after another, followed by the answer. The examples in RFC 1034, sections 3.6.2 and 6.2.7, also show them in order. And RFC 1034, section 4.3.1, says 'The answer to the query, possibly preface by one or more CNAME RRs..'. Section 6.2.2 says that RRs are not ordered, but that is about the ordering of RRs within an RRset. Also, RFC 4035, section 3.1.4, says 'The name server MUST place the NS RRset before the NSEC RRset and its associated RRSIG RR(s)'.

pspacek commented 1 year ago

Thank you for your time @wcawijngaards.

First, we can agree to disagree. Now I can see where you are coming from - I was just curious why Unbound is being strict here. The rest of this text is just an attempt to explain why I interpret it differently - feel free to ignore.

So RFC 1035, 4.1 talks about 'list of concatenated resource records (RRs)'. Not an unordered set, and it also is a list on the wire.

That section defines the wire format, and honestly it seems like a stretch to say that it defines a strict order for the data.

As an extreme example, say that the RFC text defined a wire format for "a bunch of 32-bit integers" and said "store it as a list of 32-bit big-endian integers". Does that serialization format imply that it is a list? Or a set? I think the serialization format does not define that property.
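
The analogy can be made concrete: the serialization below is necessarily a sequence on the wire, yet whether a reader treats the decoded values as a list or as a set is a separate, semantic decision. encode and decode are illustrative names; "!I" is Python's struct format for a 32-bit big-endian unsigned integer.

```python
import struct
from typing import List


def encode(values: List[int]) -> bytes:
    """Serialize as concatenated 32-bit big-endian unsigned integers."""
    return b"".join(struct.pack("!I", v) for v in values)


def decode(wire: bytes) -> List[int]:
    """Parse the concatenation back, in wire order."""
    return [v for (v,) in struct.iter_unpack("!I", wire)]


first = encode([1, 2, 3])
second = encode([3, 1, 2])
print(first == second)                          # False: the bytes differ
print(decode(first) == decode(second))          # False as lists
print(set(decode(first)) == set(decode(second)))  # True as sets
```

The wire format alone cannot say which of the last two comparisons is the meaningful one; that has to come from the specification of the data.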

Also the query processing algorithm from section 4 in RFC2672 creates the CNAME records in order and puts them in the answer section one after another. And then the answer after that.

Here we clearly disagree about the interpretation of the text. RFC 2672 is obsoleted by RFC 6672, but even the original text said copy, not append. IMHO it says the record must be present in the resulting answer section, not in which order.

Also the example, in 1034 3.6.2, shows them in order, and the example in 6.2.7.

Well, I can't see any text around the examples that would impose an order, and I think we can agree that an example has to provide some ordering when it is written down :-)

And RFC1034 4.3.1 says 'The answer to the query, possibly preface by one or more CNAME RRs..'.

Possibly. I'm not so sure it enforces strict ordering of CNAMEs... The text reads like a high-level description, not an exact spec, but I take your point.

6.2.2 says that RRs are not ordered, about the ordering of RRs in an RRset.

I don't see it in 6.2.2. Do you mean 6.2.1?

If so, 6.2.1 says:

The difference in ordering of the RRs in the answer section is not significant.

Well, it might be talking only about the example at hand, but it can be also interpreted literally.

Also RFC 4035, says 'The name server MUST place the NS RRset before the NSEC RRset and its associated RRSIG RR(s)' in 3.1.4.

I interpret that as an instruction for determining the truncation point (i.e. NS has higher priority when TC=1 has to be set), but I agree that this specific case has its order specified.

edmonds commented 1 year ago

The question of whether there is an ordering between RRsets in the answer sections comes up from time to time. There was a large email thread in dnsop from 2015 here:

https://mailarchive.ietf.org/arch/msg/dnsop/7KoE8Dr-SxuNToskxbvAwJ3BQLQ/

and an attempt at a specification clarification document here:

https://datatracker.ietf.org/doc/html/draft-jabley-dnsop-ordered-answers-00