blechschmidt / massdns

A high-performance DNS stub resolver for bulk lookups and reconnaissance (subdomain enumeration)
GNU General Public License v3.0
3.08k stars 460 forks source link

Thoughts on case "normalization" for certain types of rrs (PTR, CNAME, MX, SOA, NS) #81

Closed mzpqnxow closed 1 year ago

mzpqnxow commented 3 years ago

So recently I was working directly with some of the data that I regularly collect using massdns as part of a host inventory system project. This data is usually processed by a small set of scripts that handle things like normalizing answers that contain FQDNs and some other simple tasks, mainly parsing and rearranging the data into a few different "map" formats. This data is then compared to and/or stored with FQDNs from various other data sources (e.g. SSL certificate CN and DNS-type SAN values)

In this case, I was doing ad-hoc processing and forgot to lowercase the data from DNS. I realized it pretty quickly when using comm and getting some unusual numbers. I lower-cased the data and went on about my day

First, before I get into this, I'll point out/acknowledge that there is actually an RFC dedicated to DNS and issues of case-sensitivity in RFC4343. If you feel strongly about sticking to the RFC, I completely understand. Maybe you're already familiar with this RFC, it covers some interesting cases (for example, labels with escape characters, which I believe you recently addressed) where there is no concept of lowercase since the bytes aren't 7-bit ASCII. So I understand what I'm proposing may not be as simple as it seems in terms of impact on the integrity of the data given to the user running massdns

While I haven't looked too deeply into RFC4343, I have a feeling you're probably doing things pretty well in compliance with it and want to be clear that the current behavior is not at all a bug in massdns. I think at the worst you're following a SHOULD which is completely correct. With that said, would it be reasonable to have an optional flag (that may break with the RFC in some cases) provided it's a common use-case/requirement for users and isn't completely breaking to the vast majority of data in the real-world?

What I would like to have is an option for massdns to perform tolower() on specific rr types that I specify. In practice, this would be CNAME, PTR, MX, SOA and NS, I think. These are the cases where it would be useful for me and where I don't see any breaking behavior in my dataset, which is pretty large- on the order of hundreds of thousands of real records from thousands of authoritative nameservers. The RFCs have a formal list of record types and their respective case-sensitivity requirements and I realize what I'm suggesting may not match up with it, but it seems intuitive based on my data and use

If you considered this further, I would suggest it could be implemented similarly to the --ignore option, e.g --lowercase PTR --lowercase CNAME to have massdns perform the tolower() operation on both PTR and CNAME answers. I believe all sane (POSIX compliant) tolower() implementations will behave as no-ops if the byte isn't in the appropriate ASCII range, so it shouldn't affect bytes with the high bit set. Another option would be to provide an option that covers a preset list of rr types with a single flag, but the granular specification is good for me. I would prefer the granular style personally.

What are you thoughts on this? At the end of the day, it's something users can work around with a quick case conversion after the results are written from massdns. As a counterpoint though, many of the typical usages for massdns users (A, SOA, MX, CNAME rrs) require users to always perform this step themselves if they are storing or comparing data from DNS/massdns with other datasets they may have from outside of DNS, with the certificate CN/SAN DNS name example from before (though these values also require normalization, so maybe it's not a great example ...)

If you think this is a reasonable feature, I'd be happy to take a pass at implementing it if you don't have the time to or don't find it valuable enough to work on it but don't disagree with it in principle

If you'd rather avoid stepping into the role of modifying data that came directly from an NS, that's understandable as well. In that case, just take this is a vote for you to change your mind ;)

Thanks!

blechschmidt commented 3 years ago

In my opinion, massdns should focus on resolving DNS records as fast as possible and return the results with as few modifications as possible. However, since this feature would be optional and there might indeed be common use cases for this, I don't disagree with this feature in principle but also don't find it valuable enough to work on. I agree to a similar implementation to the --ignore option.

Maybe you're already familiar with this RFC, it covers some interesting cases (for example, labels with escape characters, which I believe you recently addressed) where there is no concept of lowercase since the bytes aren't 7-bit ASCII.

Before commit 23ee79a0a410ff326f23a44b83e934d77181e23c, massdns would not distinguish between dots as part of a label and label separators properly. This commit has changed the internal DNS name representation and because of the label length bytes, which must be left unchanged, extra care must be taken when using tolower.

Since I want to leave the internal representation in the most original state possible, I would prefer the case conversion to be performed inside dns_raw_record_data2str and dns_print_readable. If you want to implement this, I will happily accept a PR.

mzpqnxow commented 3 years ago

@blechschmidt thanks for the detailed response and the tips on the implementation- it would be quite bad to tolower() a label length byte by accident :)

Let's leave this open for a bit, I will probably send a PR in the next few weeks- if I forget then I guess it's not that important!

mzpqnxow commented 1 year ago

Well, looks like I didn't follow through on this (3 years later...)