esonderegger / dotmil-domains

An incomplete listing of `.mil` domains and the code for the scraper used to build the list
MIT License

Some .mil domains and subdomains from Certificate Transparency logs #2

Closed konklone closed 9 years ago

konklone commented 9 years ago

You can search crt.sh, a Certificate Transparency search engine built by Comodo, for any known issued .mil domains:

https://crt.sh/?dNSName=%25.mil

Most of them chain up to a USG root, and are not publicly trusted. There are 78 unique .mil domains and subdomains that had certs issued that chained to a commercial provider:

https://gist.github.com/konklone/00afdaa2b9c4fe1e7843

But for the purpose of establishing a .mil domain list, you may want to look at (and dedupe) the domains that chain up to a USG root. That said, many of these domains (such as www.mil) do not actually seem to resolve over public DNS, and they may be old, invalid, or purely internal.

I defer to you on how to best incorporate them -- my suggestion is to incorporate those which have public DNS records of any sort (whether or not they respond over HTTP).
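For illustration, the "look at (and dedupe)" step could be sketched roughly as below. This is a minimal sketch, not code from the repository: the function name `normalize_mil_names` is hypothetical, and it assumes the input is a list of raw name strings as they appear in certificates, which may differ in case, carry a trailing dot, or be wildcard entries.

```python
def normalize_mil_names(names):
    """Return sorted, unique .mil hostnames from raw certificate name entries.

    Hypothetical helper: lowercases each name, drops a trailing dot and a
    leading wildcard label, and discards anything that is not under .mil.
    """
    seen = set()
    for raw in names:
        name = raw.strip().lower().rstrip(".")
        if name.startswith("*."):
            # A wildcard entry like *.army.mil only tells us about army.mil.
            name = name[2:]
        if not name.endswith(".mil"):
            continue
        seen.add(name)
    return sorted(seen)
```

Collapsing a wildcard entry to its parent domain is a design choice made here for simplicity; a list that wants to record wildcard certificates separately would keep them as-is.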

esonderegger commented 9 years ago

Thank you for alerting me to this!

I added a scraper function for the crt.sh results to the script and added the results to the csv file.

I'm a little bit torn on what to do with these results though. On the one hand, I feel a list like this should err on the side of being inclusive. On the other, there are a lot of duplicates in this list now and I'm not knowledgeable enough about DNS to know which ones belong here and which ones do not.

I'm fairly certain just chaining up to a USG root doesn't mean it should be excluded. An example of a domain found by the crt.sh page is mcds.army.mil. It chains up to a USG root, but displays as trusted in Firefox, Chrome, and Safari. Also, even though it appears to be a private site requiring CAC login to do anything, its landing page and DNS are public.

However, a lot of duplicates appear to be different nodes behind a load balancer. A good example of this is the site I work on for my day job, exhibits.dtic.mil. exhibits-ws.dtic.mil is the domain we use for the "web services" portion of our application, with exhibits-ws1.dtic.mil and exhibits-ws2.dtic.mil as the addresses of the primary and failover nodes. There are public DNS records for all three domains, pointing to three unique IP addresses, and none of them respond to HTTP requests from non-whitelisted addresses. I can't think of a good way to write a script that can tell the difference between those domains, and I also can't think of a reason from a DNS perspective that they should be treated differently.

It feels like I'm being lazy, but for the time being I think I'll leave all these results in. I started playing around with dnspython tonight, but it may end up making more sense to just parse the output from a subprocess call to nslookup.
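As a rough illustration of what such a DNS check might look like, here is a minimal sketch that uses the standard library's `socket.getaddrinfo` instead of dnspython or nslookup. It is not the code from the script: the name `has_public_dns` and the injectable `resolver` parameter are assumptions made for the example. It only detects names that resolve to an address (A/AAAA), so it would miss a name that has other record types only, and the answer depends on whichever resolver the local machine is configured to use.

```python
import socket


def has_public_dns(name, resolver=socket.getaddrinfo):
    """Return True if `name` resolves to at least one address.

    Hypothetical helper. `resolver` defaults to socket.getaddrinfo and is a
    parameter only so the function can be exercised without network access.
    """
    try:
        return len(resolver(name, None)) > 0
    except socket.gaierror:
        # Raised when the lookup fails, e.g. the name does not exist.
        return False
```

A caller could then keep only the names that resolve, for example `[n for n in names if has_public_dns(n)]`, at the cost of one lookup per name.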

konklone commented 9 years ago

> I'm fairly certain just chaining up to a USG root doesn't mean it should be excluded.

Oh, yeah, good call -- I was excluding them for my purposes because I was specifically trying to find use of commercial CAs in the .mil space, but that shouldn't make any difference here. The more important question seems to be whether or not they currently have public DNS records (whether they are reachable or not).

I think leaning towards inclusiveness is helpful here, as you're doing. :+1:

> An example of a domain found by the crt.sh page is mcds.army.mil. It chains up to a USG root, but displays as trusted in Firefox, Chrome, and Safari.

Hmm. It's not trusted for me in Chrome here on Ubuntu. Are you checking from Windows or Mac? It might be at the OS level, or it might be because your computer already has the DoD root specially installed on it?

In either case, the root CA being used shouldn't determine whether or not the domain is in the list, as you said. I'm just curious about the reason for the difference here.

esonderegger commented 9 years ago

Ok, I just added some code to check if the DNS is active for a domain. It slows the script down considerably, but I think it's worth it. I think now that the crt.sh domains with active DNS are included, I can go ahead and close this issue.

Good call on the DoD root cert being installed on my machine. I'm pretty sure that was it. Unfortunately, I have them installed on both my Mac and Ubuntu machines, so I don't have a good control to test with.

Thanks for your help on this!