GSA / data

Assorted data from the General Services Administration.

Remove zones from DNS data known to be used as CNAME intermediaries #121

Closed: konklone closed 6 years ago

konklone commented 6 years ago

Some of the hostnames I pulled in from the Cisco Umbrella DNS data in #120 are just CNAME intermediaries: records that create an abstraction layer between the DNS implementation and the public records.

It's understandable why these would show up in DNS log data, and due to the nature of DNS, they will necessarily resolve properly when queried directly. But these hostnames are never used for end-resolution of web services, and web services should not be expected to have certificates that are valid for any intermediate CNAMEs they happen to use in their deployment infrastructure.

An agency is definitely still responsible for its intermediate CNAMEs -- and one area where they can create some risk is their potential for use in subdomain hijacking, which has bitten my office before -- but I don't believe they are responsible for serving HTTP traffic directly at these hostnames.

Generally, GSA and DHS gather hostnames from sources where there is some reason to expect that a user might access the service directly at that hostname, and that it is not just a CNAME intermediary. However, the Cisco DNS data is different, so I think we need to be willing to exercise some discretion in this case.

This is one of the only (maybe the only?) times I've ever considered manually excluding hostnames from scans. However, I do feel that scanning these hostnames places an unreasonable burden on agencies and undermines the intent of M-15-13 and BOD 18-01. It is affecting a couple of programs that have reached out to me directly, and I'd like to be responsive, especially since this is a manual dataset with no obvious way to fix the issue heuristically.
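To make the idea of manual exclusion concrete, here is a minimal sketch in Python of dropping hostnames that fall under a hand-maintained list of intermediary zones. The function names and the `example.gov` zones are hypothetical, chosen only for illustration. They are not the actual zones or tooling involved here.

```python
def in_zone(hostname, zone):
    """Return True if hostname is the zone itself or a name beneath it."""
    hostname = hostname.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    # Match on a label boundary so "notedge.example.gov" is not
    # treated as part of "edge.example.gov".
    return hostname == zone or hostname.endswith("." + zone)


def remove_excluded(hostnames, excluded_zones):
    """Keep only hostnames that fall outside every excluded zone."""
    return [
        h for h in hostnames
        if not any(in_zone(h, z) for z in excluded_zones)
    ]


# Hypothetical, manually maintained list of CNAME intermediary zones.
EXCLUDED_ZONES = ["edge.example.gov"]

hostnames = [
    "www.example.gov",
    "app.edge.example.gov",
    "edge.example.gov",
    "notedge.example.gov",
]

kept = remove_excluded(hostnames, EXCLUDED_ZONES)
print(kept)
```

Matching on whole labels rather than raw string suffixes keeps the exclusion narrow, which matters if the list is meant to be a rare, deliberate exception.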

Since this affects DHS as well, I do want to give them a chance to weigh in. cc @h-m-f-t @jsf9k