18F / pulse

How the federal .gov domain space is doing at best practices and policies.

Switch to Censys snapshot, add Reverse DNS snapshot data #743

Closed by konklone 6 years ago

konklone commented 6 years ago

This switches us to a snapshot of data from a recent Censys.io export, since we will be on hiatus from their service until we set up a Google Cloud account to bill BigQuery queries to. It also adds a new subdomain source: Reverse DNS data collected by Rapid7.

The Reverse DNS data is also a snapshot, because the full dataset is a 130GB JSON Lines file that is non-trivial to parse and filter. I expect we'll eventually get more sophisticated at processing this data automatically, but for now it's updated "whenever". It adds 2,850 unique hostnames that respond to HTTP/HTTPS and don't appear in other sources.
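For anyone curious what the filtering step could look like, here is a minimal sketch of streaming a JSON Lines file and keeping only `.gov` hostnames that aren't already known. It is not the script used for this snapshot. The `value` field name is an assumption about the record layout, and `gov_hostnames` and `unique_new_hostnames` are hypothetical helper names. It also does not check whether a hostname responds to HTTP/HTTPS.

```python
import json
from typing import Iterable, Iterator, Set


def gov_hostnames(
    lines: Iterable[str],
    suffix: str = ".gov",
    field: str = "value",  # assumed name of the hostname field in each record
) -> Iterator[str]:
    """Yield normalized hostnames ending in `suffix`, one input line at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip malformed lines instead of aborting a very long run.
            continue
        if not isinstance(record, dict):
            continue
        host = record.get(field)
        if not isinstance(host, str):
            continue
        # Normalize: lowercase, no trailing root dot.
        host = host.strip().rstrip(".").lower()
        if host.endswith(suffix):
            yield host


def unique_new_hostnames(lines: Iterable[str], known: Set[str]) -> Set[str]:
    """Return the hostnames found in `lines` that are not already in `known`."""
    return set(gov_hostnames(lines)) - known
```

Because `gov_hostnames` is a generator over any iterable of lines, it can be fed an open file handle directly (for a compressed export, one opened in text mode with `gzip.open`), so the full file never has to fit in memory. Only the set of matching hostnames is held at the end.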

konklone commented 6 years ago

Superseded by #745.