GSA / data

Assorted data from the General Services Administration.

Error in Censys snapshot data #112

Closed: jsf9k closed this issue 6 years ago

jsf9k commented 6 years ago

@dav3r noticed some odd domains appearing in this week's DHS NCATS HTTPS and trustymail reports. I tracked the problem down to these lines in censys-federal-snapshot.csv.

It's simple enough to create a PR with the offending lines removed, but I thought this might point to a deeper problem in the code that collects the snapshot.

konklone commented 6 years ago

For quick reference in the thread, the lines are:

dns name=trainingprism-test.dhs.gov
dns name=trainingprism.dhs.gov

Hmm, I wonder if this is specifically related to the BigQuery-driven collection process, or if this was also the case prior to using BigQuery.

Anyway, good eye - I filed https://github.com/18F/domain-scan/issues/200 in the source repo, since that's where the fix should come from. I'll also delete the two offending lines from the snapshot now so that they're removed for tomorrow.
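
For anyone who wants to check a snapshot for similar entries before the upstream fix lands, here is a minimal sketch. It is not the domain-scan code. It assumes the hostname sits in the first CSV column, and both the `split_rows` helper and the hostname pattern are hypothetical choices made for illustration.

```python
import csv
import io
import re

# Assumed pattern for a plain hostname: dot-separated labels made of
# letters, digits, and hyphens, ending in an alphabetic TLD.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def split_rows(rows, column=0):
    """Split rows into (good, bad) by whether the given column is a plain hostname."""
    good, bad = [], []
    for row in rows:
        value = row[column].strip() if len(row) > column else ""
        (good if HOSTNAME_RE.match(value) else bad).append(row)
    return good, bad


# Stand-in data: the two offending lines from this thread plus two
# made-up well-formed entries. A real run would read the CSV file instead.
sample = io.StringIO(
    "example.gov\n"
    "dns name=trainingprism-test.dhs.gov\n"
    "dns name=trainingprism.dhs.gov\n"
    "www.example.gov\n"
)
good, bad = split_rows(csv.reader(sample))

for row in bad:
    print("suspect row:", row)
```

A header row, if the file has one, would also land in `bad`, so it would need to be skipped before calling `split_rows`.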