GSA / site-scanning

The central repository for the Site Scanning program
https://digital.gov/site-scanning

Expand ingest sources, round 2 #824

Open gbinal opened 8 months ago

gbinal commented 8 months ago

Note: when we do this, we should pause and gut-check whether to normalize the filenames and references in the source code, in this folder, this folder, in the target URL list itself, and in the sources column of the snapshots (anywhere else?).

Also, gut-check whether any datasets need to be harvested for .mil and .gov as well.

gbinal commented 6 months ago

Maybe also do this first: https://github.com/GSA/site-scanning/issues/964

gbinal commented 6 months ago

Weigh adding https://github.com/cisagov/dotgov-data/blob/main/dotgov-websites/other-websites.csv as a source. Also note the others mentioned in the 5-8-24 email.

Also note these files in https://github.com/cisagov/scan-target-data:
https://github.com/cisagov/scan-target-data/blob/develop/current-federal-non-dotgov.csv
https://github.com/cisagov/scan-target-data/blob/develop/eot-2016.csv
https://github.com/cisagov/scan-target-data/blob/develop/eot-2012.csv