GSA / site-scanning

The central repository for the Site Scanning program
https://digital.gov/site-scanning

memorialize program principles #995

Closed · gbinal closed this 1 month ago

gbinal commented 1 month ago

Goals:

Principles:

Model: We take public datasets, assemble and process them with an open source method, and produce the resulting Federal Website Index as a hosted flat file. Anyone can download and interact with that file at a consistent, fixed location.
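As a rough illustration of what "download and interact with that file" looks like in practice, here is a minimal Python sketch that fetches the index as a CSV and pulls out the target URLs. The index URL and the `target_url` column name are placeholders for this sketch, not the program's actual values.

```python
# Minimal sketch: download the Federal Website Index flat file and read the
# target URLs out of it. The URL and column name below are assumptions for
# illustration only.
import csv
import io
import urllib.request

INDEX_URL = "https://example.gov/federal-website-index.csv"  # placeholder location

def load_target_urls(index_url: str = INDEX_URL) -> list[str]:
    """Fetch the index CSV and return the list of target URLs it contains."""
    with urllib.request.urlopen(index_url) as resp:
        text = resp.read().decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    return [row["target_url"] for row in reader if row.get("target_url")]

if __name__ == "__main__":
    urls = load_target_urls()
    print(f"Loaded {len(urls)} target URLs from the index")
```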

We then have the Site Scanning engine ingest that public index file once a day and use it as the list of target URLs. Each target URL is loaded and scanned, and the resulting data is stored in a database. The database is queryable via an API, and a snapshot of all the data is published every week as a bulk, downloadable flat file.
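The daily ingest-and-scan step can be sketched as a loop over the index's target URLs that records a result per site and writes it to a database. This is a hypothetical outline only, using SQLite as a stand-in for the database; the actual engine, its scan fields, and its schema are not described in this issue.

```python
# Sketch of the daily scan loop: load each target URL, capture a few facts
# about the response, and store the records so they can be queried later.
# SQLite is used here purely as a stand-in for the real database.
import sqlite3
import urllib.request

def scan_url(url: str) -> dict:
    """Load one target URL and return a small scan record."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return {"url": url, "status": resp.status, "final_url": resp.geturl()}
    except Exception as exc:  # unreachable hosts, timeouts, TLS errors, etc.
        return {"url": url, "status": None, "final_url": None, "error": str(exc)}

def store_scans(records: list[dict], db_path: str = "scans.db") -> None:
    """Write scan records to a local database table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scans (url TEXT, status INTEGER, final_url TEXT, error TEXT)"
    )
    conn.executemany(
        "INSERT INTO scans VALUES (:url, :status, :final_url, :error)",
        [{"error": None, **r} for r in records],
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    targets = ["https://www.usa.gov", "https://digital.gov"]  # sample targets
    store_scans([scan_url(u) for u in targets])
```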

The methodologies of the scans are public, so anyone can see how the data for any given website was derived. We work to continually iterate on and improve the index and the scan methodologies while ensuring the reliability of the daily scans.

gbinal commented 1 month ago

Moving this here and closing.