18F / pulse

How the federal .gov domain space is doing at best practices and policies.

Standalone scanner, plus auto-deploy #651

Closed by konklone 7 years ago

konklone commented 7 years ago

I've created a new scanning server instance and set it up to run a scanning process similar to the previous server's. Unlike the previous server, this one is not expected to serve any websites, and it does not use Fabric or run nginx or any other web server. Cloud.gov now serves pulse.cio.gov and https.cio.gov.

The new standalone server is not meant to be a long-term solution. It is enough of a patch to restore automated Pulse scanning and to avoid requiring a Pulse team member to perform manual deploys to update the production website. The long-term solution should still be to find a cloud.gov solution for the (extensive) scanning process, and to remove that process's dependence on persistent disk.

I've left inline comments on the changed files to explain individual pieces. The main things to understand are that:

konklone commented 7 years ago

Please hold off on merging until I can verify that a fully cron-initiated scan, using a copy of the code checked out from this PR's branch, completes, uploads, and deploys successfully. I'm marking this in the title for emphasis, and I will update here once the result is verified, which should be in under 2 days.
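
For readers unfamiliar with this kind of setup, the sequence being verified (scan, then upload, then deploy, all started from cron) can be sketched roughly as below. This is only an illustration under assumptions, not the script in this PR: the function name `run_pipeline` and the placeholder commands are hypothetical, and the real scan, upload, and deploy commands are not shown in this thread.

```python
import subprocess
import sys
from typing import Sequence


def run_pipeline(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order and stop at the first failure.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for cmd in steps:
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            # A failed step must not be followed by later steps,
            # e.g. a failed scan should never trigger an upload or deploy.
            break
        completed += 1
    return completed


if __name__ == "__main__":
    # Hypothetical placeholders standing in for the real scan, upload,
    # and deploy commands, which are assumptions and not part of this thread.
    placeholder_steps = [
        [sys.executable, "-c", "print('scan')"],
        [sys.executable, "-c", "print('upload')"],
        [sys.executable, "-c", "print('deploy')"],
    ]
    done = run_pipeline(placeholder_steps)
    # Exit non-zero if any step failed, so cron's mail or logs surface it.
    sys.exit(0 if done == len(placeholder_steps) else 1)
```

Stopping at the first failing step is the design choice that matters here: it keeps a partial or broken scan from being published to the production site.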

konklone commented 7 years ago

Worked perfectly on the first try! The scan fully completed and auto-deployed, and the data is now updated as of February 12th:

https://pulse.cio.gov/https/domains/

Latest scan data is also uploaded successfully to S3:

https://s3-us-gov-west-1.amazonaws.com/cg-4adefb86-dadb-4ecf-be3e-f1c7b4f6d084/live/scan/meta.json

This is ready for merging.