Closed konklone closed 7 years ago
Please hold off on merging until I can verify that a fully cron-initiated scan, using a copy of the code checked out from this PR's branch, completes, uploads, and deploys successfully. Marking in the title for emphasis, and I will update here when the result is verified, which should be within 2 days.
Worked perfectly on the first try! Scan fully completed and auto-deployed, data is now updated as of February 12th:
https://pulse.cio.gov/https/domains/
Latest scan data is also uploaded successfully to S3:
https://s3-us-gov-west-1.amazonaws.com/cg-4adefb86-dadb-4ecf-be3e-f1c7b4f6d084/live/scan/meta.json
This is ready for merging.
I've created a new scanning server instance, and set it up to run a scanning process similar to the previous server's. However, this server is not expected to serve any websites, and does not use Fabric or run nginx or any other web server. Cloud.gov now serves pulse.cio.gov and https.cio.gov.
The new standalone server is not meant to be a long-term solution, but it is enough of a patch to restore automated Pulse scanning, and to avoid requiring a Pulse team member to perform manual deploys to update the production website. The long-term solution should still be to find a cloud.gov solution for the (extensive) scanning process, and to decouple the scanning process from depending on persistent disk.
I've left inline comments on the changed files to explain individual pieces. The main things to understand are that:
- The scanning server deploys using a `scan-box-deployer` service instance in the `gsa-ogp-pulse`/`pulse` org/space. The process for creating this service and generating deploy-ready credentials is identical to the process documented for Travis, but we do not use Travis to perform the deployment. Instead, the scanning server has the relevant credentials stored as environment variables in `$HOME/.bashrc`, which is `source`'d during the scanning process.
- The `pulse.cio.gov` S3 bucket, which was meant to be an ongoing archive for Pulse data, has been abandoned in favor of a cloud.gov-managed bucket. I did populate the cloud.gov-managed bucket with the contents of the `pulse.cio.gov` bucket at the beginning of December 2016, but there are several scans which haven't yet been copied from the old one to the new one. There is also a cloud.gov backup bucket, tied to `gsa-ogp-pulse`/`backups`. That also saw an initial population in December 2016, but nothing has been added to it since. It should be caught up, and then probably also auto-populated from the main bucket on a regular interval. I'm generally not satisfied with the level of backups we're performing for this data. If Pulse historical scan data is lost, it cannot be recreated.
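To make the "catch up, then auto-populate" idea for the backup bucket more concrete, here is a minimal sketch of the planning step only: given a listing of each bucket, work out which objects still need to be copied. It is not code from this PR. The function name `plan_backup_copies`, the `archive/...` keys, and the fingerprint values are all hypothetical, and the actual listing and copying (for example via an S3 client run from cron) is deliberately left out.

```python
from typing import Dict, List


def plan_backup_copies(main: Dict[str, str], backup: Dict[str, str]) -> List[str]:
    """Return the keys that need copying from the main bucket to the backup.

    Both arguments map object key -> content fingerprint (for example an
    ETag taken from a bucket listing). A key is selected when it is missing
    from the backup or when its fingerprint differs. Keys that exist only
    in the backup are never touched, so historical scans are not removed.
    """
    return sorted(
        key for key, fingerprint in main.items()
        if backup.get(key) != fingerprint
    )


if __name__ == "__main__":
    # Hypothetical listings; real ones would come from listing each bucket.
    main_listing = {
        "live/scan/meta.json": "aaa",
        "archive/2016-12-01/meta.json": "bbb",
        "archive/2017-02-12/meta.json": "ccc",
    }
    backup_listing = {
        "archive/2016-12-01/meta.json": "bbb",
        "live/scan/meta.json": "old",
    }
    for key in plan_backup_copies(main_listing, backup_listing):
        print("copy", key)
```

The sketch is copy-only by design: since lost historical scan data cannot be recreated, a deletion in the main bucket should not propagate to the backup.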