18F / tock

We use Tock to track and report our time at 18F
https://18f.gsa.gov/2015/05/21/TockingTime/

Update deployment docs and lower New Relic log level #1811

Closed nateborr closed 1 month ago

nateborr commented 1 month ago

Description

Update the developer documentation for egress proxy setup and Tock deployment. This documents the egress proxy updates we've just applied in staging to restore New Relic traffic.

Also reduce the New Relic agent log level back to info.

Addresses #1792.

Testing

Once deployed, SSH into the tock-staging container and confirm that the New Relic agent is only logging at the info level and above:

tail /tmp/newrelic-python-agent.log
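
If eyeballing the `tail` output is error-prone, the same check can be scripted. The sketch below is only an illustration: it assumes each agent log line carries its level name (`DEBUG`, `INFO`, and so on) as a standalone upper-case token, and the helper names `lines_below_info` and `check_log` are hypothetical, not part of Tock or the New Relic agent.

```python
import re

# Assumption: each log line contains its level name as a standalone
# upper-case token. Adjust the pattern if the agent's format differs.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")
BELOW_INFO = {"DEBUG"}


def lines_below_info(lines):
    """Return the lines whose first recognised level token is below INFO."""
    offenders = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1) in BELOW_INFO:
            offenders.append(line)
    return offenders


def check_log(path="/tmp/newrelic-python-agent.log", last_n=200):
    """Read the last `last_n` lines of the log and return any below INFO."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        tail = handle.readlines()[-last_n:]
    return lines_below_info(tail)
```

An empty list from `check_log()` would suggest the agent is no longer writing debug output. Lines written before the redeploy may still appear in the file, so a small `last_n` is more informative right after a restart.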

Production deployment

With this update, we are clear to deploy the changes that have accumulated in staging over several weeks, including these latest changes for New Relic data.

That will consist of three steps:

  1. Manually update and re-push the production egress proxy, following the updated docs, to add the gov-collector.newrelic.com domain to the egress proxy's allow list. This reflects the egress config that I've validated in staging.
  2. Push up a tag for a release and auto-deploy the updated main branch to production.
  3. Delete the production network policy for production-egress on port 8080.

Step 3 is a clean-up step that I've applied in staging; it closes an unused path for external network traffic and simplifies the configuration for future maintenance.
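
The ordering matters, so it can help to think of the rollout as a sequence that halts as soon as a check fails. The following is a minimal sketch of that idea, not a deployment script: `run_in_order` is a hypothetical helper, the step actions are placeholders, and `smoke_check` stands in for whatever manual or automated verification of Tock is done after each step.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_in_order(
    steps: Iterable[Step], smoke_check: Callable[[], bool]
) -> Tuple[List[str], Optional[str]]:
    """Run steps in order, checking after each one.

    Returns (names of steps that passed the check, name of the step after
    which the check failed or None if every step passed).
    """
    completed: List[str] = []
    for name, action in steps:
        action()
        if not smoke_check():
            # Stop here so later steps never run on a broken deployment.
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Placeholder actions; the real steps are performed by hand or by CI.
    plan: List[Step] = [
        ("update egress proxy", lambda: None),
        ("tag release and deploy", lambda: None),
        ("delete network policy", lambda: None),
    ]
    print(run_in_order(plan, smoke_check=lambda: True))
```

Returning the failed step's name, rather than raising, keeps the record of what already succeeded, which is the information needed when deciding what to roll back.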

Avoiding downtime

Our unexpected downtime in staging when we applied the New Relic egress-related changes had three causes:

  1. Pushing an outdated version of the Caddy proxy application, which broke all external network traffic
  2. A pre-existing bug in the run.sh script that caused application start-up to fail if the New Relic admin tool failed to record the deployment
  3. Doing the code deployment first, followed by the egress proxy update without validation in between

Issue 2 was resolved by #1807, although that fix won't take effect until the release is deployed. The other two issues will be mitigated by following the deployment steps in the order above with a pair / co-pilot, and by manually smoke-testing Tock between steps.
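
Issue 2 is an instance of a general pattern: a non-critical step (recording the deployment) was able to abort start-up. The sketch below is not the fix from #1807, which lives in run.sh; it only illustrates the same idea in Python. `run_optional` is a hypothetical helper, and the command passed to it is whatever optional tool is being invoked.

```python
import subprocess
import sys


def run_optional(command, timeout=30):
    """Run a non-critical command; report failure but never raise.

    Returns True if the command exited with status 0, False otherwise.
    """
    try:
        # check=False: a non-zero exit status must not raise an exception.
        result = subprocess.run(command, timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing executable as well as a hung command.
        print(f"optional step could not complete: {exc}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(f"optional step exited with {result.returncode}", file=sys.stderr)
        return False
    return True
```

The caller can log the returned value and carry on to start the application either way, so a failure in deployment bookkeeping no longer takes the site down.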

codecov-commenter commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.18%. Comparing base (b7ed7c5) to head (2e2be1e).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #1811   +/-   ##
=======================================
  Coverage   94.18%   94.18%
=======================================
  Files          66       66
  Lines        4177     4177
=======================================
  Hits         3934     3934
  Misses        243      243
```
