department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Update Grafana Version 7 #12322

Closed ricetj closed 4 years ago

ricetj commented 4 years ago

Description

The Identity team and BE tools team would like to use some features in Grafana 7 (latest). This will be a major upgrade from 6 to 7, and it looks like there are going to be some issues with backend plugins and signing.

Background/context/resources

Both Bennie and Keifer are willing to support this rollout.

Technical notes

Definition of Done


Reminders

pnwstevan commented 4 years ago

Detailed documentation of upgrade procedure for future reference/onboarding

tl;dr Detailed documentation of my first production deploy. The process and the documentation are likely excessive, but consider this upgrading Grafana the hard way, as a bit of a forcing function to learn a bunch of things. Hopefully this is helpful for future reference and for onboarding other Ops team members.

Thoughts/Notes:

Prerequisites:

Access and Auth:

Local Tools for Dev & Testing:

Upgrade Steps:

  1. Ensure all prereqs and access per above are set up - better to do this now than get blocked in the middle of a workflow.

  2. Pre-work - gather context, requirements, etc.

  3. Making the changes for upgrade

    • Clone the master branch of the respective repo locally (in this case, Grafana configs are in the devops repo)
    • Create and checkout a new local branch
    • Make the respective changes to enable upgrade:
      • Version bump in Ansible config for Grafana
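
The version bump above could be scripted roughly as follows. This is a minimal sketch only: the vars file path, the `grafana_version` key, the branch name, and the version number are all hypothetical placeholders, since the actual Ansible layout in the devops repo is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: `grafana_version` is an assumed key name, not taken from the devops repo.

# Rewrite the value of a `grafana_version:` line in an Ansible vars file.
#   $1 = path to the vars file (hypothetical)
#   $2 = new version string
bump_grafana_version() {
  local vars_file="$1" new_version="$2"
  # -i.bak keeps a backup copy and works with both GNU and BSD sed.
  sed -i.bak -E \
    "s/^(grafana_version:[[:space:]]*).*/\\1\"${new_version}\"/" \
    "$vars_file"
}

# Example usage (branch name, path, and version are placeholders):
#   git checkout master && git pull
#   git checkout -b grafana-7-upgrade
#   bump_grafana_version path/to/grafana/vars.yml 7.1.5
#   git diff    # review the change before committing
```

Keeping the edit in a small function makes it easy to review with `git diff` before committing, and the `.bak` copy gives a quick way back if the pattern matched the wrong line.
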
  4. Test the upgrade locally

    • Ensure you have a valid AWS token and working CLI
      • Can test with aws sts get-caller-identity
    • Review and follow guide for using Vagrant
    • Go to your local working directory with the checked-out branch, then into ~/devops/ansible
    • Set up Python
      • Virtual Environment
      • pip install -r requirements.txt
      • Build with changes: export APP=grafana-vagov;export ENV=vagov-dev; export REF=master; vagrant up --provision-with build
      • Forward port to connect locally: vagrant ssh -- -L 3000:localhost:3000
      • Login page @ http://localhost:3000 - yay!
        • Boo - can't login cause no DB :(
      • Reset admin password! From the SSH shell on the Vagrant host:
        • Find container: docker ps
        • Shell into container: docker exec -ti <container id> bash
        • Reset admin password: grafana-cli admin reset-admin-password <new password>
      • Can log in to locally upgraded test build!
      • Notice there are warnings about unsigned plugins, which I had seen in the release notes, so I need to make another change.
      • Destroy Vagrant build: vagrant destroy
      • Make change to allow unsigned plugins (Grafana 7 exposes an allow_loading_unsigned_plugins setting under [plugins] for this)
      • Rebuild again with above steps
      • SUCCESS! :D
      • Don't submit a PR yet, since it will get built, deployed and released automatically per Jenkins schedule (more on this below).
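
The local test loop in step 4 could be wrapped in a couple of helper functions, sketched below. The function names `require_tools` and `local_build` are made up for illustration. The commands inside them are the ones listed in the steps above, and the tool list is an assumption about what the workflow needs on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the commands come from step 4.

# Return non-zero (and name the culprits) if any listed CLI is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Verify tooling and AWS credentials, then run the Vagrant build.
local_build() {
  require_tools aws vagrant || return 1
  # Fails fast if the AWS token has expired.
  aws sts get-caller-identity >/dev/null || {
    echo "AWS credentials are not valid" >&2
    return 1
  }
  export APP=grafana-vagov ENV=vagov-dev REF=master
  vagrant up --provision-with build
}

# Example usage, run from the ansible directory:
#   local_build && vagrant ssh -- -L 3000:localhost:3000
#   vagrant destroy    # tear down before rebuilding with further changes
```

Checking the tools and the token first means a stale credential shows up in seconds, before the long Vagrant build starts.
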
  5. Prepare for Production Upgrade

    • Since there is some risk and this is my first deploy, I'm going to be extra cautious and do this manually.
    • Disable daily build for Grafana by removing cron in seed job:
    • Also need to make this change on live production server:
    • NOW we can submit our actual upgrade PR, without fear it will be deployed automatically.
    • Discuss with team in office hours; plan for upgrade 10am PST 8/25
    • Announce ^ via Slack to original request thread, and in #vfs-engineers
  6. Production Upgrade

    • Get browser tabs open/ready:
    • Jenkins Prod
    • Grafana
    • AWS Console:
      • RDS (for Prod DB Backup / Monitoring)
      • CloudWatch (created a simple dashboard for network/io from ELB serving Grafana)
    • Build, Release & Deploy time!
    • Backup Grafana Prod DB
    • Find prod grafana DB in RDS
    • Actions->Take Snapshot
    • Confirm snapshot is complete
    • Leave tab open for DB metrics in case needed for troubleshooting
    • Build, Release, Deploy Grafana upgrade:
    • Login to Jenkins
    • BUILD!
      • Build->grafana-vagov
      • Build with Parameters
      • Click on active Build #
      • Monitor build with "Console Output"
      • When complete, click link in "Scheduling Project:"
    • RELEASE!
      • Click on active Release #
      • Monitor release with "Console Output"
    • DEPLOY!
      • Click on active Deploy #
      • Monitor deploy with "Console Output"
    • If successful, app should be up, which it was.
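
The DB backup in step 6 was taken through the RDS console. As an alternative, the same snapshot could be taken from the AWS CLI, roughly as sketched here. The instance identifier `grafana-prod` in the usage line and the snapshot naming scheme are hypothetical, not the real production values.

```shell
#!/usr/bin/env bash
# Sketch only: instance name and snapshot naming scheme are assumptions.

# Build a snapshot identifier from the instance name and a timestamp.
#   $1 = DB instance identifier
#   $2 = optional timestamp (defaults to the current UTC time)
snapshot_id() {
  local instance="$1" stamp="${2:-$(date -u +%Y%m%d-%H%M%S)}"
  printf '%s-pre-upgrade-%s\n' "$instance" "$stamp"
}

# Take a manual snapshot and block until RDS reports it as available.
backup_db() {
  local instance="$1" snap
  snap="$(snapshot_id "$instance")"
  aws rds create-db-snapshot \
    --db-instance-identifier "$instance" \
    --db-snapshot-identifier "$snap" >/dev/null || return 1
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$snap" || return 1
  echo "$snap"
}

# Example usage (placeholder instance name):
#   backup_db grafana-prod
```

Waiting on the snapshot before starting the Jenkins build mirrors the "Confirm snapshot is complete" step, so the upgrade never begins without a usable restore point.
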
  7. Testing / Validation

    • Logged into Grafana
    • Checked a bunch of charts
    • Looked for anything obviously broken
    • Monitored CW metrics and RDS charts
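
The manual checks in step 7 could be supplemented with a quick scripted smoke test against Grafana's `/api/health` endpoint, which reports the database status and the running version. The sketch below is an illustration rather than what was run during this upgrade. The helper names are hypothetical and the URL in the usage line is the local port-forward from step 4.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; /api/health is Grafana's health endpoint.

# Read a health JSON body on stdin; succeed only if the database is "ok".
health_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Read a health JSON body on stdin; print the reported version string.
health_version() {
  sed -n -E 's/.*"version"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p'
}

# Fetch the health endpoint and confirm DB status and major version.
#   $1 = base URL, $2 = expected major version (e.g. 7)
check_grafana() {
  local base_url="$1" expected_major="$2" body version
  body="$(curl -fsS "${base_url}/api/health")" || return 1
  printf '%s\n' "$body" | health_db_ok || return 1
  version="$(printf '%s\n' "$body" | health_version)"
  [ "${version%%.*}" = "$expected_major" ]
}

# Example usage:
#   check_grafana http://localhost:3000 7 && echo "Grafana 7 is up"
```

This only confirms that the app is serving and on the expected major version. It does not replace clicking through dashboards to look for broken panels or plugins.
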
  8. Post Deploy / Clean-up

    • Posted notification that the upgrade was complete in the previous Slack threads
    • Re-enable seed job:
    • Revert commit (could have submitted new commit/PR, thought this was cleaner)
    • Re-enabled it on live Jenkins server (see above)
pnwstevan commented 4 years ago

@ricetj Docs as discussed (CC: @dginther).

Is there a SOP for closing issues - do you/PMs typically do that? When they're done, or at end of sprint?

ricetj commented 4 years ago

Hey Stevan, we have been having folks close their own tickets after completing all the AC. In terms of a SOP we do not have that.