department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

EKS testing collab with Infra #71586

Open laineymajor opened 9 months ago

laineymajor commented 9 months ago

PROBLEM STATEMENT

Infra is working to upgrade EKS (due end of January 2024). The upgrade needs extensive testing, and the Infra team needs our help because our team has extensive knowledge of vets-api.

ADDITIONAL INFO

Testing plan ideas (started by Lindsey). Testing will occur in the new EKS clusters. This work is happening in collaboration with the infrastructure team (Nate Peterson is the PM contact).

ACTION ITEMS

  1. Review config map in each cluster. When a cluster is upgraded to a new version, some of the config has to be redone, so this will be good to go over together once they are ready.
  2. Review argo template
  3. Test (per infra team's request) in the new EKS cluster [this will be sent via request in the engineering workstream shared Slack channel]. This will need to be done in dev-api.gov... this will be the new cluster
  4. Test fwd proxy
  5. Test actual service
  6. Respond to infra team and confirm receipt of communication
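
The test items above (3 through 5) could be tracked with a small checklist runner. This is a rough Python sketch, not an agreed tool: the check names, the `http_ok` helper, and the placeholder URL in the comments are all hypothetical, and the real checks would target whatever endpoints the Infra team provides for the new cluster.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
import urllib.request


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def http_ok(url: str, timeout: float = 5.0) -> None:
    """Hypothetical helper: raise unless the URL answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"unexpected status {resp.status}")


def run_checks(checks: List[Tuple[str, Callable[[], None]]]) -> List[CheckResult]:
    """Run every check; a check passes if it returns without raising."""
    results = []
    for name, check in checks:
        try:
            check()
            results.append(CheckResult(name, True))
        except Exception as exc:  # record the failure and keep going
            results.append(CheckResult(name, False, str(exc)))
    return results


def summarize(results: List[CheckResult]) -> str:
    """Render one PASS/FAIL line per check, suitable for pasting into Slack."""
    lines = []
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        suffix = f": {r.detail}" if r.detail else ""
        lines.append(f"[{status}] {r.name}{suffix}")
    return "\n".join(lines)


# Example wiring (placeholder URL, not a real endpoint):
# results = run_checks([
#     ("fwd proxy", lambda: http_ok("https://example.invalid/proxy-check")),
#     ("service", lambda: http_ok("https://example.invalid/status")),
# ])
# print(summarize(results))
```

Because each check is just a callable, the same runner can hold the forward proxy test, the service test, and anything the Infra team adds later, and the summary can be pasted back as the confirmation in item 6.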

*Anytime the infra team requests testing of vets-api, Rachal, Gia, and Ryan will sync to discuss:

DevOps Stuff for Platform Team: Check with Kshitiz and Chris first if this is needed

OCTO Objectives:

Objective: 02: Our platforms are the best way to deliver products at VA

OKR1: Our platforms hit the "elite" level (as defined by DORA) on Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Time to Restore Service

DEFINITION OF DONE

We are confident in flipping the switch to a new EKS cluster!

VALIDATION

### Associated stories
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/70146
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74825
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74643
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74655
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/75623
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74637
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/82426
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/77350
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/82296
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/87848
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74633
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74550
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/71315
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74644
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74647
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74649
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74824
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74648
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74651
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74653
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74654
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/74656
- [ ] https://github.com/department-of-veterans-affairs/va.gov-team/issues/91587

laineymajor commented 8 months ago

This will continue as we receive more information from the infra team

jennb33 commented 8 months ago

Jennifer poked Nate Peterson (who was OOO on 1/10/2024) for Infra bandwidth re testing on 1/11/2024. Rachal ran a load test but it was not accumulating. She is reviewing the HPA and it is coinciding with DEV.
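
For context on the HPA review: Kubernetes documents the replica calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). Below is a rough Python sketch of that formula only. The numbers in the comments are made up and are not vets-api values, and the sketch ignores min/max replica bounds, stabilization windows, and pod readiness handling.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Approximate the documented HPA replica formula for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Illustrative values only:
# desired_replicas(3, 90, 60) -> 5   (ratio 1.5, ceil(4.5))
# desired_replicas(3, 62, 60) -> 3   (ratio ~1.03, inside tolerance)
```

One general consequence: if a load test keeps the observed metric within the tolerance of its target, no scale-up is expected, so it is worth comparing the load test's peak metric against the HPA target before concluding anything about the new cluster.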

jennb33 commented 7 months ago

@RachalCassity are there individual work tickets that can be associated with this Epic? I removed the 5 story points from the Epic, because Epics aren't usually pointed; the stories within them are. Let me know if I should write a ticket to associate here. Thanks!

LindseySaari commented 7 months ago

After our meeting yesterday, @flooose and @Kshitiz-devops worked with the Infrastructure/Lights team on a way to port forward to the vets-next application. Next up is to get the ElastiCache/Redis connection working, after which we should be able to continue with the testing phase.

LindseySaari commented 7 months ago

After meeting with IST, we discussed further automating the testing of Vets API in preparation for upgrades. Upgrades will be on a regular cadence, so automation of the testing components will be beneficial.

Some initial thoughts for testing:

  1. A test Helm chart that we can use to deploy/manipulate (see this document for chart tests)
  2. Jobs that we can run on demand that test things like the db connection (RDS and ElastiCache)
  3. Rake task for Redis connectivity
  4. Kicking off a Sidekiq job
  5. Other thoughts: load testing for our HPA metric, synthetics to hit an endpoint, etc.
  6. Set up dashboards to monitor vets-next in Datadog (logs, HPA, etc.)
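
As a rough illustration of the on-demand connectivity checks in items 2 and 3, here is a minimal Python sketch using only the standard library. It is not the Rake task itself, which would live in vets-api. The function names are hypothetical, and `redis_ping` assumes an endpoint without TLS or AUTH, which may not match the real ElastiCache configuration.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    Useful as a first-pass check for RDS or ElastiCache network reachability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Send an inline PING and return True if the reply starts with +PONG.

    Assumption: plain-text Redis without TLS or AUTH.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        return False
```

A job built on checks like these could run after each cluster upgrade and report failures before any manual testing starts. A Sidekiq job check would need to go further and confirm that an enqueued job is actually picked up and processed.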

LindseySaari commented 6 months ago

I sent this doc over to IST for review