Closed wmhutchison closed 2 years ago
[1:11 p.m.] Barre, Steven: Suggestion for the testing: check https://console.apps.klab.devops.gov.bc.ca/monitoring/dashboards/grafana-dashboard-api-performance (Kamloops LAB cluster) and see whether any of the graphs show a significant change between before and after the upgrade.
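One way to make "significant change" concrete is to compare the median of each graph's samples before and after the upgrade. The following is only a rough sketch: the metric names, the 20% threshold, and the assumption that samples have already been exported from the dashboard as plain lists of numbers are all hypothetical and not part of the existing process.

```python
from statistics import median


def relative_change(before, after):
    """Return the relative change in the median from `before` to `after`."""
    base = median(before)
    if base == 0:
        raise ValueError("baseline median is zero; relative change is undefined")
    return (median(after) - base) / base


def flag_regressions(metrics, threshold=0.20):
    """Flag graphs whose median moved by more than `threshold` in either direction.

    `metrics` maps a graph name (hypothetical) to a pair
    (before_samples, after_samples). The 0.20 default is an assumed
    cut-off, not an agreed value.
    """
    flagged = {}
    for name, (before, after) in metrics.items():
        change = relative_change(before, after)
        if abs(change) > threshold:
            flagged[name] = change
    return flagged


# Illustrative, made-up samples for two graphs.
metrics = {
    "api_latency_ms": ([100, 110, 105], [150, 160, 155]),
    "etcd_fsync_ms": ([10, 11, 10], [10, 10, 11]),
}
flagged = flag_regressions(metrics)
```

With these made-up samples only `api_latency_ms` is flagged, since its median rises from 105 to 155 while the other graph's median is unchanged. The median is used rather than the mean so that a single spike during the upgrade window does not dominate the comparison.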
Could we also get someone to create a reference app with full Sysdig / uptime monitoring? Verify that the "prod" app survives the upgrade with no intervention and stays up. Post-upgrade, roll through a new build and deploy to ensure all those steps still work. The sample app should make use of all the platform services as well: Vault, Artifactory, VPA, Sysdig, RHEL entitlement builds, Patroni, CrunchyDB, etc.
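To judge whether the reference app "stays up" through an upgrade, its uptime probe results could be reduced to an availability figure and a longest-outage duration. This is a minimal sketch that assumes probe results are available as (timestamp in seconds, ok) pairs sorted by time; that data shape and the function name are assumptions for illustration, not something Sysdig or any other platform service is known to provide in this form.

```python
def summarize_probes(probes):
    """Summarize uptime probe results.

    `probes` is a list of (timestamp_seconds, ok) pairs sorted by timestamp
    (a hypothetical format). Returns (availability, longest_outage_seconds),
    where an outage runs from the first failed probe to the next successful
    one, or to the last probe if the app never recovers.
    """
    if not probes:
        raise ValueError("no probe results")
    up = sum(1 for _, ok in probes if ok)
    availability = up / len(probes)

    longest = 0
    outage_start = None
    for ts, ok in probes:
        if not ok and outage_start is None:
            outage_start = ts  # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)  # outage ends
            outage_start = None
    if outage_start is not None:
        # Still down at the end of the observation window.
        longest = max(longest, probes[-1][0] - outage_start)
    return availability, longest


# Illustrative, made-up probes taken every 30 seconds.
probes = [(0, True), (30, True), (60, False), (90, False), (120, True), (150, True)]
availability, longest_outage = summarize_probes(probes)
```

Here two of the six probes fail, and the single outage spans from the failed probe at 60 s to the recovery at 120 s. An upgrade could then be considered clean for the reference app when the longest outage is zero.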
At present the upgrade methodology goes like this.
Ignoring whether or not we're talking about LAB or PROD, we want to strengthen overall communications and testing processes as follows.
TODO - update this EPIC and move the content into a sub-task.
Closing off this EPIC/Sprint Goal - the work desired involves documentation, development and implementation of Canary applications which will automate most of this on an ongoing basis. Stay tuned for the new Sprint Goal involving Canaries.
Describe the issue The OpenShift 4.8 upgrade revealed that, as additional software/components are installed in both LAB and PROD clusters, we need to ensure that all aspects of these components are reliably tested. Going forward, the likely end result is to expand the existing EPIC tickets Platform Ops opens for upgrades to cover these additional components, and to assign the additional tickets to whoever is responsible for the apps/components in question.
Additional context Since CCM is the main mechanism for applying new services/components, it would be good to have documentation within CCM that hooks into other docs, so that as other parties expand CCM, Platform Ops is made aware of the new items and does not have to re-audit CCM each time OpenShift is upgraded.
Definition of done