BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team. This includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3.
Apache License 2.0

Maintain Applications (new page) #2589

Closed: mtspn closed this issue 1 year ago

mtspn commented 2 years ago

Maintain Applications new page.

Definition of done

mtspn commented 1 year ago

Notes from team day on topics that could be included on this page: Monitor an Application

Pilargit12 commented 1 year ago

Draft outline Google Doc: https://docs.google.com/document/d/1Gc8SfaDjYeH_HTnt3IvEO6VbNCfyJOwr2EZmny8-M7A/edit

Each author will write their own outline for this document. We will then review the outlines together, vote for the best one, and focus on creating the new page.

Pilargit12 commented 1 year ago

The final outline and the division of sections have been established.

@IanKWatts - Reliability and Resiliency | Security (August 9 - 10)

@ShellyXueHan - Monitoring and Alerting | Communication (August 9 - 10)

@caggles - Images | Personnel (August 4)

ShellyXueHan commented 1 year ago

@Pilargit12 copy-pasting the markdown content into the Google Doc doesn't work very well because the formatting breaks. So I'm just putting it here for now; it will be easier to move it into a PR for review later when needed!

Monitoring and Alerting

Application monitoring is a critical aspect of maintaining a healthy and efficient application environment. It allows you to proactively manage performance, detect and resolve issues quickly, and make informed decisions to enhance the overall user experience and business outcomes.

Once your application is running in OpenShift, you can use Sysdig to monitor application health and performance via Kubernetes metrics. Here is a list of steps to follow:

If service availability is important to you, use Uptime.com for uptime monitoring and public service status pages. The Platform Services Team uses Uptime.com to share the status of the OpenShift clusters as well as shared services. See the SaaS service catalog to learn more about Uptime.com.
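To make the idea of uptime monitoring concrete, here is a minimal Python sketch of what a single availability probe and an uptime summary could look like. It is an illustration only, not how Uptime.com works internally: the function names `probe` and `uptime_percent`, the 5-second timeout, and the rule that any 2xx or 3xx response counts as "up" are all assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> dict:
    """Send one HTTP GET and record whether the endpoint looked healthy.

    Assumption for this sketch: any 2xx/3xx response counts as "up".
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        status = err.code
    except (urllib.error.URLError, OSError):
        # No usable answer: DNS failure, refused connection, timeout, etc.
        status = None
    elapsed = time.monotonic() - started
    return {
        "up": status is not None and 200 <= status < 400,
        "status": status,
        "seconds": elapsed,
    }


def uptime_percent(results: list[dict]) -> float:
    """Return the share of probes that found the service up, as a percentage."""
    if not results:
        raise ValueError("no probe results to summarise")
    up_count = sum(1 for result in results if result["up"])
    return 100.0 * up_count / len(results)
```

For example, calling `probe` a few times against a hypothetical health endpoint such as `https://example.com/health` and passing the collected results to `uptime_percent` gives a rough availability figure. A hosted service adds what this sketch leaves out, such as checks from multiple regions, scheduling, alert routing, and status pages.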

Comprehensive monitoring and alerting for your application can greatly reduce service downtime and the risk of disaster, but it does not guarantee that the application will run without issues 100% of the time. In the event of an issue or downtime, you can:

Communication

The OpenShift platform aims to provide zero downtime during planned maintenance activities, such as version upgrades. However, this does not mean that your application is all set once it's running in production. Platform maintenance does impact your workloads in various ways, especially major version upgrades and the deprecation of outdated features and APIs. It is therefore important for your team to stay up to date with platform changes and to maintain active communication with the Platform Services Team and the community.

Before any scheduled changes to the platform or shared services, the Platform Services Team will reach out to all Product Owners (POs) and Technical Leads (TLs) registered in the Product Registry. To ensure your team is notified of upcoming events in advance, please make sure the contact information for POs and TLs in the Product Registry is accurate and kept up to date at all times.

In addition, all team members are encouraged to stay connected with the community via the following channels:

Pilargit12 commented 1 year ago

The GitHub PR can be found here:

https://github.com/bcgov/platform-developer-docs/pull/182

Pilargit12 commented 1 year ago

New PR to fix broken links: https://github.com/bcgov/platform-developer-docs/pull/184