Closed mtspn closed 1 year ago
Notes from team day on topics that could be included on this page: Monitor an Application
Draft outline Google Doc: https://docs.google.com/document/d/1Gc8SfaDjYeH_HTnt3IvEO6VbNCfyJOwr2EZmny8-M7A/edit
Each author will write their own outline for this document. We will then review the outlines together, vote for the best one, and focus on creating the new page from it.
Final outline has been established and division of sections.
@IanKWatts - Reliability and Resiliency | Security (August 9 - 10) @ShellyXueHan - Monitoring and Alerting | Communication (August 9 - 10) @caggles - Images | Personnel (August 4)
@Pilargit12 copy-pasting the Markdown content into the Google Doc doesn't work very well because the formatting breaks. So I'm just putting it here for now; it will be easier to put it into a PR for review later when needed!
Application monitoring is a critical aspect of maintaining a healthy and efficient application environment. It allows you to proactively manage performance, detect and resolve issues quickly, and make informed decisions to enhance the overall user experience and business outcomes.
Once your application is running in OpenShift, you can use Sysdig to monitor app health and performance via Kubernetes metrics. Here is a list of steps to follow:
If service availability is important to you, leverage Uptime.com for uptime monitoring and public service status pages. The Platform Services Team uses Uptime.com to share the status of OpenShift clusters as well as shared services. You can check out the SaaS service catalog to explore more about Uptime.com.
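To make "service availability" concrete, here is a minimal sketch of how an availability target translates into a downtime budget, and how a measured availability could be derived from a series of up/down checks. The function names and the 30-day default period are assumptions made for illustration; they are not part of Uptime.com or any platform tooling.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for a target availability.

    Hypothetical helper: the 30-day default period is an assumption.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


def measured_availability(check_results: list[bool]) -> float:
    """Return the percentage of successful checks (True means the service was up)."""
    if not check_results:
        raise ValueError("check_results must not be empty")
    return 100 * sum(check_results) / len(check_results)
```

For example, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which can help a team decide how quickly alerts need to be acted on.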
Comprehensive monitoring and alerting for your application can greatly reduce service downtime and the risk of disaster, but it does not guarantee that the application will run 100% of the time without issues. In the event of an issue or downtime, you can:
The OpenShift platform aims to provide zero downtime during planned maintenance activities, such as version upgrades. However, this does not mean that your application is all set once it's running in production. Platform maintenance does impact your workloads in various ways, especially major version upgrades and the deprecation of outdated features and APIs. Thus, it is important for your team to stay up to date with platform changes and to keep in active communication with the Platform Services Team and the community.
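As one way to stay ahead of API deprecations, a team could scan its own manifests for API versions that have been removed upstream. The sketch below assumes the manifests are already parsed into Python dictionaries; the `REMOVED_APIS` table and the `find_removed_apis` function are hypothetical names, and the table is only an illustrative subset of upstream Kubernetes removals, not an authoritative list for any particular OpenShift release.

```python
# Illustrative subset of (apiVersion, kind) pairs removed in upstream Kubernetes,
# mapped to the Kubernetes release that removed them. Not an exhaustive list.
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): "1.16",
    ("extensions/v1beta1", "Ingress"): "1.22",
    ("batch/v1beta1", "CronJob"): "1.25",
    ("policy/v1beta1", "PodDisruptionBudget"): "1.25",
}


def find_removed_apis(manifests: list[dict]) -> list[tuple[str, str, str, str]]:
    """Return (kind, name, apiVersion, removed_in) for each manifest that
    uses an API version listed in REMOVED_APIS."""
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        if key in REMOVED_APIS:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append((key[1], name, key[0], REMOVED_APIS[key]))
    return findings


if __name__ == "__main__":
    # Hypothetical manifests, for demonstration only.
    example = [
        {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    ]
    for kind, name, api_version, removed_in in find_removed_apis(example):
        print(f"{kind}/{name} uses {api_version}, removed in Kubernetes {removed_in}")
```

A check like this could run in a CI pipeline so that outdated manifests are flagged before a platform upgrade rather than after it.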
Before any scheduled changes to the platform or shared services, the Platform Services Team will reach out to all Product Owners (POs) and Technical Leads (TLs) registered in the Product Registry. To ensure your team is notified of upcoming events in advance, please make sure the contact information for POs and TLs in the Product Registry is accurate and kept up to date at all times.
In addition to that, it's recommended for all team members to stay connected with the community via the following channels:
The GitHub PR can be found:
New PR to fix broken links: https://github.com/bcgov/platform-developer-docs/pull/184
Maintain Applications new page.
Definition of done