dotnet / arcade-services

Arcade Engineering Services

Monitoring and Telemetry of the Release Infrastructure #2644

Closed by andriipatsula 3 months ago

andriipatsula commented 1 year ago

Goals & Motivation

Currently, we have no mechanism to measure the health and performance of the Release Infrastructure or to monitor infrastructure-related issues. Each build of the Staging and Release pipelines produces numerous errors and warnings. We need a solution that lets us track these issues, identify their severity, and measure execution time. With proper monitoring in place, we can ensure the stability and reliability of our Release Infrastructure.

Business Objectives

One of the primary business objectives of the .NET Release Infrastructure Adoption is to establish telemetry and alerting mechanisms, enabling us to effectively monitor the release infrastructure.

Goals

  1. We aim to quantify the number of manual steps performed during the release process.
  2. We aim to quantify the number of issues detected during the release process.
  3. Our goal is to monitor the time spent by the Release Infrastructure and .NET Release team on issue resolution during the release process.
  4. Our goal is to monitor the number of errors and warnings encountered during the release process, comparing each build over time.
  5. Once parts of the Staging pipeline can be rerun, we aim to detect bottlenecks and calculate how much time and effort the reruns save.
  6. We aim to quantify the time needed for the end-to-end process of releasing the .NET product, from acquiring a coherent build to the release of the final bits.
  7. We aim to quantify the time needed to prepare an "emergency" release (to comply with the K-hour SLA).
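
To make goals 4 and 6 concrete, here is a minimal sketch of how per-build error and warning counts could be compared over time and how an end-to-end duration could be derived. It is only an illustration: the `BuildRecord` shape, the function names, and the sample values are all hypothetical and do not reflect the actual pipeline schema or dashboard implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BuildRecord:
    # Hypothetical record shape; not the schema of any real pipeline API.
    build_id: str
    started: datetime
    finished: datetime
    errors: int
    warnings: int


def duration(build: BuildRecord) -> timedelta:
    """Wall-clock time of a single build."""
    return build.finished - build.started


def issue_deltas(builds: list[BuildRecord]) -> list[tuple[str, int, int]]:
    """Change in (errors, warnings) of each build relative to the previous one."""
    ordered = sorted(builds, key=lambda b: b.started)
    return [
        (cur.build_id, cur.errors - prev.errors, cur.warnings - prev.warnings)
        for prev, cur in zip(ordered, ordered[1:])
    ]


def end_to_end(builds: list[BuildRecord]) -> timedelta:
    """Elapsed time from the earliest start to the latest finish."""
    return max(b.finished for b in builds) - min(b.started for b in builds)


# Made-up sample data, for illustration only.
builds = [
    BuildRecord("staging-1", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 30), 4, 20),
    BuildRecord("staging-2", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 10, 45), 2, 25),
]

for build_id, d_err, d_warn in issue_deltas(builds):
    print(f"{build_id}: errors {d_err:+d}, warnings {d_warn:+d}")
print("end-to-end:", end_to_end(builds))
```

In practice the records would come from whatever telemetry source the pipelines emit to; the sketch only shows the shape of the comparison, not where the data lives.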

Milestones

One pager

https://dev.azure.com/dnceng/internal/_git/dotnet-release?path=/documentation/OnePagers/telemetry-and-monitoring.md

Release Issue template

https://github.com/dotnet/release/blob/main/.github/ISSUE_TEMPLATE/release-issue-template.yml

andriipatsula commented 8 months ago

PR: https://dev.azure.com/dnceng/internal/_git/dotnet-release/pullrequest/35165

andriipatsula commented 8 months ago

The new Release Issue template was introduced: https://github.com/dotnet/release/blob/main/.github/ISSUE_TEMPLATE/release-issue-template.yml

tkapin commented 4 months ago

What still needs to be done to close this? Small things from my perspective: move the dashboard from the "apatsula" folder to the "release-infra" folder, ensure the dashboard JSON is backed up and stored properly, and send the woohoo email describing what has been done and agreed on.

andriipatsula commented 3 months ago

I'm closing this task. A 'Woohoo' email has been sent out. Additionally, I've established a backup for the dashboard and set up automatic deployment.