ai-cfia / howard

The Howard project, named after "The Godfather of Clouds" Luke Howard, orchestrates the Kubernetes-based cloud infrastructure for the Canadian Food Inspection Agency's AI lab, managing applications like Nachet, Finesse, and Louis. It prioritizes robustness, security and efficiency.
https://ai-cfia.github.io/howard/
MIT License

As a DevOps engineer, I would like to leverage observability and monitoring for our hosted client applications #122

Open SonOfLope opened 6 months ago

SonOfLope commented 6 months ago

Currently, our hosted applications operate without a centralized system for monitoring metrics and managing alerts. This limits our ability to proactively address performance issues, optimize resources, and improve the overall reliability of our services. To address this, we propose integrating OpenTelemetry, Azure Monitor and Grafana to establish a centralized monitoring framework that gives us better visibility into our applications' health.

Design

image

Steps

OpenTelemetry integration

OpenTelemetry - Grafana integration


ThomasCardin commented 6 months ago

See this issue that might contain duplicates https://github.com/ai-cfia/howard/issues/91

ThomasCardin commented 6 months ago

Note for your diagram: Prometheus can't receive traces, only metrics.

SonOfLope commented 6 months ago

> Note for your diagram: Prometheus can't receive traces, only metrics.

I followed what this doc says: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/alertmanagerexporter

Am I misunderstanding?

ThomasCardin commented 6 months ago

No, Alertmanager seems good. I was talking about the link between otel-collector and Prometheus.
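
For illustration, the split discussed above could look roughly like the collector config below: metrics go to the Prometheus exporter, while traces go through a separate pipeline. This is only a sketch, not the config used in Howard. The ports, the use of Azure Monitor as the trace destination, and the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable are assumptions, and both exporters require the contrib distribution of the collector. The Alertmanager exporter from the linked doc is left out to keep the example short.

```yaml
# Sketch of an OpenTelemetry Collector config (assumed values, not Howard's actual setup).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # apps send OTLP over gRPC here (assumed port)
      http:
        endpoint: 0.0.0.0:4318   # apps send OTLP over HTTP here (assumed port)

processors:
  batch: {}                      # batch telemetry before export to reduce outgoing requests

exporters:
  # Exposes an endpoint that Prometheus scrapes. Metrics only.
  prometheus:
    endpoint: 0.0.0.0:8889       # assumed scrape port
  # Assumed trace destination: Azure Monitor (Application Insights).
  azuremonitor:
    connection_string: ${env:APPLICATIONINSIGHTS_CONNECTION_STRING}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]    # Prometheus appears only in the metrics pipeline
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [azuremonitor]  # traces take a different route
```

With a layout like this, the diagram would show two arrows leaving otel-collector: one carrying metrics to Prometheus and one carrying traces to a trace-capable backend.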

rngadam commented 6 months ago

What is the performance impact of collecting this telemetry across our apps?