ai-cfia / howard

The Howard project, named after "The Godfather of Clouds" Luke Howard, orchestrates the Kubernetes-based cloud infrastructure for the Canadian Food Inspection Agency's AI lab, managing applications like Nachet, Finesse, and Louis. It prioritizes robustness, security, and efficiency.
https://ai-cfia.github.io/howard/
MIT License

As a DevSecOps, I would like to monitor my system and user applications #91

Closed ThomasCardin closed 5 months ago

ThomasCardin commented 7 months ago

Executive Summary

Adding dashboards to Grafana is crucial for visualizing the status of our applications and expediting debugging in case of failures.

Context

Currently, we cannot observe the applications deployed on our cluster. Grafana dashboards backed by Prometheus will let us visualize key information about them and will help us identify the root cause of an application failure quickly.

System Applications

We usually deploy our system applications with Helm charts. In most cases, a chart can be configured to expose metrics to Prometheus, which can then be displayed in a dashboard. For many system applications, community-created dashboards are available on this website.
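Once a chart's metrics are enabled, one way to confirm that Prometheus is actually scraping them is to query its HTTP API. The sketch below is only an illustration: it assumes the Prometheus `/api/v1/targets` endpoint is reachable at a `base_url` you supply (for example through a port-forward), and the helper names `unhealthy_targets` and `fetch_targets` are hypothetical, not part of the Howard repository.

```python
import json
import urllib.request


def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (job, scrapeUrl) for every active target whose health is not "up"."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", "unknown"), t.get("scrapeUrl", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url: str) -> dict:
    """Query the Prometheus targets API; base_url is an assumption (e.g. a port-forward)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)
```

An empty list from `unhealthy_targets(fetch_targets(base_url))` would suggest every configured target is being scraped successfully, which is a reasonable precondition before importing a community dashboard.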

Client Applications

For our client applications, it will be necessary to create dashboards and implement the export of metrics, logs, and traces.
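As a rough illustration of what exporting metrics from a client application could involve, the sketch below renders a counter in the Prometheus text exposition format and serves it on `/metrics` using only the Python standard library. The `RequestCounter` class, the metric name `app_requests_total`, and port 8000 are hypothetical and not taken from Nachet or Finesse; a real service would more likely use an official Prometheus client library, and logs and traces would need their own exporters.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RequestCounter:
    """Thread-safe counter (hypothetical helper, not from the Howard repo)."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self) -> None:
        with self._lock:
            self._value += 1

    def render(self) -> str:
        # Text exposition format: HELP line, TYPE line, then the sample.
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# The application would call REQUESTS.inc() wherever it handles a request.
REQUESTS = RequestCounter("app_requests_total", "Total requests handled.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted; the port is an assumption."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Prometheus would then scrape the `/metrics` path of the pod, and the resulting series could back a Grafana panel.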

TODO

ThomasCardin commented 5 months ago

The Finesse and Nachet dashboards will be handled in this issue: https://github.com/ai-cfia/howard/issues/240