OCP-on-NERC / nerc-ocp-config


feat: Add 6am alerts in rhods-notebooks namespace #414

Closed schwesig closed 2 months ago

schwesig commented 2 months ago

This commit introduces additional ope status alerts in the 'rhods-notebooks' namespace of the 'nerc-ocp-prod' cluster. The alerts are designed to trigger at 6am and are sent to the Slack channel alerts-prod-rhods-ope. They cover ephemeral storage, memory usage, PVC claims, storage requests, container counts, and pod owner counts.

Changes made:

  1. Rules:

    • Added alerts for monitoring the percentage of limit used for ephemeral storage, memory, PVCs, and storage requests at 6am.
    • Added alerts for counting containers and pod owners at 6am, providing a snapshot of resource utilization.
    • All rules use a time-based trigger, giving a daily view of resource usage patterns before classes start.
  2. Configuration:

    • Routes the alerts to the 'slack-notifications-prod-rhods-ope' receiver.
    • Matches alert names against ^Custom6amOpe.* to catch all new rules from item 1.
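
For readers unfamiliar with time-gated alerts, the two changes above might look roughly like the sketch below. This is not the content of the PR: the rule name `Custom6amOpeMemoryLimitPercent`, the object and group names, the `severity` label, and the use of the kube-state-metrics `kube_resourcequota` series (which assumes a ResourceQuota exists in the namespace) are all assumptions made for illustration. Only the `Custom6amOpe` prefix, the namespace, and the receiver name come from the description. Note that PromQL's `hour()` returns the hour in UTC, so the `6` would need adjusting if 6am local time is intended.

```yaml
# Document 1: hypothetical PrometheusRule (names and expression are illustrative).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-6am-ope-example        # hypothetical object name
  namespace: rhods-notebooks
spec:
  groups:
    - name: custom-6am-ope.rules      # hypothetical group name
      rules:
        # Only the "Custom6amOpe" prefix is taken from the PR description.
        - alert: Custom6amOpeMemoryLimitPercent
          # Percentage of the memory limit quota in use, returned only while
          # the current UTC hour is 6 (the "and on()" clause acts as a time gate).
          expr: |
            (
              100 * kube_resourcequota{namespace="rhods-notebooks", resource="limits.memory", type="used"}
                / ignoring(type)
              kube_resourcequota{namespace="rhods-notebooks", resource="limits.memory", type="hard"}
            )
            and on() (hour() == 6)
          labels:
            severity: info            # assumed label, not from the PR
          annotations:
            summary: 'Memory limit used at 6am: {{ printf "%.1f" $value }}%'
---
# Document 2: Alertmanager routing fragment (would sit inside the existing route tree).
route:
  routes:
    - receiver: slack-notifications-prod-rhods-ope
      matchers:
        # Alertmanager regex matchers are fully anchored, so the leading ^ is
        # redundant but harmless.
        - alertname =~ "^Custom6amOpe.*"
```

With a gate like this, the alert stays active for the whole 06:00 to 06:59 UTC window, so how many Slack messages arrive depends on the route's grouping and repeat settings.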
hpdempsey commented 2 months ago

Looks good. Are alerts for the timeouts we observed recently being addressed elsewhere, or do we not yet have sufficient info to create an alert for them?