eosnetworkfoundation / engineering

A workspace for documentation by Engineering primarily regarding process
MIT License

Collaborate with Operations on Unified Dashboarding Solution #48

Closed: kj4ezj closed this issue 1 year ago

kj4ezj commented 1 year ago

This ticket is part of an epic to build a unified metrics and alerting system around EVM infrastructure. The goal is to collaborate with Operations and survey their existing dashboarding and alerting capabilities before Engineering builds out its own metrics and alerting. Their solutions may or may not meet Engineering needs, but we should avoid fragmentation within the organization where possible.

See Also

engineering issue 68 - EVM Monitoring and Alerting - Phase 1

  1. eos-evm issue 602 - Funnel EVM Health Checks into CloudWatch
  2. engineering issue 48 - Collaborate with Operations on Unified Dashboarding Solution
  3. engineering issue 49 - Create Bot to Alert via IM on Specific Metrics
  4. eos-evm issue 603 - SMS Alerting for EVM Infrastructure Health Checks
  5. engineering issue 65 - Email Alerting for EVM Infrastructure Health Checks
  6. telegram-bot issue 1 - Open-Source This Repo
  7. engineering issue 57 - Create Telegram Service Account
  8. engineering issue 58 - Create EVM Testnet Alert Channel Using Telegram Service Account
  9. engineering issue 64 - Create EVM Mainnet Alert Channel Using Telegram Service Account
  10. engineering issue 66 - Fix EVM CloudWatch Alerts
  11. telegram-bot issue 2 - Human-Friendly Alerts
  12. engineering issue 71 - EVM Alerts for APAC Infrastructure
  13. telegram-bot issue 3 - Alert Bot Maintainer via Telegram on Errors

kj4ezj commented 1 year ago

I engaged with a representative of Operations on 2023-08-11.

On 2023-08-14, we shared our needs, existing tooling, and vision with each other.

Operations does not currently have metrics and alerting infrastructure that meets Engineering needs. My understanding from these conversations is that Operations would be grateful to collaborate on a shared Prometheus and Grafana stack to aggregate ENF metrics.