Collaborate with Operations on Unified Dashboarding Solution

eosnetworkfoundation / engineering

A workspace for documentation by Engineering primarily regarding process

MIT License

Collaborate with Operations on Unified Dashboarding Solution #48

Closed kj4ezj closed 1 year ago

kj4ezj commented 1 year ago

As part of an epic for a unified metrics and alerting system around EVM infrastructure, this ticket is to collaborate with Operations and survey their existing ability to dashboard and alert on metrics. The goal is to keep tooling from fragmenting within the organization as metrics and alerting are built out for Engineering. Their solutions may or may not meet Engineering needs, but we should reuse them where they do.

See Also

engineering issue 68 - EVM Monitoring and Alerting - Phase 1

eos-evm issue 602 - Funnel EVM Health Checks into CloudWatch
engineering issue 48 - Collaborate with Operations on Unified Dashboarding Solution
engineering issue 49 - Create Bot to Alert via IM on Specific Metrics
eos-evm issue 603 - SMS Alerting for EVM Infrastructure Health Checks
engineering issue 65 - Email Alerting for EVM Infrastructure Health Checks
telegram-bot issue 1 - Open-Source This Repo
engineering issue 57 - Create Telegram Service Account
engineering issue 58 - Create EVM Testnet Alert Channel Using Telegram Service Account
engineering issue 64 - Create EVM Mainnet Alert Channel Using Telegram Service Account
engineering issue 66 - Fix EVM CloudWatch Alerts
telegram-bot issue 2 - Human-Friendly Alerts
engineering issue 71 - EVM Alerts for APAC Infrastructure
telegram-bot issue 3 - Alert Bot Maintainer via Telegram on Errors

kj4ezj commented 1 year ago

I engaged with a representative of Operations on 2023-08-11.

On 2023-08-14, we shared our needs, existing tooling, and vision with each other.

Operations does not currently have metrics and alerting infrastructure that meets Engineering needs. My understanding from these conversations is that Operations would be grateful to collaborate on a shared Prometheus and Grafana stack to aggregate ENF metrics.