bryanlabs / volunteer

Volunteer opportunities with Defiant Labs
Apache License 2.0

Create Dashboard for Horcrux Signing and Horcrux Proxy #4

Open danbryan opened 3 months ago

danbryan commented 3 months ago

Description:

Objective:
Design and implement a dashboard that visualizes key metrics from the Horcrux remote signer and the Horcrux Proxy, facilitating monitoring and troubleshooting.

Background:
Horcrux is a remote signer that signs blocks on behalf of Cosmos validators, with the signing key split across a distributed set of cosigners. The Horcrux Proxy adds flexibility by managing how Horcrux communicates with the Cosmos nodes. Each application (Horcrux and Horcrux Proxy) exports its own set of metrics, which are essential for monitoring the system and tuning its performance.

Tasks:

  1. Metric Collection:

    • Collect all available metrics exported by the Horcrux and Horcrux Proxy applications.
    • Identify and categorize metrics relevant to the following areas:
      • Raft Consensus: Monitor the participation of all shards in the Raft consensus process.
      • Cosigner Connections: Track the connection status of all cosigners to the Horcrux Proxy.
      • Proxy Connectivity: Verify that the proxy stays connected to all designated Cosmos nodes.
      • Chain-ID Monitoring: Display information on which chain-IDs the cosigners are operating on.
  2. Dashboard Design:

    • Design a user-friendly interface that displays the collected metrics in an easily interpretable format.
    • Include visual indicators for the health of Raft consensus, cosigner connections, and proxy connections.
    • Provide filtering options for different chain-IDs and allow users to focus on specific aspects of the system.
  3. Implementation:

    • Implement the dashboard using a suitable tool (e.g., Grafana with Prometheus as the metrics source) that integrates with the existing infrastructure.
    • Ensure real-time data updates and notifications for critical events (e.g., a shard not participating in consensus, a cosigner disconnecting from the proxy).
  4. Testing:

    • Perform comprehensive testing to validate the accuracy and reliability of the displayed metrics.
    • Include edge cases like network failures, shard downtime, and chain-ID mismatches to ensure the dashboard handles these scenarios effectively.
  5. Documentation:

    • Document the setup process, dashboard features, and usage instructions.
    • Provide troubleshooting guides for common issues that users might encounter while using the dashboard.
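
As a starting point for the Metric Collection task, here is a minimal Python sketch of how scraped metrics could be sorted into the areas listed above. It assumes both applications expose metrics in the Prometheus text exposition format. Every metric name, label name, and keyword rule in it is a hypothetical placeholder (prefixed `example_`), not taken from the actual Horcrux or Horcrux Proxy output, and the "gauge value 0 means disconnected" convention is also an assumption.

```python
import re
from collections import defaultdict

# Assumption: placeholder keyword-to-category rules. Replace them once the
# real metric names exported by Horcrux and Horcrux Proxy have been collected.
CATEGORY_KEYWORDS = {
    "raft": "Raft Consensus",
    "cosigner": "Cosigner Connections",
    "sentry": "Proxy Connectivity",
}

# Matches `name{labels} value` or `name value`. Simplified: label values
# containing `}` or escaped quotes are not handled.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical scrape output used only for illustration.
SAMPLE = """
# HELP example_raft_is_leader placeholder help text
# TYPE example_raft_is_leader gauge
example_raft_is_leader 1
example_cosigner_connected{cosigner="2"} 1
example_cosigner_connected{cosigner="3"} 0
example_sentry_connected{node="sentry-0",chain_id="cosmoshub-4"} 1
example_go_goroutines 42
"""


def parse_metrics(text):
    """Parse text exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def categorize(name):
    """Return the first category whose keyword occurs in the metric name."""
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in name:
            return category
    return "Uncategorized"


def group_by_category(samples):
    """Group parsed samples by dashboard area."""
    groups = defaultdict(list)
    for sample in samples:
        groups[categorize(sample[0])].append(sample)
    return dict(groups)


def chain_ids(samples, label="chain_id"):
    """Collect distinct chain IDs from a (hypothetical) chain_id label."""
    return sorted({labels[label] for _, labels, _ in samples if label in labels})


def unhealthy(samples):
    """Assumption: a `*_connected` gauge with value 0 means disconnected."""
    return [s for s in samples if s[0].endswith("_connected") and s[2] == 0]


if __name__ == "__main__":
    parsed = parse_metrics(SAMPLE)
    for category, items in group_by_category(parsed).items():
        print(category, [name for name, _, _ in items])
    print("chain IDs:", chain_ids(parsed))
    print("unhealthy:", unhealthy(parsed))
```

In practice the same grouping would more likely be expressed as Grafana panels and Prometheus queries. The sketch is only meant to help with the inventory step: run it against a saved scrape of each application's metrics endpoint, review what lands in "Uncategorized", and refine the keyword rules from there.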

Resources:

Acceptance Criteria: