We need a functional system for monitoring usage and cost by user, ideally with a no-code dashboard. This feature would empower us to manage resource allocation and open up registrations to new users more confidently.
Requirements:
CPU and Disk Usage Monitoring by User:
Monitor disk usage and CPU usage (or relevant cost factors) for individual users.
While disk usage can be monitored with du checks, we need a way to generate reports over time, not just at an instant.
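One way to get reports over time is to record a timestamped snapshot per user on a schedule (for example from cron) and build the report from the accumulated log. The sketch below is only an illustration under assumptions that are not part of this issue: each user has one directory under a common root, and the names `dir_size_bytes`, `record_snapshot`, and the CSV layout are hypothetical.

```python
import csv
import os
import time
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of regular files under root (file contents only, similar in spirit to du)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def record_snapshot(home_root, log_path, now=None):
    """Append one (timestamp, user, bytes) row per user directory under home_root."""
    # Assumption: every immediate subdirectory of home_root belongs to one user.
    timestamp = int(time.time() if now is None else now)
    rows = []
    for entry in sorted(Path(home_root).iterdir()):
        if entry.is_dir():
            rows.append((timestamp, entry.name, dir_size_bytes(entry)))
    with open(log_path, "a", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

Running `record_snapshot` hourly would produce a per-user time series that a dashboard could plot or that a report could aggregate by day or month.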
Reporting and Analytics:
Provide reports on server options used and duration by user.
Create a system to monitor incremental and shared costs. This involves attributing incremental costs to node creators and splitting shared costs equally among node users.
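The split described above could be sketched as follows. This is a minimal illustration, assuming each node's cost has already been divided into an incremental part and a shared part; the `NodeUsage` record and the `allocate_costs` function are hypothetical names, and treating the creator as one of the node's users is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NodeUsage:
    creator: str             # user whose request caused the node to be created
    users: list              # users who ran workloads on the node
    incremental_cost: float  # cost attributed to creating the node
    shared_cost: float       # cost to be split equally among the node's users


def allocate_costs(nodes):
    """Return a per-user cost total across all nodes."""
    totals = defaultdict(float)
    for node in nodes:
        # The creator carries the full incremental cost.
        totals[node.creator] += node.incremental_cost
        # Assumption: the creator also counts as a user of the node.
        members = set(node.users) | {node.creator}
        share = node.shared_cost / len(members)
        for user in members:
            totals[user] += share
    return dict(totals)
```

For example, a node created by one user and shared with a second user charges the creator the incremental cost plus half of the shared cost, and charges the second user the other half.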
Dashboard:
Develop a no-code dashboard to visualize usage and cost data.
Include functionality to pre-set usage limits for users from this dashboard.
Integration and Metrics:
Integrate with Grafana and Prometheus for improved metrics collection from AWS and other cloud vendors.
Ensure the system can handle cost anomaly detection.
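To make the anomaly detection requirement concrete, here is a rough sketch of one simple approach: flag a day whose total cost deviates from the trailing-window mean by more than a set number of standard deviations. This is not the existing anomaly detection system mentioned elsewhere in this issue; the function name, window size, and thresholds are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(daily_costs, window=7, threshold=3.0, min_sigma=0.01):
    """Return indices of days whose cost is far from the trailing-window mean."""
    flagged = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu = mean(history)
        # Floor the deviation so a perfectly flat history does not divide by zero.
        sigma = max(pstdev(history), min_sigma)
        if abs(daily_costs[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

A fixed z-score threshold like this is easy to reason about but is sensitive to the window size and to weekly usage patterns, so it should be read as a baseline to compare against, not a recommendation.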
Challenges:
Calculating "cost per user" is complex due to the shared nature of resources (e.g., multiple profiles on a single node).
Obtaining live cost information from AWS is challenging.
Supporting multiple cloud vendors adds another layer of complexity.
Proposed MVP:
Metrics Collection:
Enhance AWS metrics collection to include hourly data instead of just daily totals.
Disk Usage Monitoring:
Implement a disk usage monitoring and cleanup procedure.
Cost Anomaly Detection:
Use existing tools (e.g., @satra's anomaly detection system) for total cost anomaly detection.
Grafana and Prometheus Integration:
Integrate with Grafana and Prometheus for comprehensive monitoring and alerting.
This is a rough outline based on a convo with @asmacdo. Input and collaboration from the team will be crucial to refining the requirements to meet our needs.