ClusterCockpit / cc-backend

Web frontend and API backend server for ClusterCockpit Monitoring Framework
https://www.clustercockpit.org
MIT License

Add aggregated energy to solution to job views #261

Open moebiusband73 opened 5 months ago

moebiusband73 commented 5 months ago

Show the total energy to solution for a job in the job metadata, the job list, and the job view.

This requires an aggregated energy value for a job. The cc-metric-collector already offers aggregated as well as incremental energy counters based on RAPL.

Discussion is required on how other node agents could provide those metrics and how to make this configurable.

giesselmann commented 5 months ago

I could imagine two ways to get energy to solution for a job:

1) Submit an external value via the stop_job API endpoint. This would support arbitrary installations but would require people's submit scripts to query their metric backends.
2) Allow the user to configure a 'power-metric' or 'energy-metric', e.g. cpu_power or node_power, and integrate or read those values in the job-archiving routines.

Ideally these methods could work together: if a value is submitted from outside, the internal calculation is skipped.
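
To make the intended precedence concrete, here is a minimal sketch in Python (not the backend's actual code). The function name `resolve_job_energy` and its parameters are hypothetical and only illustrate the idea: an externally submitted value wins, and the internal calculation runs only when nothing was submitted.

```python
from typing import Callable, Optional


def resolve_job_energy(
    submitted_joules: Optional[float],
    compute_internal: Callable[[], Optional[float]],
) -> Optional[float]:
    """Return the job energy in joules.

    submitted_joules: hypothetical value passed in from outside (e.g. with
        the stop_job request); None means nothing was submitted.
    compute_internal: hypothetical callback that derives the energy from the
        configured power or energy metric; only called when needed.
    """
    if submitted_joules is not None:
        # An external value takes precedence, so the internal path is skipped.
        return submitted_joules
    return compute_internal()


# Usage: the callback is never invoked when a value was submitted.
calls = []


def internal() -> float:
    calls.append("internal")
    return 99.0


external_result = resolve_job_energy(1234.0, internal)
fallback_result = resolve_job_energy(None, internal)
```

Passing the internal calculation as a callback keeps the potentially expensive metric query from running when it is not needed.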

In my opinion, job-specific integration is a task for the backend, not the metric store. From our experience, integrating only the per-minute measurements is not accurate; a better job energy is obtained by multiplying the average node_power by the actual runtime.
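
As a rough illustration of the difference between the two estimates, here is a small Python sketch for a single node. The numbers are made up, and the function names are hypothetical. It assumes evenly spaced power samples in watts; one plausible reason the estimates diverge is that the samples do not cover the full runtime, which is what this example shows.

```python
from typing import Sequence


def energy_by_integration(power_watts: Sequence[float], interval_seconds: float) -> float:
    """Trapezoidal integration over evenly spaced power samples, in joules."""
    if len(power_watts) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(power_watts, power_watts[1:]):
        total += 0.5 * (a + b) * interval_seconds
    return total


def energy_by_average(power_watts: Sequence[float], runtime_seconds: float) -> float:
    """Average power multiplied by the actual job runtime, in joules."""
    if not power_watts:
        return 0.0
    return sum(power_watts) / len(power_watts) * runtime_seconds


# Made-up example: a 150 s job sampled once per minute yields three samples,
# which together span only 120 s of the runtime.
samples = [200.0, 200.0, 200.0]
integrated = energy_by_integration(samples, 60.0)
averaged = energy_by_average(samples, 150.0)
```

With these inputs the integration covers only the 120 s between the first and last sample, while the average-based estimate accounts for the full 150 s. For a multi-node job the per-node values would presumably be summed.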