Closed moebiusband73 closed 2 weeks ago
I could imagine two ways to get the energy to solution for a job: 1) Submit an external value via the stop_job API endpoint. That would support arbitrary installations but would require users' submit scripts to query their metric backends. 2) Allow the user to configure a 'power-metric' or 'energy-metric', e.g. cpu_power or node_power, and integrate/read those values in the job-archiving routines.
Ideally these two methods would work together: if a value is submitted from outside, the internal calculation is skipped.
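The precedence rule could be sketched roughly as follows. This is only an illustration in Python, not cc-backend code: `resolve_job_energy`, `external_energy_joules` and `compute_internal` are hypothetical names, and how the external value would actually arrive in the stop_job request is not defined here.

```python
from typing import Callable, Optional


def resolve_job_energy(
    external_energy_joules: Optional[float],
    compute_internal: Callable[[], float],
) -> float:
    """Return the job energy in joules.

    An externally submitted value (hypothetically passed along with the
    stop_job request) takes precedence. The internal calculation is only
    invoked when no external value was supplied.
    """
    if external_energy_joules is not None:
        return external_energy_joules
    # No external value: fall back to the archiving-time calculation.
    return compute_internal()
```

Passing the internal calculation as a callable means it is really skipped, not just ignored, when an external value is present.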
In my opinion, job-specific integration is a task for the backend, not the metric store. From our experience, integrating only the per-minute measurements is not accurate; a better job energy is obtained by multiplying the average node_power by the actual runtime.
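To illustrate the difference for a single node, here is a small sketch in Python. The function names, the 60 s sampling interval and the sample values are assumptions made up for this example; the integration shown is a plain rectangle rule, which is only one way the per-minute samples might be integrated.

```python
from statistics import fmean


def energy_from_average_power(node_power_watts, runtime_seconds):
    """Job energy in joules: mean node_power times the actual runtime."""
    if not node_power_watts:
        raise ValueError("need at least one power sample")
    return fmean(node_power_watts) * runtime_seconds


def energy_from_sample_integration(node_power_watts, interval_seconds=60):
    """Job energy in joules via a rectangle rule.

    Every sample is counted for a full sampling interval, so a job that
    ends partway through an interval is over-counted.
    """
    return sum(node_power_watts) * interval_seconds


# Made-up samples: three per-minute readings, job ran for 150 s.
samples = [200.0, 220.0, 210.0]
by_average = energy_from_average_power(samples, runtime_seconds=150)
by_integration = energy_from_sample_integration(samples)
```

With these numbers the average-based value is 210 W × 150 s = 31500 J, while the rectangle rule gives 630 W × 60 s = 37800 J, because the last sample is counted for a full minute although the job ended after 30 s of it. For a multi-node job the per-node values would presumably be summed.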
Show the total energy to solution for a job in the job metadata, the job list and the job view.
This requires an aggregated energy value for a job. The cc-metric-collector already offers aggregated as well as incremental energy counters based on RAPL.
Discussion is required on how other node agents could provide those metrics and how to make this configurable.