Closed stephenlienharrell closed 3 months ago
For CPU: need core affinity matched to job ID.
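A rough sketch of how the affinity side could look, assuming a Linux node where each process exposes `Cpus_allowed_list` in `/proc/<pid>/status`. The helper names (`parse_cpu_list`, `allowed_cores`, `cores_for_job`) are made up for illustration, and how we obtain the list of PIDs for a job is left open here.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of core indices."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        # A bare number like '8' has no upper bound, so reuse the lower one.
        cores.update(range(int(low), int(high or low) + 1))
    return cores


def allowed_cores(pid):
    """Read the cores a single process is allowed to run on (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return set()


def cores_for_job(pids):
    """Union the allowed cores over every PID assumed to belong to one job."""
    cores = set()
    for pid in pids:
        try:
            cores |= allowed_cores(pid)
        except OSError:
            continue  # process exited between listing and reading
    return cores
```

Note that this reports the cores a process is allowed to use, not the cores it actually ran on, so it is only meaningful if the scheduler pins jobs.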
For memory: we need to find all memory usage under the primary job starter programmatically. Find the job starter, then get the memory of all its child processes: ps -o pid,ppid,pgid,comm,%cpu,%me
Snapshot this at the same time as the rest of the metrics. Find out whether there is a way to get the job ID, then match the job ID to specific processes on the node to get a snapshot of memory usage.
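A minimal sketch of the "find the job starter, then sum its children" idea. It assumes we already know the PID of the job starter, and it uses `ps -e -o pid=,ppid=,rss=` (RSS in KiB) rather than `%mem` so the values can be summed directly. The function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def parse_ps(text):
    """Parse 'pid ppid rss' rows into {pid: (ppid, rss_kib)}."""
    procs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            pid, ppid, rss = map(int, fields)
            procs[pid] = (ppid, rss)
    return procs


def tree_rss_kib(procs, root_pid):
    """Sum RSS over root_pid and all of its descendants."""
    children = defaultdict(list)
    for pid, (ppid, _) in procs.items():
        children[ppid].append(pid)
    total, stack, seen = 0, [root_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in procs:
            continue
        seen.add(pid)
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


def snapshot(root_pid):
    """Take one snapshot of the whole process table and sum the job's tree."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=,ppid=,rss="],
        capture_output=True, text=True, check=True,
    ).stdout
    return tree_rss_kib(parse_ps(out), root_pid)
```

Summing RSS counts shared pages once per process, so this is an upper bound rather than an exact figure.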
Can we do this programmatically for any other statistics?
Regarding the approach above, we need to make sure we can also capture detached processes.
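One possible way to catch detached processes, since they get reparented and drop out of the PID/PPID tree: match on cgroup membership instead. This is only a sketch and rests on an assumption that has not been verified for our clusters, namely that the scheduler places each job in a cgroup whose path contains a `job_<id>` component (Slurm's cgroup plugin does this, for example). The function names and the regex are illustrative.

```python
import os
import re

# Assumption: the job's cgroup path contains a component like 'job_1234'.
JOB_RE = re.compile(r"/job_(\d+)(?:/|$)")


def job_id_from_cgroup(text):
    """Extract a job ID from the contents of /proc/<pid>/cgroup, or None."""
    for line in text.splitlines():
        # Each line has the form hierarchy-id:controllers:path
        path = line.split(":", 2)[-1]
        match = JOB_RE.search(path)
        if match:
            return int(match.group(1))
    return None


def pids_for_job(job_id, proc_root="/proc"):
    """List every PID on the node whose cgroup path names the given job."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cgroup")) as fh:
                if job_id_from_cgroup(fh.read()) == job_id:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return sorted(pids)
```

A process keeps its cgroup when it daemonizes, so this should still find it as long as nothing moves it to another cgroup.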
Duplicate of #46
Currently we collect everything at the node level. We need to examine which metrics can be split out (on a per-core or per-socket basis), which cannot, and whether the split would be useful.
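As one example of a metric that can already be split per core, here is a sketch that parses the `cpuN` lines of `/proc/stat` on Linux. The function name is hypothetical, and grouping cores by socket is left out because it would need topology information from elsewhere.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def per_core_ticks(stat_text):
    """Return {core_index: {field: ticks}} from the text of /proc/stat."""
    cores = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        name = parts[0]
        # Keep 'cpu0', 'cpu1', ... and skip the aggregate 'cpu' line.
        if name.startswith("cpu") and name[3:].isdigit():
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cores[int(name[3:])] = dict(zip(FIELDS, values))
    return cores


def read_per_core_ticks():
    """Read the live counters from /proc/stat (Linux only)."""
    with open("/proc/stat") as fh:
        return per_core_ticks(fh.read())
```

The values are cumulative counters since boot, so a per-core utilisation figure needs the difference between two snapshots.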