SODALITE-EU / monitoring-system

Monitoring system description and config files.

PoC for HPC job execution monitoring #4

Open rosogon opened 4 years ago

rosogon commented 4 years ago

@MarioMartReq, please set up an exporter that reports job status on the HPC cluster. You can reuse the ipmi exporter, but have it run qstat -f {{ job_id }} | grep 'exit_status' | grep -o '.$' instead.

Then, add an alert (e.g. when the job's exit status is not 0).
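
A minimal sketch of the parsing side of such an exporter, in Python, in case it helps as a starting point. It assumes `qstat -f` prints a line of the form `exit_status = <number>` once the job has finished; the metric name `hpc_job_exit_status`, the `job_id` label, and the function names are hypothetical and not taken from the ipmi exporter. Note that `grep -o '.$'` keeps only the last character of the line, so a multi-digit status such as 137 would come out as 7; the sketch reads the whole number instead.

```python
import re
import subprocess
from typing import Optional

# Matches a line such as "    exit_status = 0" in `qstat -f` output.
# Assumption: the field is written as "exit_status = <integer>"; matching is
# case-insensitive because capitalisation may differ between schedulers.
_EXIT_STATUS_RE = re.compile(
    r"^\s*exit_status\s*=\s*(-?\d+)\s*$", re.IGNORECASE | re.MULTILINE
)


def parse_exit_status(qstat_output: str) -> Optional[int]:
    """Return the job's exit status, or None if the field is absent
    (for example, while the job is still queued or running)."""
    match = _EXIT_STATUS_RE.search(qstat_output)
    return int(match.group(1)) if match else None


def render_metric(job_id: str, exit_status: Optional[int]) -> str:
    """Render the status in the Prometheus text exposition format.
    The metric and label names are hypothetical."""
    lines = [
        "# HELP hpc_job_exit_status Exit status reported by qstat for a job.",
        "# TYPE hpc_job_exit_status gauge",
    ]
    if exit_status is not None:
        lines.append(f'hpc_job_exit_status{{job_id="{job_id}"}} {exit_status}')
    return "\n".join(lines) + "\n"


def query_job(job_id: str) -> str:
    """Run `qstat -f <job_id>` and return the rendered metric text.
    Requires qstat on PATH, so it only works on the HPC front end."""
    result = subprocess.run(
        ["qstat", "-f", job_id], capture_output=True, text=True, check=True
    )
    return render_metric(job_id, parse_exit_status(result.stdout))
```

With a metric shaped like this, the alert condition could be an expression along the lines of `hpc_job_exit_status != 0`, again assuming the hypothetical metric name above.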