GSI-HPC / lustre_exporter

Prometheus exporter for use with the Lustre parallel filesystem
GNU General Public License v3.0

Add error handling in case jobid cannot be parsed #20

Closed schroedo closed 2 years ago

schroedo commented 2 years ago

Sometimes a jobid falls outside the regex pattern. Currently an empty jobid is returned, which can lead to problems if this happens for multiple jobids during a single sweep.

I suggest two changes in procfs.go:

  1. in getJobNum: replace return "", nil with something like return "", errors.New("No valid JobID found in jobid string: #" + jobID + "#")

  2. in parseJobStatsText: replace if err != nil { return nil, err } with something like if err != nil { log.Errorf("ERROR: getJobNum failed: %s", err); continue }. That way we log the error and skip over the offending entry.

    github.com/prometheus/common/log and errors need to be imported for this to work.
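The two changes can be sketched together as follows. This is a minimal, self-contained illustration and not the actual code from procfs.go: the regex `jobIDPattern`, the helper `parseJobIDs`, and the function signatures are assumptions made for the example, and the standard library `log` package stands in for github.com/prometheus/common/log so the snippet runs without extra dependencies.

```go
package main

import (
	"errors"
	"log"
	"regexp"
)

// jobIDPattern is a hypothetical stand-in for the exporter's real regex.
// Here it accepts only purely numeric job IDs.
var jobIDPattern = regexp.MustCompile(`^[0-9]+$`)

// getJobNum returns the job number. When the jobid does not match the
// pattern it returns an error instead of an empty string with a nil error.
func getJobNum(jobID string) (string, error) {
	if !jobIDPattern.MatchString(jobID) {
		return "", errors.New("No valid JobID found in jobid string: #" + jobID + "#")
	}
	return jobID, nil
}

// parseJobIDs is a hypothetical reduction of the loop in parseJobStatsText:
// an entry whose jobid cannot be parsed is logged and skipped, so one bad
// entry no longer aborts the whole sweep.
func parseJobIDs(rawIDs []string) []string {
	var jobs []string
	for _, raw := range rawIDs {
		job, err := getJobNum(raw)
		if err != nil {
			log.Printf("ERROR: getJobNum failed: %s", err)
			continue
		}
		jobs = append(jobs, job)
	}
	return jobs
}

func main() {
	// Two of the four entries are invalid and are skipped with a log line.
	jobs := parseJobIDs([]string{"4711", "", "not-a-job", "4712"})
	log.Printf("parsed %d valid job IDs: %v", len(jobs), jobs)
}
```

With this shape, an unparsable jobid produces one log line per offending entry and the remaining entries of the sweep are still exported.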

gabrieleiannetti commented 2 years ago

Hi,
thanks for reporting the issue.

Skipping of empty job_id fields has been implemented in https://github.com/GSI-HPC/lustre_exporter/commit/971549917dfaa27e34846bff7a4370aef1312fbc.

So far it is only in the master branch.
It is targeted for the next release, version 2.1.3.

Best, Gabriele