Timestamp not refreshed when db2diag check times out

The analysis of the "most recent" lines in db2diag.log is based on a timestamp that is preserved in a file (e.g. the file /tmp/last_date_check_db2diag_db2inst1). After the analysis is done, the new timestamp is persisted, so the next check will start from a new timestamp.

When the plugin fails, the timestamp is not refreshed. At first glance this looks plausible: we want the unchecked lines to be examined again the next time the check runs.
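
The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual code: the function names, the `analyse` callback and the ISO timestamp format are assumptions made for the example.

```python
import os
import tempfile
from datetime import datetime


def load_checkpoint(path, default):
    """Read the persisted timestamp; fall back to `default` if it is missing or unreadable."""
    try:
        with open(path) as fh:
            return datetime.fromisoformat(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return default


def save_checkpoint(path, ts):
    """Persist the timestamp the next check should start from."""
    with open(path, "w") as fh:
        fh.write(ts.isoformat())


def run_check(path, now, analyse):
    """Analyse entries written between the stored timestamp and `now`.

    `analyse` is a hypothetical callback standing in for the db2diag.log
    analysis; it may raise (for example TimeoutError).
    """
    since = load_checkpoint(path, default=now)
    analyse(since, now)         # an exception here skips the next line
    save_checkpoint(path, now)  # only reached when the analysis succeeded


if __name__ == "__main__":
    # Hypothetical state file, used only for this demonstration.
    state = os.path.join(tempfile.mkdtemp(), "last_date_check_example")
    run_check(state, datetime.now(), lambda since, until: None)
    print(load_checkpoint(state, None))
```

If `analyse` raises, the stored timestamp stays where it was, so the next run starts from the same point again.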

However, when the number of new entries written to db2diag.log since the timestamp on disk is large enough, the plugin can time out. And this is really bad: the next time the plugin runs, it reads all the same lines again, plus any additional ones, so the check has trapped itself forever and will not recover.
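
A toy simulation may make the trap easier to see. Everything in it is an assumption made for illustration: the numbers are invented, "capacity" stands for how many lines one run can parse before the timeout, and the `refresh_on_timeout` switch is a hypothetical alternative behaviour, not something the plugin offers.

```python
def simulate(runs, backlog, new_per_run, capacity, refresh_on_timeout):
    """Return the number of unchecked lines each run has to parse.

    backlog            -- unchecked lines before the first run
    new_per_run        -- lines written to the log between two runs
    capacity           -- lines one run can parse before it times out
    refresh_on_timeout -- hypothetical: move the timestamp even on timeout
    """
    seen = []
    for _ in range(runs):
        backlog += new_per_run
        seen.append(backlog)
        if backlog <= capacity:
            backlog = 0   # success: the timestamp is refreshed
        elif refresh_on_timeout:
            backlog = 0   # timed out, timestamp moved anyway, lines skipped
        # otherwise: timed out, timestamp unchanged, backlog is kept
    return seen


if __name__ == "__main__":
    print(simulate(4, 5000, 100, 1000, refresh_on_timeout=False))
    print(simulate(4, 5000, 100, 1000, refresh_on_timeout=True))
```

With the timestamp left untouched on timeout, the backlog only grows from run to run. Refreshing it on timeout lets the check recover on the following run, at the price of never analysing the skipped lines.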

I did not make this up: I saw a real case where the check tried, over and over again, to parse a huge db2diag.log backlog covering more than 5 days, reliably timing out with no chance to recover.
