Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0
1.61k stars 399 forks source link

Enhance logging of error messages when running external commands (Unix only) #1610

Closed eschoeller closed 6 years ago

eschoeller commented 6 years ago

I have traditionally invoked the poller in cron via this command:

* * * * * /bin/date >>/var/log/cacti/poller-error.log; /usr/local/php/bin/php /cacti/cacti-prod/poller.php >>/var/log/cacti/poller.log 2>>/var/log/cacti/poller-error.log

Most importantly I am redirecting STDERR output from the polling process to a separate file. I have numerous scripts which write to STDERR, and for quite a long time this was working. In one of the last updates along the way (I can't determine when, all my archived logs are blank) all the STDERR output I was receiving disappeared and that "poller-error.log" has been empty (save for a date entered every minute).

I am using spine, and tried switching to cmd.php to see if there was a difference. I still wasn't catching the STDERR output, even when trying to invoke the poller directly on the command line. At some point the spine "Invalid Data Logging" option was added, and perhaps that knocked out STDERR handling? I'm not sure. It would be very helpful to have it back. Perhaps through some additional options to the Invalid Data Logging option. Let me know what you think.

netniV commented 6 years ago

If you remove the files are they recreated? I have seen this issue in two cases before when it comes to logging into files.

The Invalid Data Logging option has been around some time now and is literally about how much logging goes into the cacti.log file from spine. If you believe there is a specific commit that may be contributing feel free to link it here, but I would highly doubt it is that.

Spine itself captures normal output from a script using the c++ popen command. As far as I am aware if you don't create new FD's then the parent process should be used.

eschoeller commented 6 years ago

I removed the file, and it was recreated a minute later, same owner and attributes as before. And it's getting output from the /bin/date command but no other STDERR output from the poller. And I know there is STDERR being generated from scripts which the poller is running every minute.

/var/log/cacti> ls -la poller-error.log
-rw-r--r-- 1 apache apache 58 May  6 17:22 poller-error.log
/var/log/cacti> cat poller-error.log
Sun May  6 17:21:01 MDT 2018
Sun May  6 17:22:01 MDT 2018
eschoeller commented 6 years ago

As an example:

/usr/local/bin/perl /cacti/cacti-1.1.36-prod/scripts/power.pl --pue_pdu --hit_rate --latency
WARNING: power.pl - RRD XXX_systemtotalpower_16090.rrd returned NaN value
WARNING: power.pl - RRD YYY_systemtotalpower_14527.rrd returned NaN value
facility_pue_pdu:1.70617911584521 average_latency:76.1176470588235 data_hit_rate:94.4444444444444

At some point within the past 4-6 months I was seeing this type of output in the poller-error.log file, but now it is gone. No telling how many things are broken with no way to report on them!

eschoeller commented 6 years ago

I suppose for completeness ....

/usr/local/bin/perl /cacti/cacti-1.1.36-prod/scripts/power.pl --pue_pdu --hit_rate --latency 1> /dev/null
WARNING: power.pl - RRD XXX_systemtotalpower_16090.rrd returned NaN value
WARNING: power.pl - RRD YYY_systemtotalpower_14527.rrd returned NaN value

:)

cigamit commented 6 years ago

So, both cmd.php and spine are run using exec_background(). The exec_background() function redirects standard error to /dev/null. It's likely that your poller.php was hacked until recently. I've made a change where you can now specify a redirect path to the exec_background() function to address the boost issue. Maybe we should leverage it here too.

But we would need a setting for that. I'm open to suggestions as to what to call it.

cigamit commented 6 years ago

Resolved with new path. Only supported on *NIX platforms.