Closed — eschoeller closed this issue 4 years ago
Yes, the STDOUT/STDERR from rrdtool, which was originally being redirected into boost.log, is now lost, regardless of which log verbosity options I select.
I knocked the boost_rrd_update_max_records down to 250k so it would run more often. Here are some stats:
(sadly did not have these graphs working while I was running 1.2.4)
Nice graphs. There was another change in 1.2.6 around the use of proc_open() vs. popen(). That should not be an issue, but it should be checked too. @eschoeller, the rrdlastupdate should no longer be required with RRDtool 1.5.x. Someone should look at this.
I've added some debugging, and it does indeed look like the culprit is not the fix that was made for boost itself, but the proc_open() fix. Investigating now.
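For anyone following along: the practical difference is that popen() hands back only the child's stdout, while proc_open() lets you attach a pipe to stderr as well, which is where rrdtool writes its error text. A minimal Python analogue (Cacti itself is PHP; this is just an illustration of the two styles, not Cacti's code):

```python
import subprocess
import sys

# Child process that writes to both streams, standing in for rrdtool.
cmd = [sys.executable, "-c",
       "import sys; print('out'); print('err', file=sys.stderr)"]

# popen()-style: only stdout comes back through the pipe; stderr is
# effectively lost once the parent's stderr is redirected away.
out_only = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL, text=True)

# proc_open()-style: explicit pipes for both streams, so the error
# text can be captured and written into a log such as boost.log.
both = subprocess.run(cmd, stdout=subprocess.PIPE,
                      stderr=subprocess.PIPE, text=True)

print(out_only.stdout.strip())  # out
print(both.stderr.strip())      # err
```

With the first style, any rrdtool error simply never reaches the logging code, which matches what I'm seeing.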
Update lib/boost.php, lib/dsstats.php, lib/rrd.php, and poller_boost.php, and let me know the difference.
Applied the updates.
I do see these in cacti.log:
2019/09/22 10:53:08 - BOOST WARNING: Stale Poller Data Found! Item Time:'1569171061', RRD Time:'1569171062' Ignoring Value!
2019/09/22 10:53:08 - BOOST WARNING: Stale Poller Data Found! Item Time:'1569171061', RRD Time:'1569171062' Ignoring Value!
2019/09/22 10:53:08 - BOOST WARNING: Stale Poller Data Found! Item Time:'1569171061', RRD Time:'1569171062' Ignoring Value!
2019/09/22 10:53:08 - BOOST WARNING: Stale Poller Data Found! Item Time:'1569171061', RRD Time:'1569171062' Ignoring Value!
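For context, those warnings come from a guard that compares each queued item's timestamp with the RRD file's last-update time. A minimal Python sketch of that comparison (the function is my sketch of the check, not Cacti's actual code), using the timestamps from the log lines above:

```python
def is_stale(item_time: int, rrd_last_update: int) -> bool:
    # A sample dated at or before the RRD's last update can no longer
    # be written into the RRD, so boost skips it and logs the
    # "Stale Poller Data" warning.
    return item_time <= rrd_last_update

# Timestamps from the warnings above: item is 1 second behind the RRD.
print(is_stale(1569171061, 1569171062))  # True -> value ignored
```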
I may just need to flush some tables. I'll keep monitoring, but this is a huge improvement. Thank you! I'll likely drop off the radar until tomorrow.
I'm actually getting a lot of those Stale Poller Data messages. I don't think flushing the output tables would help. I can say that after every poller run the poller_output table still has data in it. I can't easily cross-reference the output in the table with the Stale Poller Data messages. I think that log line should include the device ID and the data source ID for better troubleshooting.
Thanks for the feedback. What RRDtool version are you on? I'll fix that 'stale' data bit; it's not an issue past RRDtool 1.5, as we ignore it.
I'm on rrdtool 1.7.0. I was thinking about upgrading soon. Does the message indicate data loss?
No, there is no data loss. Just an old message that can be ignored now. I'll have it fixed shortly.
Taking that back. Update lib/boost.php and poller_boost.php with the latest updates.
Boost should be plenty fast now. Especially for large boost tables.
Applied. I'll let you know. My plans for today keep getting shifted back. But at some point I am certainly stepping away from the computer ;)
Dang, it's taking forever to run again. I might have grabbed the wrong files. Yep, I got the files from the develop branch. Whoops. Fixing ..
OK, it's back to a 30-second runtime. The stale messages appear to be gone as well.
I have encountered similar issues to what other users reported in #2929. I have implemented the suggested code, but I'm still running much slower than I was in 1.2.4.
Here are the results from my most recent forced boost run:
Here are some other settings:
In 1.2.4 I had boost_rrd_update_string_length set to '2000'. Increasing this value did not improve the boost performance.
I have tried increasing boost_parallel to combat this problem, but no matter what setting I use, there is still only one poller_boost.php process running. It only consumes about 44% of one CPU. MySQL really isn't consuming many resources, either.
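As I understand it, boost_parallel is supposed to split the pending boost records across several poller_boost.php children. A rough Python sketch of that kind of fan-out (the function name and the modulo partitioning are my assumptions, not Cacti's actual code); keeping all records for one data source in the same bucket avoids two children updating the same RRD file concurrently:

```python
def split_for_parallel(record_ids, parallel):
    """Divide pending boost record IDs among `parallel` workers,
    keeping every ID for a given data source in the same bucket so
    no two children ever touch the same RRD file."""
    buckets = [[] for _ in range(parallel)]
    for rid in record_ids:
        buckets[rid % parallel].append(rid)
    return buckets

print(split_for_parallel([1, 2, 3, 4, 5, 6], 3))
# [[3, 6], [1, 4], [2, 5]]
```

Whatever the actual partitioning scheme is, I would expect `parallel` child processes to show up in the process list, which is not what I'm seeing.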
Here are some entries from mysql-slow.log during a poller_boost run. I don't think it's that interesting:
I am also experiencing the same issue as #2923. I believe these logs should be directed into boost.log. I'm getting only status messages there, which doesn't make very effective use of the log file. Also, in 1.2.4 I would see these types of messages in boost.log:
From what I see in issue #1257, the lack of these messages might be because I don't have debug enabled? Either way, these seem like stderr-style messages that might be getting lost, similar to other discussions we've had on the subject in the past.
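For what it's worth, the redirection I have in mind is just pointing the child's stderr at the log file so rrdtool's error text lands next to the status messages instead of being discarded. A small Python illustration (the path and the error message are made up):

```python
import os
import subprocess
import sys
import tempfile

# Stand-in path for boost.log (hypothetical, for illustration only).
log_path = os.path.join(tempfile.gettempdir(), "boost.log")

# Child that emits an rrdtool-style error on stderr.
cmd = [sys.executable, "-c",
       "import sys; print('ERROR: bad data source', file=sys.stderr)"]

# Route the child's stderr into the log file.
with open(log_path, "w") as log:
    subprocess.run(cmd, stderr=log)

with open(log_path) as log:
    print(log.read().strip())  # ERROR: bad data source
```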
I'm going to keep poking around with this tonight; unfortunately, I'll be mostly unavailable tomorrow.