Closed eschoeller closed 5 years ago
Well that's quite odd. For the RRD in question, we need to know how many data sources it has. That's not easy to instrument. I'm making an update in the meantime.
Test again with this latest update.
OK, I've got that in there now. I'll let you know if these errors pop back up. Chasing down a bunch of things right now.
2018/10/20 19:15:43 - ERROR PHP NOTICE: Undefined offset: 1 in file: /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line: 203
2018/10/20 19:15:43 - CMDPHP PHP ERROR NOTICE Backtrace: (/poller_boost.php[153]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[203]:CactiErrorHandler())
What rrdtool version are you using? The values from the export are not matching the number of data sources or the format of the XML file changed.
You are going to have to instrument some code to trace down the RRDfile in question. Once you know the file that is causing these issues, run the 'rrdtool info' command on that file and post the output.
The main data collector is using rrdtool 1.7.0. I'll see what I can do to dig into this some more.
OK, I've traced this back to line 203 of dsstats.php. It's trying explode() on a '=' but there are times when it's getting a $line back that looks just like this:
<step>60</step>
Instead of what it really wants which is this:
step = 60
So, this is kinda fascinating, that there would be an irregularity here. Maybe I have some old RRD files?
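For illustration, here is a minimal sketch of a more defensive parse for a single line of info output. parse_info_line() is a hypothetical helper, not the code at line 203 of dsstats.php. It assumes info lines have the "key = value" shape shown above and treats any line starting with '<' as XML.

```php
<?php
// Hypothetical helper (not part of Cacti): parse one line of
// `rrdtool info` output into array(key, value), or return null
// when the line is not in "key = value" form.
function parse_info_line(string $line): ?array {
    $line = trim($line);

    // XML such as "<step>60</step>" is not info output. Checking for
    // '=' alone is not enough: the XML declaration line contains '='.
    if ($line === '' || $line[0] === '<') {
        return null;
    }

    // Limit to 2 parts so a value containing '=' stays intact.
    $parts = explode('=', $line, 2);
    if (count($parts) !== 2) {
        return null;
    }

    // Strip surrounding whitespace, and quotes around string values.
    return array(trim($parts[0]), trim($parts[1], " \t\n\r\""));
}
```

With a guard like this the caller can log the offending line and file name instead of raising an "Undefined offset: 1" notice.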
The info command outputs the step as step = seconds. The dump command exports it as XML, but this code does not use the step information from the dump, only from the info call. So, keep searching.
If you run rrdtool - and then type info <path to rrd file>, you will see output similar to:
info test_users_8.rrd
filename = "test_users_8.rrd"
rrd_version = "0003"
step = 60
last_update = 1534862402
header_size = 3704
ds[users].index = 0
ds[users].type = "GAUGE"
ds[users].minimal_heartbeat = 600
ds[users].min = 0.0000000000e+00
ds[users].max = 5.0000000000e+02
ds[users].last_ds = "4"
ds[users].value = 8.0000000000e+00
ds[users].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 2900
rra[0].cur_row = 659
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
If you believe you are getting an error with a specific file, output the name of the file when you get this mismatch and then use it in the above command.
Yep that’s where I’m headed. Trying to determine which rrd files are causing this. Made a change last night to log those, will review results today.
Eric.
Did you manage to find any further information?
I am stumped right now. I found 5,274 RRD files (I have 23,073 data sources) affected by this issue. But it seems to be inconsistent. For 2,796 RRD files it's only happened once since I started collecting this data. For the rest it's happened between 2 and 139 times. I've executed an rrdtool info on all those affected files and they all output in the "step = 60" format and all have rrd_version 0003. So how on earth the code is getting XML (sometimes) is a bit beyond me right now. I'm looking a bit higher up in this section of code and notice there are two different methods of calling rrdtool info, one via proxy and the other not. I'm wondering if that's somehow causing an issue. It's the only place I can see a difference in the info output happening, but it still doesn't seem logical to me.
From what I read the other day, as I'm not in front of the code right now, I would say that the proxy is either on or off. You can't have it on for a single data source for example.
Yep I would agree. I can’t see the proxy being used intermittently. But I’m going to add some more debug output to verify that.
My default system rrdtool is much older, version 1.4 I believe, and I have version 1.7 installed in /usr/local. Cacti is configured to use the /usr/local rrdtool. But this is where I might start poking around next, even though I don’t see it having any influence here. It’s something I need to “clean up” anyway. There’s some reason I’m configured this way (which I can’t recall) and I know there is some other lurking issue waiting to bite me when I make this switch.
I’m also going to craft up some code to dump the entire contents of the ‘rrdtool info’ output into temporary files so I can look a bit further into what Cacti is getting back.
Eric.
I am also noticing that I am getting this output in the log file in every poller run:
SYSTEM DSSTATS STATS: Type:HOURLY, Time:7.2996
And I'm getting this output roughly every 15 minutes:
SYSTEM DSSTATS STATS: Type:DAILY, Time:115.5919
And this is my data source statistics configuration:
Does this seem right to you?
I took a peek into boost.log today. I found a lot more undefined offset problems:
grep Undefined boost.log | sort | uniq -c
24922 Notice: Undefined offset: 1 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 203
112396 Notice: Undefined offset: 10 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
112396 Notice: Undefined offset: 10 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
112396 Notice: Undefined offset: 11 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
112396 Notice: Undefined offset: 11 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
29427 Notice: Undefined offset: 12 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
29427 Notice: Undefined offset: 12 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
29427 Notice: Undefined offset: 13 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
29427 Notice: Undefined offset: 13 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
29427 Notice: Undefined offset: 14 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
29427 Notice: Undefined offset: 14 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
29427 Notice: Undefined offset: 15 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
29427 Notice: Undefined offset: 15 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
43200 Notice: Undefined offset: 16 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
43200 Notice: Undefined offset: 16 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
43200 Notice: Undefined offset: 17 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
43200 Notice: Undefined offset: 17 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
38687 Notice: Undefined offset: 18 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
38687 Notice: Undefined offset: 18 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
38687 Notice: Undefined offset: 19 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
38687 Notice: Undefined offset: 19 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
35879 Notice: Undefined offset: 2 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
35879 Notice: Undefined offset: 2 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
58724 Notice: Undefined offset: 20 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
58724 Notice: Undefined offset: 20 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
58724 Notice: Undefined offset: 21 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
58724 Notice: Undefined offset: 21 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
44028 Notice: Undefined offset: 22 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
44028 Notice: Undefined offset: 22 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
44028 Notice: Undefined offset: 23 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
44028 Notice: Undefined offset: 23 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
44028 Notice: Undefined offset: 24 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
44028 Notice: Undefined offset: 24 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
44028 Notice: Undefined offset: 25 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
44028 Notice: Undefined offset: 25 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
44028 Notice: Undefined offset: 26 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
44028 Notice: Undefined offset: 26 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
44028 Notice: Undefined offset: 27 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
44028 Notice: Undefined offset: 27 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
44028 Notice: Undefined offset: 28 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
44028 Notice: Undefined offset: 28 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
44028 Notice: Undefined offset: 29 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
44028 Notice: Undefined offset: 29 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
35879 Notice: Undefined offset: 3 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
35879 Notice: Undefined offset: 3 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
42595 Notice: Undefined offset: 30 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
42595 Notice: Undefined offset: 30 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
42595 Notice: Undefined offset: 31 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
42595 Notice: Undefined offset: 31 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 32 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 32 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 33 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 33 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 34 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 34 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 35 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 35 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 36 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 36 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 37 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 37 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 38 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 38 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 39 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 39 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
70583 Notice: Undefined offset: 4 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
70583 Notice: Undefined offset: 4 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 40 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 40 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 41 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 41 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 42 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 42 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 43 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 43 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 44 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 44 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 45 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 45 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 46 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 46 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 47 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 47 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
30123 Notice: Undefined offset: 48 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
30123 Notice: Undefined offset: 48 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
30123 Notice: Undefined offset: 49 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
30123 Notice: Undefined offset: 49 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
70583 Notice: Undefined offset: 5 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
70583 Notice: Undefined offset: 5 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
145922 Notice: Undefined offset: 6 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
145922 Notice: Undefined offset: 6 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
145922 Notice: Undefined offset: 7 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
145922 Notice: Undefined offset: 7 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
113658 Notice: Undefined offset: 8 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
113658 Notice: Undefined offset: 8 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
113658 Notice: Undefined offset: 9 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
113658 Notice: Undefined offset: 9 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
I'll keep digging.
Hmm. Interesting development. I have started writing the contents of $info defined in lib/dsstats.php on line 174 to temporary files. After the very first run it turns out I have some files that are actually 'xport' files and not 'info' files at all. Take a look:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xport>
<meta>
<start>1540701180</start>
<end>1540787580</end>
<step>60</step>
<rows>1440</rows>
<columns>12</columns>
<legend>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
</legend>
</meta>
<data>
<row><v>4.8000000000e+01</v><v>4.8000000000e+01</v><v>6.9383333333e+01</v><v>6.9383333333e+01</v><v>2.3000000000e+01</v><v>2.3000000000e+01</v><v>7.0000000000e+01</v><v>7.0000000000e+01</v><v>7.1000000000e+01</v><v>7.1000000000e+01</v><v>1.1600000000e+02</v><v>1.1600000000e+02</v></row>
I am getting a lot of errors coming back from dsstats_rrdtool_execute() namely these two:
ERROR: can't make an xport without contents ERROR: creating arguments ERROR: No DS called 'foo' in '/cacti/rra/bar.rrd'
But the overwhelming majority are the "can't make an xport without contents" error.
I think something is going wrong with the rrdtool pipe. I am getting back all sorts of irregular, inconsistent data from the rrdtool commands being issued. I changed the pipes to be blocking and this dramatically reduced the number of errors I'm seeing.
So, the real question is how are you ending up with xport over info?
Can you run the following query for me?
SELECT name,value FROM settings where name = 'storage_location';
here it is:
mysql> SELECT name,value FROM settings where name = 'storage_location';
+------------------+-------+
| name             | value |
+------------------+-------+
| storage_location | 0     |
+------------------+-------+
1 row in set (0.00 sec)
Some of the data I'm getting back is very incomplete too. I'm writing all the data Cacti is receiving back out to temporary files and I consistently see problems with about 15 data sources. For example, this is all the output stored in $info for a standard Unix - Ping Host data source:
ds[ipFragOKs].value = 0.0000000000e+00
That's just one line of info output from some other data source based upon the RFC1213 template.
Here's another odd example. In this case the file being queried is "abc_ping_3637.rrd" but I'm getting back data from "abc_ipreasmtimeout_3632.rrd".
filename = "/cacti/cacti-1.2.0-beta2-prod/rra/abc_ipreasmtimeout_3632.rrd"
rrd_version = "0003"
step = 60
last_update = 1540803244
header_size = 17936
ds[ipReasmTimeout].index = 0
ds[ipReasmTimeout].type = "COUNTER"
ds[ipReasmTimeout].minimal_heartbeat = 120
ds[ipReasmTimeout].min = 0.0000000000e+00
ds[ipReasmTimeout].max = 4.2949672950e+09
ds[ipReasmTimeout].last_ds = "30"
ds[ipReasmTimeout].value = 0.0000000000e+00
ds[ipReasmTimeout].unknown_sec = 0
ds[ipReasmReqds].index = 1
ds[ipReasmReqds].type = "COUNTER"
ds[ipReasmReqds].minimal_heartbeat = 120
ds[ipReasmReqds].min = 0.0000000000e+00
ds[ipReasmReqds].max = 4.2949672950e+09
ds[ipReasmReqds].last_ds = "24504"
ds[ipReasmReqds].value = 0.0000000000e+00
ds[ipReasmReqds].unknown_sec = 0
ds[ipFragOKs].index = 2
And I don't even think that's a complete set of data. So, I am getting sufficiently confused. There's no doubt that I must have some strange lurking data source out there tripping things up, but the code should be able to throw an appropriate exception and handle it.
I would agree. Not a complete set of data.
That means it’s not using proxy but is using local pipes.
Yes I’ve confirmed I never hit the proxy code. I’m wondering if this is a problem with the pipe in some way, and what’s the best way to troubleshoot and further diagnose this. Maybe the info and xport should use a different pipe?
Sounds like a plan. Or maybe just check, after an info request, that you haven't received XML. If you have, you know the previous command must have either overflowed the pipe in some way or made it believe it was complete when it wasn't.
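A minimal sketch of that check, assuming info output always begins with a filename = "..." line as in the samples above. info_output_matches() is a hypothetical name, not an existing Cacti function. It also compares the reported file name with the one requested, which would catch the abc_ping_3637.rrd versus abc_ipreasmtimeout_3632.rrd mix-up.

```php
<?php
// Hypothetical sanity check (not part of Cacti): does $stdout look like
// the reply to "info $rrd_path", rather than xport XML or a leftover
// fragment of some other command's output?
function info_output_matches(string $stdout, string $rrd_path): bool {
    $stdout = ltrim($stdout);

    // xport replies are XML and start with '<'.
    if ($stdout === '' || $stdout[0] === '<') {
        return false;
    }

    // A complete info reply starts with: filename = "<path>"
    if (!preg_match('/^filename = "([^"]*)"/', $stdout, $matches)) {
        return false;
    }

    // Compare base names only, in case the directory is written differently.
    return basename($matches[1]) === basename($rrd_path);
}
```

When the check fails, logging the command together with the unexpected output gives a starting point for finding the command that left the pipe out of step.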
There are hundreds of these transactions happening so it's difficult to correlate. Can you give me a hint on how I might trigger the dsstats code to run more frequently? Perhaps on-demand from the command-line? I've been spending a good amount of time waiting around for my boost process to run so the dsstats can run directly after that. I don't think it really hurts to run it often, just so long as I don't have multiple ones running at the same time.
Let me review the code again and check some docs. It sounds most definitely like a pipe issue to me.
So, what I would suggest is that in dsstats_rrdtool_execute(), you record the command and output into a file. I'm not sure what debugging you have in place at this point, so I'm going to start with my own:
diff --git a/lib/dsstats.php b/lib/dsstats.php
index 4e60d6ab..de76123a 100644
--- a/lib/dsstats.php
+++ b/lib/dsstats.php
@@ -817,6 +817,7 @@ function dsstats_rrdtool_init() {
    @arg $pipes - (array) An array of stdin and stdout pipes to read and write data from
    @returns - (string) The output from RRDtool */
 function dsstats_rrdtool_execute($command, $pipes) {
+    global $config;
     $stdout = '';
     if ($command == '') return;
@@ -824,18 +825,23 @@ function dsstats_rrdtool_execute($command, $pipes) {
     $command .= "\r\n";
     $return_code = fwrite($pipes[0], $command);
+    $return_reason = 'EOF';
     while (!feof($pipes[1])) {
         $stdout .= fgets($pipes[1], 4096);
         if (substr_count($stdout, 'OK')) {
+            $return_reason = 'OK';
             break;
         }
         if (substr_count($stdout, 'ERROR')) {
+            $return_reason = 'ERROR';
             break;
         }
     }
+    $temp = tempnam($config['base_path'] . '/log', 'dsstats');
+    file_put_contents($temp, "Command: $command\nReason: $return_reason\nOutput:\n$stdout");
     if (strlen($stdout)) return $stdout;
 }
I have also noted that this process doesn't open stderr and check for any potential errors that way either.
Yep, I have done something similar within dsstats_rrdtool_execute already:
if (substr_count($stdout, 'ERROR')) {
    cacti_log("rrdtool_execute found ERROR for command:" . $command, true,'DSSTATS');
    $errortime = microtime();
    $errorfile = fopen("/tmp/rrdexecute.error.$errortime","w");
    fwrite($errorfile,$command);
    fwrite($errorfile,$stdout);
    fclose($errorfile);
    break;
}
I also like your approach so I'll switch to that and see if it gains me any more insight on this.
I also tried to redirect stderr for dsstats_rrdtool_init like this:
if ($config['cacti_server_os'] == 'unix') {
    $fds = array(
        0 => array('pipe', 'r'), // stdin
        1 => array('pipe', 'w'), // stdout
        #2 => array('file', '/dev/null', 'a') // stderr
        2 => array('file', '/tmp/errors', 'a') // stderr
    );
} else {
    $fds = array(
        0 => array('pipe', 'r'), // stdin
        1 => array('pipe', 'w'), // stdout
        #2 => array('file', 'nul', 'a') // stderr
        2 => array('file', '/tmp/errors.1', 'a') // stderr
    );
}
But I didn't really have any success. /tmp/errors was created, but never got anything. Maybe there was never any stderr output. Any idea what else I could do to capture stderr? I was going to suggest that this should really be going to log/cacti_stderr.log.
You could just open it as a pipe, and keep checking for any return values. Don't forget to set it non-blocking though as per the other pipes.
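As a rough sketch of that suggestion, the descriptor array can declare stderr as a third pipe and mark it non-blocking. open_with_stderr() is a hypothetical helper for illustration, not the body of dsstats_rrdtool_init(), and the command string passed in is up to the caller.

```php
<?php
// Hypothetical helper (not part of Cacti): start a child process with
// stdin, stdout and stderr all attached as pipes.
// Returns array($process, $pipes), or false if the process did not start.
function open_with_stderr(string $command) {
    $fds = array(
        0 => array('pipe', 'r'), // stdin: the child reads, we write
        1 => array('pipe', 'w'), // stdout: the child writes, we read
        2 => array('pipe', 'w'), // stderr: the child writes, we read
    );

    $process = proc_open($command, $fds, $pipes);
    if (!is_resource($process)) {
        return false;
    }

    // Non-blocking, so polling stderr never stalls when it is empty.
    stream_set_blocking($pipes[2], false);

    return array($process, $pipes);
}
```

After each command, stream_get_contents($pipes[2]) returns whatever has been written to stderr so far, or an empty string if there is nothing. All three pipes need an fclose() before proc_close(), otherwise descriptors accumulate.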
Well I ran out of file descriptors this morning, I had thousands of poller_boost and rrdtool - processes stuck. They were stuck on this:
write(7, "xport --start now-1day --end now"..., 2160
So I rolled back dsstats.php. I'll try to circle back to this over the weekend.
That would suggest we are having issues with the xport pipe command. You shouldn't run out of file descriptors unless you are not freeing up resources at some point?
Agreed, but I think it's been triggered by the debugging additions that we have added to dsstats.php. Not sure if it's a bug with my additions or what. Now that I've rolled back to the clean version of dsstats.php from Beta 2, I'm not accumulating these processes anymore. I'm going to merge in your debugging code and see what happens now.
OK, I had your code running for a little while and got a LOT of output files. Now I'm trying to sort through them. But I am seeing some problems for sure.
Reason: ERROR
Output:
ERROR: can't make an xport without contents
Command: info /cacti/cacti-1.2.0-beta2-prod/rra/device_phaseapparentpower1_54076.rrd
Reason: ERROR
Output:
ERROR: can't make an xport without contents
Command: info /cacti/cacti-1.2.0-beta2-prod/rra/upsc_upsoutputcurrent2_43594.rrd
Reason: ERROR
Output:
ERROR: can't make an xport without contents
Command: xport --start now-1day --end now DEF:aa="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_in:AVERAGE DEF:ab="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_in:MAX DEF:ac="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_out:AVERAGE DEF:ad="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_out:MAX XPORT:aa XPORT:ab XPORT:ac XPORT:ad --maxrows 10000
Reason: ERROR
Output:
ERROR: No DS called 'cpu_nice' in '/cacti/cacti-1.2.0-beta2-prod/rra/abc_phaseactivepower1_48862.rrd'
Command: xport --start now-1day --end now DEF:aa="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_free:AVERAGE DEF:ab="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_free:MAX DEF:ac="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_used:AVERAGE DEF:ad="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_used:MAX XPORT:aa XPORT:ab XPORT:ac XPORT:ad --maxrows 10000
The errors coming back don't sync up with the command being issued at all. And the data sources being asked for have no relevancy to the RRD files that are being queried.
I'm guessing you changed the order slightly of the output. But I would agree that you appear to be getting out of sync between commands. I would look for the first file with ERROR and then check out the previous command to see if it was correct.
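One detail that may be relevant to the loss of sync: the read loop in the diff above stops at the first occurrence of the substring 'OK' anywhere in the accumulated output, and a data source name such as ipFragOKs contains 'OK'. The truncated info output shown earlier ends exactly on a ds[ipFragOKs] line. The sketch below only treats a line as the end of a reply when it starts with the terminator. It assumes rrdtool's pipe mode ends a successful reply with a line beginning "OK u:" and a failed one with a line beginning "ERROR:", and that the stream is blocking. Both function names are hypothetical.

```php
<?php
// Hypothetical helpers (not part of Cacti).

// True when $line looks like the final line of a pipe-mode reply.
// Anchored at the start of the line, so "ds[ipFragOKs].index = 2"
// does not count as a terminator.
function is_reply_terminator(string $line): bool {
    return preg_match('/^(OK u:|ERROR:)/', $line) === 1;
}

// Read one complete reply from a blocking stream, line by line.
function read_reply($stream): string {
    $output = '';

    while (!feof($stream)) {
        $line = fgets($stream); // no length limit: read the whole line
        if ($line === false) {
            break;
        }

        $output .= $line;

        if (is_reply_terminator($line)) {
            break;
        }
    }

    return $output;
}
```

Whether this accounts for every mismatched reply in the thread is not established here. It only removes one way for leftover output to remain in the pipe and be read as the answer to the next command.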
If you are using the python rrdtool wrapper, please let us know. I suspect it does not work in pipe mode.
python rrdtool wrapper? I have no idea what that would even be ;) Or why you'd want that! So, yeah I am pretty sure I'm calling rrdtool directly. Unless there's something else out there getting mixed in and I'm not aware of it.
Let me know if there's something I should check to make sure.
The python script is something I’ve only just heard of. A user wrote it to allow multiple RRDtool updates to run concurrently by cycling through twenty rrdtool processes, which speeds up updates, especially to remote NFS drives. Since you aren’t using it, that rules out one possible explanation for the misalignment of commands.
Any updates?
Closed due to no feedback.
I upgraded to 1.2.0 Beta 3, still getting this error:
2018/11/25 16:43:19 - ERROR PHP NOTICE: Undefined offset: 1 in file: /cacti/cacti-1.2.0-beta3-prod/lib/dsstats.php on line: 203
2018/11/25 16:43:19 - CMDPHP PHP ERROR NOTICE Backtrace: (/poller_boost.php[154]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[203]:CactiErrorHandler())
I deleted the three data sources from the poller_item table that I identified in #2141 to see if that might help this.
I removed the bad data sources from the poller_item table and still experienced the issue. I upgraded to Beta4. Still having the same problem.