Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0

[1.2.0 Beta 1] dsstats errors #2095

Closed eschoeller closed 5 years ago

eschoeller commented 5 years ago
2018/10/20 16:43:08 - ERROR PHP NOTICE: Undefined offset: 18 in file: /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php  on line: 299
2018/10/20 16:43:08 - CMDPHP PHP ERROR NOTICE Backtrace:  (/poller_boost.php[153]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[299]:CactiErrorHandler())
2018/10/20 16:43:08 - ERROR PHP NOTICE: Undefined offset: 19 in file: /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php  on line: 313
2018/10/20 16:43:08 - CMDPHP PHP ERROR NOTICE Backtrace:  (/poller_boost.php[153]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[313]:CactiErrorHandler())
2018/10/20 16:43:08 - ERROR PHP NOTICE: Undefined offset: 19 in file: /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php  on line: 314
2018/10/20 16:43:08 - CMDPHP PHP ERROR NOTICE Backtrace:  (/poller_boost.php[153]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[314]:CactiErrorHandler())
2018/10/20 16:43:08 - ERROR PHP NOTICE: Undefined offset: 16 in file: /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php  on line: 298
2018/10/20 16:43:08 - CMDPHP PHP ERROR NOTICE Backtrace:  (/poller_boost.php[153]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[298]:CactiErrorHandler())
2018/10/20 16:43:18 - ERROR PHP NOTICE: Undefined offset: 1 in file: /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php  on line: 203
2018/10/20 16:43:18 - CMDPHP PHP ERROR NOTICE Backtrace:  (/poller_dsstats.php[202]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[203]:CactiErrorHandler())
cigamit commented 5 years ago

Well that's quite odd. For the RRD in question, we need to know how many data sources it has. That's not easy to instrument. I'm making an update in the meantime.

cigamit commented 5 years ago

Test again with this latest update.

eschoeller commented 5 years ago

OK, I've got that in there now. I'll let you know if these errors pop back up. Chasing down a bunch of things right now.

eschoeller commented 5 years ago
2018/10/20 19:15:43 - ERROR PHP NOTICE: Undefined offset: 1 in file: /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php  on line: 203
2018/10/20 19:15:43 - CMDPHP PHP ERROR NOTICE Backtrace:  (/poller_boost.php[153]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[203]:CactiErrorHandler())
cigamit commented 5 years ago

What rrdtool version are you using? The values from the export are not matching the number of data sources or the format of the XML file changed.

cigamit commented 5 years ago

You are going to have to instrument some code to trace down the RRDfile in question. Once you know the file that is causing these issues, run the 'rrdtool info' command on that file and post the output.

eschoeller commented 5 years ago

The main data collector is using rrdtool 1.7.0. I'll see what I can do to dig into this some more.

eschoeller commented 5 years ago

OK, I've traced this back to line 203 of dsstats.php. It's trying explode() on a '=' but there are times when it's getting a $line back that looks just like this:

<step>60</step>

Instead of what it really wants, which is this:

step = 60

So, this is kinda fascinating, that there would be an irregularity here. Maybe I have some old RRD files?
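A minimal sketch of guarding that explode() call, assuming the parser only wants lines in `key = value` form. `parse_info_line()` is a hypothetical helper for illustration, not a function in lib/dsstats.php.

```php
<?php
/**
 * Hypothetical helper: split one line of `rrdtool info` output into
 * array(key, value), or return null when the line is not in
 * `key = value` form (for example an XML line such as <step>60</step>).
 */
function parse_info_line($line) {
	$line = trim($line);

	// A limit of 2 keeps any later ' = ' inside the value intact.
	$parts = explode(' = ', $line, 2);

	if (count($parts) != 2 || $parts[0] === '') {
		return null;
	}

	// Strip the double quotes rrdtool puts around string values.
	return array($parts[0], trim($parts[1], '"'));
}

var_dump(parse_info_line('step = 60'));
var_dump(parse_info_line('<step>60</step>'));
```

Returning null instead of indexing `$parts[1]` blindly avoids the "Undefined offset: 1" notice and gives the caller a place to log which RRD file produced the unexpected line.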

cigamit commented 5 years ago

The info command outputs the step as step = seconds. The dump command exports it as XML, but the step information used here comes from the info call, not from the dump. So, keep searching.

netniV commented 5 years ago

If you run rrdtool - and then type info <path to rrd file> you will see output similar to:

info test_users_8.rrd
filename = "test_users_8.rrd"
rrd_version = "0003"
step = 60
last_update = 1534862402
header_size = 3704
ds[users].index = 0
ds[users].type = "GAUGE"
ds[users].minimal_heartbeat = 600
ds[users].min = 0.0000000000e+00
ds[users].max = 5.0000000000e+02
ds[users].last_ds = "4"
ds[users].value = 8.0000000000e+00
ds[users].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 2900
rra[0].cur_row = 659
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0

If you believe you are getting an error with a specific file, output the name of the file when you get this mismatch and then use it in the above command.

eschoeller commented 5 years ago

Yep that’s where I’m headed. Trying to determine which rrd files are causing this. Made a change last night to log those, will review results today.

Eric.

netniV commented 5 years ago

Did you manage to find any further information?

eschoeller commented 5 years ago

I am stumped right now. I found 5,274 RRD files (I have 23,073 data sources) affected by this issue. But it seems to be inconsistent. For 2,796 RRD files it's only happened once since I started collecting this data. For the rest it's happened between 2 and 139 times. I've executed an rrdtool info on all those affected files and they all output in the "step = 60" format and all have rrd_version 0003. So how on earth the code is getting XML (sometimes) is a bit beyond me right now. I'm looking a bit higher up in this section of code and notice there are two different methods of calling rrdtool info, one via proxy, the other not. I'm wondering if that's somehow causing an issue. It's the only place I can see a difference in the info output happening, but it still doesn't seem logical to me.

netniV commented 5 years ago

From what I read the other day, as I'm not in front of the code right now, I would say that the proxy is either on or off. You can't have it on for a single data source for example.

eschoeller commented 5 years ago

Yep I would agree. I can’t see the proxy being used intermittently. But I’m going to add some more debug output to verify that.

My default system rrdtool is much older, version 1.4 I believe, and I have version 1.7 installed in /usr/local. Cacti is configured to use the /usr/local rrdtool. But this is where I might start poking around next, even though I don’t see it having any influence here. It’s something I need to “clean up” anyway. There’s some reason I’m configured this way (which I can’t recall) and I know there is some other lurking issue waiting to bite me when I make this switch.

I’m also going to craft up some code to dump the entire contents of the ‘rrdtool info’ output into temporary files so I can look a bit further into what Cacti is getting back.

Eric.

eschoeller commented 5 years ago

I am also noticing that I am getting this output in the log file in every poller run:

SYSTEM DSSTATS STATS: Type:HOURLY, Time:7.2996

And I'm getting this output roughly every 15 minutes:

SYSTEM DSSTATS STATS: Type:DAILY, Time:115.5919

And this is my data source statistics configuration:

[image]

Does this seem right to you?

eschoeller commented 5 years ago

I took a peek into boost.log today. I found a lot more undefined offset problems:

grep Undefined boost.log | sort | uniq -c
  24922 Notice: Undefined offset: 1 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 203
 112396 Notice: Undefined offset: 10 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
 112396 Notice: Undefined offset: 10 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
 112396 Notice: Undefined offset: 11 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
 112396 Notice: Undefined offset: 11 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  29427 Notice: Undefined offset: 12 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  29427 Notice: Undefined offset: 12 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  29427 Notice: Undefined offset: 13 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  29427 Notice: Undefined offset: 13 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  29427 Notice: Undefined offset: 14 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  29427 Notice: Undefined offset: 14 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  29427 Notice: Undefined offset: 15 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  29427 Notice: Undefined offset: 15 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  43200 Notice: Undefined offset: 16 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  43200 Notice: Undefined offset: 16 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  43200 Notice: Undefined offset: 17 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  43200 Notice: Undefined offset: 17 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  38687 Notice: Undefined offset: 18 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  38687 Notice: Undefined offset: 18 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  38687 Notice: Undefined offset: 19 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  38687 Notice: Undefined offset: 19 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  35879 Notice: Undefined offset: 2 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  35879 Notice: Undefined offset: 2 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  58724 Notice: Undefined offset: 20 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  58724 Notice: Undefined offset: 20 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  58724 Notice: Undefined offset: 21 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  58724 Notice: Undefined offset: 21 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  44028 Notice: Undefined offset: 22 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  44028 Notice: Undefined offset: 22 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  44028 Notice: Undefined offset: 23 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  44028 Notice: Undefined offset: 23 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  44028 Notice: Undefined offset: 24 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  44028 Notice: Undefined offset: 24 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  44028 Notice: Undefined offset: 25 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  44028 Notice: Undefined offset: 25 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  44028 Notice: Undefined offset: 26 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  44028 Notice: Undefined offset: 26 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  44028 Notice: Undefined offset: 27 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  44028 Notice: Undefined offset: 27 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  44028 Notice: Undefined offset: 28 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  44028 Notice: Undefined offset: 28 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  44028 Notice: Undefined offset: 29 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  44028 Notice: Undefined offset: 29 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  35879 Notice: Undefined offset: 3 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  35879 Notice: Undefined offset: 3 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  42595 Notice: Undefined offset: 30 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  42595 Notice: Undefined offset: 30 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  42595 Notice: Undefined offset: 31 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  42595 Notice: Undefined offset: 31 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 32 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 32 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 33 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 33 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 34 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 34 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 35 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 35 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 36 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 36 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 37 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 37 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 38 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 38 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 39 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 39 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  70583 Notice: Undefined offset: 4 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  70583 Notice: Undefined offset: 4 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 40 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 40 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 41 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 41 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 42 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 42 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 43 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 43 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 44 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 44 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 45 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 45 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 46 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 46 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 47 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 47 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  30123 Notice: Undefined offset: 48 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
  30123 Notice: Undefined offset: 48 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
  30123 Notice: Undefined offset: 49 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  30123 Notice: Undefined offset: 49 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
  70583 Notice: Undefined offset: 5 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
  70583 Notice: Undefined offset: 5 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
 145922 Notice: Undefined offset: 6 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
 145922 Notice: Undefined offset: 6 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
 145922 Notice: Undefined offset: 7 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
 145922 Notice: Undefined offset: 7 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320
 113658 Notice: Undefined offset: 8 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 304
 113658 Notice: Undefined offset: 8 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 305
 113658 Notice: Undefined offset: 9 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 319
 113658 Notice: Undefined offset: 9 in /cacti/cacti-1.2.0-beta1-prod/lib/dsstats.php on line 320

I'll keep digging.

eschoeller commented 5 years ago

Hmm. Interesting development. I have started writing the contents of $info defined in lib/dsstats.php on line 174 to temporary files. After the very first run it turns out I have some files that are actually 'xport' files and not 'info' files at all. Take a look:

<?xml version="1.0" encoding="ISO-8859-1"?>

<xport>
  <meta>
    <start>1540701180</start>
    <end>1540787580</end>
    <step>60</step>
    <rows>1440</rows>
    <columns>12</columns>
    <legend>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
      <entry></entry>
    </legend>
  </meta>
  <data>
    <row><v>4.8000000000e+01</v><v>4.8000000000e+01</v><v>6.9383333333e+01</v><v>6.9383333333e+01</v><v>2.3000000000e+01</v><v>2.3000000000e+01</v><v>7.0000000000e+01</v><v>7.0000000000e+01</v><v>7.1000000000e+01</v><v>7.1000000000e+01</v><v>1.1600000000e+02</v><v>1.1600000000e+02</v></row>
eschoeller commented 5 years ago

I am getting a lot of errors coming back from dsstats_rrdtool_execute(), namely these three:

ERROR: can't make an xport without contents
ERROR: creating arguments
ERROR: No DS called 'foo' in '/cacti/rra/bar.rrd'

But there are an overwhelmingly large number of the can't make an xport without contents error.

eschoeller commented 5 years ago

I think something is going wrong with the rrdtool pipe. I am getting back all sorts of irregular, inconsistent data from the rrdtool commands being issued. I changed the pipes to be blocking and this had a dramatic reduction in the number of errors I'm seeing.

netniV commented 5 years ago

So, the real question is how are you ending up with xport over info?

netniV commented 5 years ago

Can you run the following query for me?

SELECT name,value FROM settings where name = 'storage_location';
eschoeller commented 5 years ago

here it is:

mysql> SELECT name,value FROM settings where name = 'storage_location';
+------------------+-------+
| name             | value |
+------------------+-------+
| storage_location | 0     |
+------------------+-------+
1 row in set (0.00 sec)
eschoeller commented 5 years ago

Some of the data I'm getting back is very incomplete too. I'm writing all the data Cacti is receiving back out to temporary files and I consistently see problems with about 15 data sources. For example, this is all the output stored in $info for a standard Unix - Ping Host data source:

ds[ipFragOKs].value = 0.0000000000e+00

That's just one line of info output from some other data source based upon the RFC1213 template.

Here's another odd example. In this case the file being queried is "abc_ping_3637.rrd" but I'm getting back data from "abc_ipreasmtimeout_3632.rrd".

filename = "/cacti/cacti-1.2.0-beta2-prod/rra/abc_ipreasmtimeout_3632.rrd"
rrd_version = "0003"
step = 60
last_update = 1540803244
header_size = 17936
ds[ipReasmTimeout].index = 0
ds[ipReasmTimeout].type = "COUNTER"
ds[ipReasmTimeout].minimal_heartbeat = 120
ds[ipReasmTimeout].min = 0.0000000000e+00
ds[ipReasmTimeout].max = 4.2949672950e+09
ds[ipReasmTimeout].last_ds = "30"
ds[ipReasmTimeout].value = 0.0000000000e+00
ds[ipReasmTimeout].unknown_sec = 0
ds[ipReasmReqds].index = 1
ds[ipReasmReqds].type = "COUNTER"
ds[ipReasmReqds].minimal_heartbeat = 120
ds[ipReasmReqds].min = 0.0000000000e+00
ds[ipReasmReqds].max = 4.2949672950e+09
ds[ipReasmReqds].last_ds = "24504"
ds[ipReasmReqds].value = 0.0000000000e+00
ds[ipReasmReqds].unknown_sec = 0
ds[ipFragOKs].index = 2

And I don't even think that's a complete set of data. So, I am getting sufficiently confused. It's no doubt that I must have some strange lurking data source out there tripping things up, but the code should be able to throw an appropriate exception and handle it.

netniV commented 5 years ago

I would agree. Not a complete set of data.

netniV commented 5 years ago

That means it’s not using proxy but is using local pipes.

eschoeller commented 5 years ago

Yes I’ve confirmed I never hit the proxy code. I’m wondering if this is a problem with the pipe in some way, and what’s the best way to troubleshoot and further diagnose this. Maybe the info and xport should use a different pipe?

netniV commented 5 years ago

Sounds like a plan. Or maybe just check, after an info request, that you haven't received XML. If you have, you know the previous command must have either overflowed the pipe in some way or made it believe it was complete when it wasn't.
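A minimal sketch of such a check, assuming a complete info response always begins with the `filename = ` line, as in the sample output earlier in this thread. `looks_like_info_output()` is a hypothetical name, not an existing Cacti function.

```php
<?php
/**
 * Hypothetical check: does this response look like `rrdtool info` output
 * rather than an xport XML document, an rrdtool error, or a fragment
 * left over from an earlier command?
 */
function looks_like_info_output($output) {
	$output = ltrim($output);

	if ($output === '') {
		return false;
	}

	// xport responses are XML; rrdtool errors start with "ERROR:".
	if ($output[0] === '<' || strpos($output, 'ERROR:') === 0) {
		return false;
	}

	// Assumption: a complete info response starts with the filename line.
	return strpos($output, 'filename = ') === 0;
}

var_dump(looks_like_info_output("filename = \"a.rrd\"\nstep = 60\n"));
var_dump(looks_like_info_output("<?xml version=\"1.0\"?>\n<xport>"));
```

When the check fails, logging the command just issued together with the previous command would show which request left data behind in the pipe.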

eschoeller commented 5 years ago

There are hundreds of these transactions happening so it's difficult to correlate. Can you give me a hint on how I might trigger the dsstats code to run more frequently? Perhaps on-demand from the command-line? I've been spending a good amount of time waiting around for my boost process to run so the dsstats can run directly after that. I don't think it really hurts to run it often, just so long as I don't have multiple ones running at the same time.

netniV commented 5 years ago

Let me review the code again and check some docs. It sounds most definitely like a pipe issue to me.

netniV commented 5 years ago

So, what I would suggest is that in dsstats_rrdtool_execute(), you record the command and output into a file. I'm not sure what debugging you have in place at this point, so I'm going to start with my own:

diff --git a/lib/dsstats.php b/lib/dsstats.php
index 4e60d6ab..de76123a 100644
--- a/lib/dsstats.php
+++ b/lib/dsstats.php
@@ -817,6 +817,7 @@ function dsstats_rrdtool_init() {
    @arg $pipes - (array) An array of stdin and stdout pipes to read and write data from
    @returns - (string) The output from RRDtool */
 function dsstats_rrdtool_execute($command, $pipes) {
+       global $config;
        $stdout = '';

        if ($command == '') return;
@@ -824,18 +825,23 @@ function dsstats_rrdtool_execute($command, $pipes) {
        $command .= "\r\n";
        $return_code = fwrite($pipes[0], $command);

+       $return_reason = 'EOF';
        while (!feof($pipes[1])) {
                $stdout .= fgets($pipes[1], 4096);

                if (substr_count($stdout, 'OK')) {
+                       $return_reason = 'OK';
                        break;
                }

                if (substr_count($stdout, 'ERROR')) {
+                       $return_reason = 'ERROR';
                        break;
                }
        }

+       $temp = tempnam($config['base_path'] . '/log', 'dsstats');
+       file_put_contents($temp, "Command: $command\nReason: $return_reason\nOutput:\n$stdout");
        if (strlen($stdout)) return $stdout;
 }
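One thing worth checking in the loop above: substr_count($stdout, 'OK') matches "OK" anywhere in the accumulated output, including inside a data source name such as ipFragOKs. That would end the read early and leave the rest of the response in the pipe for the next command. This is an observation about the code as posted, not a confirmed diagnosis. The sketch below only treats a line as the end of a response when it starts with `OK ` or `ERROR:`, assuming rrdtool's pipe mode ends each response with such a line. `is_rrdtool_terminator()` and `read_response()` are hypothetical names.

```php
<?php
/**
 * Hypothetical helper: true when a single line read from the rrdtool pipe
 * marks the end of a response. Assumes responses end with a line that
 * starts with "OK " (timing summary) or "ERROR:".
 */
function is_rrdtool_terminator($line) {
	return strncmp($line, 'OK ', 3) === 0 || strncmp($line, 'ERROR:', 6) === 0;
}

/**
 * Collect lines until a terminator line is seen. $lines stands in for
 * successive fgets() results from the stdout pipe.
 */
function read_response(array $lines) {
	$stdout = '';

	foreach ($lines as $line) {
		$stdout .= $line;

		if (is_rrdtool_terminator($line)) {
			break;
		}
	}

	return $stdout;
}

$lines = array(
	"ds[ipFragOKs].index = 2\n",
	"ds[ipFragOKs].type = \"COUNTER\"\n",
	"OK u:0.00 s:0.00 r:0.00\n",
);

// The data source name contains "OK" but does not end the read early.
echo read_response($lines);
```

Testing each line as it is read, rather than searching the whole buffer, also avoids rescanning output that has already been checked.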
netniV commented 5 years ago

I have also noted that this process doesn't open stderr and check for any potential errors that way either.

eschoeller commented 5 years ago

Yep, I have done something similar within dsstats_rrdtool_execute already:

if (substr_count($stdout, 'ERROR')) {
                        cacti_log("rrdtool_execute found ERROR for command:" . $command, true,'DSSTATS');
                        $errortime = microtime();
                        $errorfile = fopen("/tmp/rrdexecute.error.$errortime","w");
                        fwrite($errorfile,$command);
                        fwrite($errorfile,$stdout);
                        fclose($errorfile);
                        break;
                }

I also like your approach so I'll switch to that and see if it gains me any more insight on this.

I also tried to redirect stderr for dsstats_rrdtool_init like this:

if ($config['cacti_server_os'] == 'unix') {
                $fds = array(
                        0 => array('pipe', 'r'), // stdin
                        1 => array('pipe', 'w'), // stdout
                        #2 => array('file', '/dev/null', 'a')  // stderr
                        2 => array('file', '/tmp/errors', 'a')  // stderr
                );
        } else {
                $fds = array(
                        0 => array('pipe', 'r'), // stdin
                        1 => array('pipe', 'w'), // stdout
                        #2 => array('file', 'nul', 'a')  // stderr
                        2 => array('file', '/tmp/errors.1', 'a')  // stderr
                );
        }

But I didn't really have any success. /tmp/errors was created, but never got anything. Maybe there was never any stderr output. Any idea what else I could do to capture stderr? I was going to suggest that this should really be going to log/cacti_stderr.log.

netniV commented 5 years ago

You could just open it as a pipe, and keep checking for any return values. Don't forget to set it non-blocking though as per the other pipes.
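A minimal sketch of opening stderr as a non-blocking pipe with proc_open(). The child process here is a stand-in that writes one line to stderr so the example runs without rrdtool; in dsstats_rrdtool_init() the command would be the `rrdtool -` invocation. The 5 second deadline and 200 ms poll interval are arbitrary values chosen for illustration.

```php
<?php
// Stand-in child process: PHP itself, writing one line to stderr.
$command = escapeshellarg(PHP_BINARY) . ' -r ' .
	escapeshellarg('fwrite(STDERR, "ERROR: example\n");');

$fds = array(
	0 => array('pipe', 'r'), // stdin
	1 => array('pipe', 'w'), // stdout
	2 => array('pipe', 'w'), // stderr as a pipe instead of a file
);

$process = proc_open($command, $fds, $pipes);

if (!is_resource($process)) {
	exit("unable to start child process\n");
}

// Non-blocking, so polling stderr never stalls the caller.
stream_set_blocking($pipes[2], false);

$stderr   = '';
$deadline = microtime(true) + 5;

while (microtime(true) < $deadline) {
	$read   = array($pipes[2]);
	$write  = null;
	$except = null;

	// Wait up to 200 ms for stderr to become readable.
	if (stream_select($read, $write, $except, 0, 200000) > 0) {
		$chunk = fread($pipes[2], 4096);

		if ($chunk === '' || $chunk === false) {
			// Readable but empty: stop once the child has closed stderr.
			if (feof($pipes[2])) {
				break;
			}
		} else {
			$stderr .= $chunk;
		}
	}
}

// Close every pipe before proc_close() so no descriptors are leaked.
foreach ($pipes as $pipe) {
	fclose($pipe);
}
proc_close($process);

echo $stderr;
```

With a long-lived `rrdtool -` process the loop would not wait for EOF. A single stream_select() poll on stderr after each command would be the equivalent check.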

eschoeller commented 5 years ago

Well I ran out of file descriptors this morning, I had thousands of poller_boost and rrdtool - processes stuck. They were stuck on this:

write(7, "xport --start now-1day --end now"..., 2160

So I rolled back dsstats.php. I'll try to circle back to this over the weekend.

netniV commented 5 years ago

That would suggest we are having issues with the xport pipe command. You shouldn't run out of file descriptors unless you are not freeing up resources at some point?

eschoeller commented 5 years ago

Agreed, but I think it's been triggered by the debugging additions that we have added to dsstats.php. Not sure if it's a bug with my additions or what. Now that I've rolled back to the clean version of dsstats.php from Beta 2, I'm not accumulating these processes anymore. I'm going to merge in your debugging code and see what happens now.

eschoeller commented 5 years ago

OK, I had your code running for a little while and got a LOT of output files. Now I'm trying to sort through them. But I am seeing some problems for sure.

Reason: ERROR
Output:
ERROR: can't make an xport without contents
Command: info /cacti/cacti-1.2.0-beta2-prod/rra/device_phaseapparentpower1_54076.rrd
Reason: ERROR
Output:
ERROR: can't make an xport without contents
Command: info /cacti/cacti-1.2.0-beta2-prod/rra/upsc_upsoutputcurrent2_43594.rrd
Reason: ERROR
Output:
ERROR: can't make an xport without contents
Command: xport --start now-1day --end now DEF:aa="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_in:AVERAGE DEF:ab="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_in:MAX DEF:ac="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_out:AVERAGE DEF:ad="/cacti/cacti-1.2.0-beta2-prod/rra/6_snmpinpkts_8585.rrd":unicast_out:MAX XPORT:aa XPORT:ab XPORT:ac XPORT:ad --maxrows 10000
Reason: ERROR
Output:
ERROR: No DS called 'cpu_nice' in '/cacti/cacti-1.2.0-beta2-prod/rra/abc_phaseactivepower1_48862.rrd'
Command: xport --start now-1day --end now DEF:aa="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_free:AVERAGE DEF:ab="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_free:MAX DEF:ac="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_used:AVERAGE DEF:ad="/cacti/cacti-1.2.0-beta2-prod/rra/5_ipreasmtimeout_8557.rrd":hdd_used:MAX XPORT:aa XPORT:ab XPORT:ac XPORT:ad --maxrows 10000

The errors coming back don't sync up with the command being issued at all. And the data sources being asked for have no relevancy to the RRD files that are being queried.

netniV commented 5 years ago

I'm guessing you changed the order slightly of the output. But I would agree that you appear to be getting out of sync between commands. I would look for the first file with ERROR and then check out the previous command to see if it was correct.

cigamit commented 5 years ago

If you are using the python rrdtool wrapper, please let us know. I suspect it does not work in pipe mode.

eschoeller commented 5 years ago

python rrdtool wrapper? I have no idea what that would even be ;) Or why you'd want that! So, yeah I am pretty sure I'm calling rrdtool directly. Unless there's something else out there getting mixed in and I'm not aware of it.

[image]

Let me know if there's something I should check to make sure.

netniV commented 5 years ago

The python script is something I've only just heard of. It's a script a user wrote that allows multiple RRDtool updates to run concurrently by cycling through twenty rrdtool processes, thus speeding up updates, especially to remote NFS drives. Since you aren't using that, it rules out a possible theory that it could be the reason for the misalignment of commands.

cigamit commented 5 years ago

Any updates?

cigamit commented 5 years ago

Closed due to no feedback.

eschoeller commented 5 years ago

I upgraded to 1.2.0 Beta 3, still getting this error:

2018/11/25 16:43:19 - ERROR PHP NOTICE: Undefined offset: 1 in file: /cacti/cacti-1.2.0-beta3-prod/lib/dsstats.php  on line: 203
2018/11/25 16:43:19 - CMDPHP PHP ERROR NOTICE Backtrace:  (/poller_boost.php[154]:dsstats_boost_bottom(), /lib/dsstats.php[727]:dsstats_get_and_store_ds_avgpeak_values(), /lib/dsstats.php[71]:dsstats_obtain_data_source_avgpeak_values(), /lib/dsstats.php[203]:CactiErrorHandler())

I deleted the three data sources from the poller_item table that I identified in #2141 to see if that might help this.

eschoeller commented 5 years ago

I removed the bad data sources from the poller_item table and still experienced the issue. I upgraded to Beta4. Still having the same problem.