Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0

RRD Updates can become disabled when saving performance options #4042

Closed bmfmancini closed 3 years ago

bmfmancini commented 3 years ago

Hey Guys

Testing 1.2.16, I found that after I changed the Maximum Records field on the Performance tab, graphs from the remote poller stopped updating.

Did the following troubleshooting

bmfmancini commented 3 years ago

I was able to reproduce this: I changed max records and the graphs broke again. A poller sync fixed the issue again.

TheWitness commented 3 years ago

Need more specific info @bmfmancini. Screen shot?

TheWitness commented 3 years ago

Of the status page under Utilities.

bmfmancini commented 3 years ago

Changed max rows from 5k to 10k

image

Each time I edit the max rows field, the graphs stop working until a poller sync.

image

bmfmancini commented 3 years ago

@TheWitness this is what my boost status page looks like

image

This is a small lab instance, so there are not a whole lot of devices. When the issue is happening, boost appears to run fine, with no errors or anything.

TheWitness commented 3 years ago

Well, make your argument length longer. What is your client's max allowed packet? Agreed, small system.

bmfmancini commented 3 years ago

Let me check on the max packet


bmfmancini commented 3 years ago

This happens right after you make a change, even before boost runs, if that helps.


bmfmancini commented 3 years ago

Main poller

image

Remote

image

TheWitness commented 3 years ago

Since it's the lab, set it large, put poller_boost.php into debug, and then run

php -q poller_boost.php --force

Then send the log to the developers' email address.

bmfmancini commented 3 years ago

Sure

I am building another lab just to be sure, and to reproduce it on another system.


TheWitness commented 3 years ago

Well, I had to remind myself how this thing works. That number should be in the millions, and the argument length about 20k. Take 20k data sources and assume a poller interval of 5 minutes. If you want to flush the cache every hour, the max records should be about 12 * 20k. Flushing will then only happen once an hour, and it'll go quickly.
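The arithmetic in that comment can be sketched as follows. This is only an illustration of the "polls per flush times data sources" estimate; the function name and parameters are hypothetical and are not part of Cacti.

```python
def boost_max_records(data_sources: int, poll_interval_s: int, flush_interval_s: int) -> int:
    """Estimate how many rows accumulate in the boost cache between two flushes.

    Hypothetical helper: one row per data source per poll is assumed.
    """
    if poll_interval_s <= 0 or flush_interval_s <= 0:
        raise ValueError("intervals must be positive")
    # Number of polling cycles that fit into one flush window.
    polls_per_flush = flush_interval_s // poll_interval_s
    return data_sources * polls_per_flush


# 20k data sources, 5-minute polling, hourly flush: 12 polls * 20k rows.
print(boost_max_records(20_000, 300, 3600))  # 240000
```

Under these assumptions, a max records value below the estimate would make boost hit its record limit before the intended flush time.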

TheWitness commented 3 years ago

Screenshot_20210107-185944

Here is the setting on my little sandbox of about 70 hosts. The remote poller has ZERO CORRELATION to those settings. So, you are confused.

bmfmancini commented 3 years ago

OK, I'm confused here.

My original setting was 5k. I simply changed the setting, which kills graphing; it happens even before boost runs.

Why would the fix be to sync the pollers, then? I find that any change to the max records causes this issue, which is weird because I tested this on 1.2.12 with no issue, so I'm just confused.


-- Thank you

Sean Mancini,(Six Sigma LBBIT®, ITIL,CEA-IT®,SCRUM SMPC®) Owner/Principal Engineer www.seanmancini.com

“Companies spend millions of dollars on firewalls, encryption, and secure access devices, and it’s money wasted because none of these measures address the weakest link in the security chain.”

– Kevin Mitnick

TheWitness commented 3 years ago

Well, likely your web server does not have write access to the RRD files.

bmfmancini commented 3 years ago

I will double check.


bmfmancini commented 3 years ago

Oh wait, that doesn't make sense: the graphs work fine after the sync is done, and the data is updating. Hmm... I will check anyway, though.


TheWitness commented 3 years ago

Basically boost was running all the time before, wearing out your disks. Don't blush. Now I have to rethink the readme.

bmfmancini commented 3 years ago

Interesting, thanks. I'll set it to the default of 1000000 and try again.

TheWitness commented 3 years ago

Fix your permissions first of course.

bmfmancini commented 3 years ago

I just checked my permissions; they are good.

Also, the graphs are working properly right now. If something were wrong with permissions, I would expect them not to work at all.

image

bmfmancini commented 3 years ago

OK, I made the change to the max rows.

image

And you were totally right, boost was constantly running!

bmfmancini commented 3 years ago

The same thing happened... I changed it to 1M records, boost is in the IDLE state right now, and graphs stopped plotting.

bmfmancini commented 3 years ago

Here was the last run

image

Graphs started working again after a poller sync.

TheWitness commented 3 years ago

Got to be permissions or SELinux.

TheWitness commented 3 years ago

You might want to check your audit.log.

bmfmancini commented 3 years ago

I checked SELinux and permissions; both are good.

image

I will check the audit.log, but what permissions would affect the graphs only for that change and then be fixed by a sync?

bmfmancini commented 3 years ago

audit.log doesn't show any failures. Also, I just remembered: this only happens for devices on the remote poller. The main poller devices are always good.

TheWitness commented 3 years ago

Is the timezone the same? Focus on the data: the timestamp of the data for those local data IDs.

bmfmancini commented 3 years ago

Yeah, I thought about that too, but they are all in the same timezone, and the timestamps in the rra folder all match as well.

TheWitness commented 3 years ago

I'm at a loss for words. Something is jacked. Show a swipe of ls -l in the rra folder and of ps -ef | grep httpd (or apache if Debian).

TheWitness commented 3 years ago

This might even be an apache rule that blocks apache from writing. I just don't know.

bmfmancini commented 3 years ago

I set up another lab last night. I am going to test this there as well.

I'll let you know how it goes


TheWitness commented 3 years ago

No need to send the output, it was already there. File ownership was right. So, I'm left scratching my head.

bmfmancini commented 3 years ago

OK, new lab, same issue.

I changed a setting on the Performance tab; the device is on the remote poller.

image

Spine is running just fine

2021-01-08 09:24:04 - SYSTEM STATS: Time:3.0605 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
2021-01-08 09:23:05 - SYSTEM STATS: Time:3.1040 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
2021-01-08 09:22:05 - SYSTEM STATS: Time:3.0485 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
2021-01-08 09:21:05 - SYSTEM STATS: Time:3.0535 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
2021-01-08 09:20:05 - SYSTEM STATS: Time:3.0702 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
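
To compare runs like the ones above across pollers, the fields can be pulled out of each SYSTEM STATS line. This is a minimal sketch that assumes the line layout shown in this thread; `parse_stats` is a hypothetical helper, not a Cacti function. It does not decide whether `RRDsProcessed:0` is a symptom, since RRD updates may be deferred rather than done by the poller run itself.

```python
import re

# Matches lines such as:
# 2021-01-08 09:24:04 - SYSTEM STATS: Time:3.0605 Method:spine ... RRDsProcessed:0
STATS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+-\s+SYSTEM STATS:\s*(.*)$"
)


def parse_stats(line):
    """Return the Key:Value fields of one SYSTEM STATS line as a dict, or None."""
    match = STATS_RE.match(line.strip())
    if match is None:
        return None
    # Fields are whitespace-separated Key:Value tokens; skip anything else.
    fields = dict(
        token.split(":", 1) for token in match.group(3).split() if ":" in token
    )
    fields["timestamp"] = f"{match.group(1)} {match.group(2)}"
    return fields


sample = (
    "2021-01-08 09:24:04 - SYSTEM STATS: Time:3.0605 Method:spine Processes:10 "
    "Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0"
)
stats = parse_stats(sample)
print(stats["timestamp"], stats["DataSources"], stats["RRDsProcessed"])
```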

Device on Main poller running just fine

image

Permissions

-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_53_ping_39.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_54_ping_38.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_55_ping_37.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:00 1_1_1_56_ping_36.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_57_ping_35.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_58_ping_34.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_59_ping_33.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_60_ping_32.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_61_ping_31.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:24 1_1_1_62_ping_30.rrd
-rwxrwxr-x  1 www-data www-data  170 Nov 30 13:10 .htaccess
-rw-r--r--  1 www-data www-data 1.9M Jan  8 09:27 local_linux_machine_load_1min_2.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:27 local_linux_machine_mem_buffers_4.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:27 local_linux_machine_mem_swap_5.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:27 local_linux_machine_proc_1.rrd
-rw-r--r--  1 www-data www-data 643K Jan  8 09:27 local_linux_machine_users_3.rrd
sean@master1:/var/www/html/cacti/rra$

Apache is running as www-data

www-data 29869  0.0  2.8 231172 28532 ?        S    Jan07   0:01 /usr/sbin/apache2 -k start
www-data 30020  0.0  2.8 231084 28808 ?        S    Jan07   0:01 /usr/sbin/apache2 -k start
www-data 30084  0.0  2.8 230980 28672 ?        S    Jan07   0:01 /usr/sbin/apache2 -k start
www-data 30206  0.0  2.6 230868 26932 ?        S    09:00   0:00 /usr/sbin/apache2 -k start
www-data 30499  0.0  2.7 230916 28104 ?        S    Jan07   0:01 /usr/sbin/apache2 -k start
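
The manual comparison of the `ls -l` owner column with the Apache process user can be sketched in code. This is a rough check under stated assumptions: it only looks at the owning UID and the owner write bit of each `.rrd` file, so it says nothing about SELinux, ACLs, or web server rules. `unwritable_rrds` is a hypothetical helper.

```python
import stat
from pathlib import Path


def unwritable_rrds(rra_dir, owner_uid):
    """List .rrd files that are not owned by owner_uid or lack the owner write bit."""
    problems = []
    for path in sorted(Path(rra_dir).glob("*.rrd")):
        info = path.stat()
        owned = info.st_uid == owner_uid
        writable = bool(info.st_mode & stat.S_IWUSR)
        if not (owned and writable):
            problems.append(path.name)
    return problems
```

For the layout in this thread, the call would look like `unwritable_rrds("/var/www/html/cacti/rra", pwd.getpwnam("www-data").pw_uid)` after `import pwd`; an empty list would match the listing above, where every file is `-rw-r--r--` and owned by `www-data`.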

Boost status page

image

bmfmancini commented 3 years ago

Hey @TheWitness, thanks for working with me on this. Here are the outputs.

Before making a change in the performance tab

Below is on the remote poller

MariaDB [cacti]> select * from settings where name = 'boost_rrd_update_enable';
+-------------------------+-------+
| name                    | value |
+-------------------------+-------+
| boost_rrd_update_enable | on    |
+-------------------------+-------+
1 row in set (0.002 sec)

After making a change

MariaDB [cacti]> select * from settings where name = 'boost_rrd_update_enable';
+-------------------------+-------+
| name                    | value |
+-------------------------+-------+
| boost_rrd_update_enable |       |
+-------------------------+-------+
1 row in set (0.000 sec)

On the main poller, the value remains on

Issue seems to be isolated to settings.php
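
The before/after comparison above can be expressed as a small drift check between the two pollers' settings. This is a sketch only: it assumes the rows of each poller's `settings` table have already been loaded into a plain dict of name to value (how they are loaded is left out), and `settings_drift` is a hypothetical helper, not part of Cacti.

```python
def settings_drift(main, remote, names):
    """Return {name: (main_value, remote_value)} for settings that differ."""
    drift = {}
    for name in names:
        main_value = main.get(name)
        remote_value = remote.get(name)
        if main_value != remote_value:
            drift[name] = (main_value, remote_value)
    return drift


# Values taken from the query output in this comment, after saving the form.
main_settings = {"boost_rrd_update_enable": "on"}
remote_settings = {"boost_rrd_update_enable": ""}

print(settings_drift(main_settings, remote_settings, ["boost_rrd_update_enable"]))
# {'boost_rrd_update_enable': ('on', '')}
```

Running a check like this right after saving the Performance tab, and again after a poller sync, would show whether the remote value is what changes.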

TheWitness commented 3 years ago

All fixed.