Closed bmfmancini closed 3 years ago
Was able to reproduce this again: changed max records and the graphs broke again. Poller sync fixes the issue again.
Need more specific info @bmfmancini. Screen shot?
Of status page under utilities.
Changed max rows from 5k to 10k
Each time I edit the max rows field the graphs stop working until poller sync
@TheWitness this is what my boost status page looks like
This is a small lab instance, so not a whole lot of devices. When the issue is happening, boost appears to run fine, with no errors or anything.
Well, make your argument length longer. What is your client's max allowed packet? Agree, small system.
Let me check on the max packet
This happens right after you make a change, even before boost runs, if that helps.
Main poller
Remote
Since it's the lab, set it large, put poller_boost.php into debug, and then run
php -q poller_boost.php --force
Then share the log to the developers email.
Sure
I am building another lab just to be sure, and to reproduce it on another system.
Well, I had to remind myself how this thing works. That number should be in the millions, and the argument length about 20k. Take 20k data sources and a poller interval of 5 minutes. Say you want to flush the cache every hour: the max records should be about 12 * 20k. Flushing will only happen once an hour, and it'll go quickly.
Here is the setting on my little sandbox of about 70 hosts. Remote poller has ZERO CORRELATION to those settings. So, you are confused.
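The sizing rule in the comment above can be written out as a small calculation. This is only a sketch: the function name is made up for illustration, and it assumes one cached record per data source per poll, which is what the "12 * 20k" figure implies.

```python
def boost_max_records(data_sources: int, poller_interval_s: int, flush_interval_s: int) -> int:
    """Estimate records cached between flushes.

    Assumption (not from Cacti's code): one cached record per data source
    per poller run.
    """
    polls_per_flush = flush_interval_s // poller_interval_s
    return data_sources * polls_per_flush


# 20k data sources, 5 minute polling, flush once an hour: 12 polls * 20k
print(boost_max_records(20_000, 300, 3600))
```

With a 1 minute poller interval the same hourly flush would need 60 polls' worth of records, so the estimate grows with polling frequency.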
OK, I'm confused here.
My original setting was 5k. I simply changed the setting, which kills graphing, and it happens even before boost runs.
Why would the fix be to sync the pollers, then? I find that any change to the max records causes this issue, which is weird because I tested this on 1.2.12 with no issue, so I'm just confused.
-- Thank you
Sean Mancini,(Six Sigma LBBIT®, ITIL,CEA-IT®,SCRUM SMPC®) Owner/Principal Engineer www.seanmancini.com
“Companies spend millions of dollars on firewalls, encryption, and secure access devices, and it’s money wasted because none of these measures address the weakest link in the security chain.”
– Kevin Mitnick
Well, likely your web server does not have write access to the rrdfiles.
I will double check
Oh wait, that doesn't make sense: the graphs work fine after the sync is done and the data is updating. Hmmm... I will check anyway, though.
Basically boost was running all the time before, wearing out your disks. Don't blush. Now I have to rethink the readme.
Interesting, thanks. I'll put it to the default 1000000 and try again.
Fix your permissions first of course.
I just checked my permissions; they are good.
Also, the graphs are working properly right now. I would expect that if something was up with permissions, they would not work at all.
Ok I made the change to the max rows
And you were totally right boost was constantly running !
The same thing happened: I edited to 1M records, boost is in IDLE state right now, and graphs stopped plotting.
Here was the last run
Graphs started working again after poller sync
Got to be permissions or SELinux.
You might want to check your audit.log.
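For the audit.log check suggested above, a rough filter for SELinux denials could look like the following. It is a sketch under assumptions: the regular expression targets the usual `type=AVC ... avc:  denied  { ... } ... comm="..."` record shape, the list of process names is a guess at what matters for a web server writing RRD files, and the sample lines are invented for illustration, not taken from this system.

```python
import re

# Matches the common AVC denial record shape; other audit record types are ignored.
AVC_DENIED = re.compile(
    r'type=AVC .*avc:\s+denied\s+\{(?P<perms>[^}]*)\}.*?comm="(?P<comm>[^"]+)"'
)


def find_denials(lines, commands=("httpd", "apache2", "php", "rrdtool")):
    """Return (command, [permissions]) for each AVC denial from the given commands."""
    hits = []
    for line in lines:
        match = AVC_DENIED.search(line)
        if match and match.group("comm") in commands:
            hits.append((match.group("comm"), match.group("perms").split()))
    return hits


# Invented sample records, for illustration only.
sample = [
    'type=AVC msg=audit(1610116800.123:456): avc:  denied  { write } for  pid=1234 '
    'comm="httpd" name="rra" dev="dm-0" ino=42 '
    'scontext=system_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:var_t:s0 '
    'tclass=dir permissive=0',
    "type=USER_LOGIN msg=audit(1610116801.000:457): pid=999 uid=0 msg='op=login'",
]
print(find_denials(sample))
```

To run it against a real log, pass an open file handle for the audit log instead of `sample`. An empty result only means no matching denials were logged; it does not rule out ordinary file permission problems.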
Checked SELinux; permissions are good.
I will check the audit.log, but what permissions would affect the graphs only for that change and be fixed after a sync?
audit.log doesn't show anything in the way of failures. Also, I just remembered this only happens for devices on the remote poller; the main poller devices are always good.
Is the timezone the same? Focus on the data, the timestamp of the data for those local data IDs.
Yeah, I thought about that too, but they are all in the same timezone, and the timestamps on the rra files all match.
I'm at a loss for words. Something is jacked. Show a swipe of ls -l in the rra folder and ps -ef | grep httpd, or apache if Debian.
This might even be an apache rule that blocks apache from writing. I just don't know.
I set up another lab last night; I am going to test this there as well.
I'll let you know how it goes
No need to send the output, it was already there. File ownership was right. So, I'm left scratching my head.
OK, new lab, same issue.
Changed the setting in the Performance tab; the device is on the remote poller.
Spine is running just fine
2021-01-08 09:24:04 - SYSTEM STATS: Time:3.0605 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
--
2021-01-08 09:23:05 - SYSTEM STATS: Time:3.1040 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
2021-01-08 09:22:05 - SYSTEM STATS: Time:3.0485 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
2021-01-08 09:21:05 - SYSTEM STATS: Time:3.0535 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
2021-01-08 09:20:05 - SYSTEM STATS: Time:3.0702 Method:spine Processes:10 Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0
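Every stats line above reports RRDsProcessed:0, and that is easier to track across cycles if the fields are split out. A minimal sketch, assuming the `Key:Value` layout seen in these log lines; the helper name is made up.

```python
import re


def parse_system_stats(line):
    """Split a 'SYSTEM STATS:' log line into a dict of key -> string value.

    Returns an empty dict if the line has no 'SYSTEM STATS:' marker.
    """
    _, _, stats = line.partition("SYSTEM STATS:")
    return dict(re.findall(r"(\w+):(\S+)", stats))


line = (
    "2021-01-08 09:24:04 - SYSTEM STATS: Time:3.0605 Method:spine Processes:10 "
    "Threads:2 Hosts:32 HostsPerProcess:4 DataSources:53 RRDsProcessed:0"
)
stats = parse_system_stats(line)
print(stats["DataSources"], stats["RRDsProcessed"])
```

Values are left as strings so nothing is lost for non-numeric fields such as Method.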
Device on Main poller running just fine
Permissions
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_53_ping_39.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_54_ping_38.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_55_ping_37.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:00 1_1_1_56_ping_36.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_57_ping_35.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_58_ping_34.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_59_ping_33.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_60_ping_32.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_61_ping_31.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:24 1_1_1_62_ping_30.rrd
-rwxrwxr-x 1 www-data www-data 170 Nov 30 13:10 .htaccess
-rw-r--r-- 1 www-data www-data 1.9M Jan 8 09:27 local_linux_machine_load_1min_2.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:27 local_linux_machine_mem_buffers_4.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:27 local_linux_machine_mem_swap_5.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:27 local_linux_machine_proc_1.rrd
-rw-r--r-- 1 www-data www-data 643K Jan 8 09:27 local_linux_machine_users_3.rrd
sean@master1:/var/www/html/cacti/rra$
Apache is running as www-data
www-data 29869 0.0 2.8 231172 28532 ? S Jan07 0:01 /usr/sbin/apache2 -k start
www-data 30020 0.0 2.8 231084 28808 ? S Jan07 0:01 /usr/sbin/apache2 -k start
www-data 30084 0.0 2.8 230980 28672 ? S Jan07 0:01 /usr/sbin/apache2 -k start
www-data 30206 0.0 2.6 230868 26932 ? S 09:00 0:00 /usr/sbin/apache2 -k start
www-data 30499 0.0 2.7 230916 28104 ? S Jan07 0:01 /usr/sbin/apache2 -k start
Boost status page
Hey @TheWitness, thanks for working with me on this. Here are the outputs.
Before making a change in the performance tab
Below is on the remote poller
MariaDB [cacti]> select * from settings where name = 'boost_rrd_update_enable';
+-------------------------+-------+
| name                    | value |
+-------------------------+-------+
| boost_rrd_update_enable | on    |
+-------------------------+-------+
1 row in set (0.002 sec)
After making a change
MariaDB [cacti]> select * from settings where name = 'boost_rrd_update_enable';
+-------------------------+-------+
| name                    | value |
+-------------------------+-------+
| boost_rrd_update_enable |       |
+-------------------------+-------+
1 row in set (0.000 sec)
On the main poller, the value remains on
Issue seems to be isolated to settings.php
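The comparison above (value stays "on" on the main poller, becomes empty on the remote) can be generalized to any set of settings rows. A minimal sketch, assuming the rows have already been fetched from each poller's settings table into plain dicts; the function name is made up, and no database access is shown.

```python
def diff_settings(main, remote):
    """Return {name: (main_value, remote_value)} for settings that differ.

    A setting missing on one side is reported with None for that side.
    """
    names = sorted(set(main) | set(remote))
    return {
        name: (main.get(name), remote.get(name))
        for name in names
        if main.get(name) != remote.get(name)
    }


# Values as observed in this thread after editing the max records field.
main = {"boost_rrd_update_enable": "on"}
remote = {"boost_rrd_update_enable": ""}
print(diff_settings(main, remote))
```

Running the same diff before the change and after a poller sync should return an empty dict if the two sides agree.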
All fixed.
Hey Guys
Testing 1.2.16, I found that after I changed the Maximum records field in the Performance tab, graphs from the remote poller stopped.
Did the following troubleshooting