Closed sreedhasivankutty closed 2 years ago
I highly suggest moving to 1.2.22. From what I remember, there were several bugs in boost around 1.2.18 and 1.2.19.
Once you have the 1.2.22 update, pull down lib/boost.php from the 1.2.x branch.
On Mon., Oct. 17, 2022, 7:54 a.m. sreedhasivankutty, < @.***> wrote:
Hi,
We are using Cacti 1.2.18 and are facing an issue with the poller: Boost stops working randomly and abruptly. We faced the same issue last Thursday and again today. We have ~4K devices in a multi-poller structure. The main poller has only about 25 devices, and each of the three remote pollers has ~1000+ devices. Polling happens every 5 minutes, and Boost should run every 30 minutes or when a count of 8 million records is reached.
We see that Boost does not run at its scheduled time, so the poller_output_boost and poller_output tables fill up with millions of rows, eventually using up /tmp space and causing a data outage. We also observe many poller processes (one per 5-minute interval from the issue start time of 17:25 onward) that never exit, as seen via ps -ef.
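One way to spot the pollers that never exit is to filter `ps` output by elapsed time. This is only a minimal sketch, assuming a procps-style `ps` that supports the `etimes` column (elapsed seconds) and that the stuck processes have `poller.php` in their command line. The function name `stale_pollers` and the 300-second default are illustrative, not part of Cacti.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps -eo pid,etimes,args` output on stdin and print
# the PID and age (in seconds) of every poller.php process older than a limit.
stale_pollers() {
    local max_age="${1:-300}"   # seconds; 300 matches the 5-minute polling interval
    awk -v max="$max_age" '
        NR == 1 && $1 == "PID" { next }              # skip the ps header row
        $2 > max && /poller\.php/ { print $1, $2 }   # PID and elapsed seconds
    '
}

# Usage (assumes procps ps):
#   ps -eo pid,etimes,args | stale_pollers 300
```

Anything this prints has outlived a full polling cycle and is worth inspecting before the next boost run.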
We have observed this warning before the usual poller.php errors.
2022-10-17 17:25:04 - POLLER: Poller[Main Poller] PID[XXXX] WARNING: Poller Output Table not Empty. Issues: 31, DS[DEVICE1 - DS1 – GRAPH1 - , DEVICE2 - Errors/Discards - X/X/X, 10-Gig Ethernet, XXXXXXXXXXXXX-, DEVICE2 - Traffic - A/B/C, 10-Gig Ethernet, XXXXXXXXXXXXX -, Device2- Errors/Discards - A/B/C, 10-Gig Ethernet, XXXXXXXXXXXXX -, device2 - Traffic - A/B/C, 10-Gig Ethernet, XXXXXXXXXXXXX -, XXXXXXXXXXXXX - Errors/Discards - 5/2/9, 10-Gig Ethernet, XXXXXXXXXXXXX -, DEVICE2 - Traffic - 5/2/9, 10-Gig Ethernet, XXXXXXXXXXXXX -, device2 - Errors/Discards - 5/2/10, 10-Gig Ethernet, XXXXXXXXXXXXX] Graphs[DS2 – GRAPH2, Device - Traffic - Discards – A/B/C , Device2 - Traffic - Discards - x/x/x , DEVICE2 - Traffic - Discards - 5/2/9 , DEVICE2 - Traffic - Discards - 5/2/10 ]
The usual errors that are seen when there is such an issue 2022-10-17 17:25:17 - ERROR PHP WARNING in Plugin 'weathermap': Division by zero in file: cacti/plugins/weathermap/setup.php on line: 728 2022-10-17 17:25:17 - CMDPHP PHP ERROR WARNING Backtrace: (/poller.php[690]:process_poller_output(), /lib/poller.php[544]:api_plugin_hook_function(), /lib/plugins.php[130]:api_plugin_run_plugin_hook_function(), /lib/plugins.php[237]:weathermap_poller_output(), /plugins/weathermap/setup.php[728]:CactiErrorHandler())
2022-10-17 17:25:58 - ERROR PHP WARNING in Plugin 'weathermap': Division by zero in file: cacti/plugins/weathermap/setup.php on line: 728 2022-10-17 17:25:58 - CMDPHP PHP ERROR WARNING Backtrace: (/poller.php[672]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[555]:process_poller_output(), /lib/poller.php[544]:api_plugin_hook_function(), /lib/plugins.php[130]:api_plugin_run_plugin_hook_function(), /lib/plugins.php[237]:weathermap_poller_output(), /plugins/weathermap/setup.php[728]:CactiErrorHandler())
Another stats log (between 17:15 and 18:30 process didn't run)
2022-10-17 18:46:26 - SYSTEM STATS: Time:375.7011 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 18:41:01 - SYSTEM STATS: Time:351.2834 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 18:35:43 - SYSTEM STATS: Time:340.8211 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 18:30:32 - SYSTEM STATS: Time:329.0929 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 17:15:42 - SYSTEM STATS: Time:33.8834 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 17:11:27 - SYSTEM STATS: Time:78.6570 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 17:07:02 - SYSTEM STATS: Time:80.1938 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 17:01:29 - SYSTEM STATS: Time:78.8822 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
2022-10-17 16:56:27 - SYSTEM STATS: Time:77.9787 Method:spine Processes:1 Threads:5 Hosts:25 HostsPerProcess:25 DataSources:26180 RRDsProcessed:0
Monitor stats log (nothing between 17:15 - 18:30, same for SYSTEM THOLD POLLER STATS), otherwise the logs are seen every 5 mins
2022-10-17 18:30:43 - SYSTEM MONITOR STATS: Time:0.1018 Reboots:0 DownDevices:0 Notifications:0 Purges:0
2022-10-17 17:15:48 - SYSTEM MONITOR STATS: Time:0.0336 Reboots:0 DownDevices:0 Notifications:0 Purges:0
After 16:50, Boost did not run again until 18:38, even though both trigger conditions (record count above the threshold, and more than 30 minutes elapsed) had been met.
2022-10-17 18:38:18 - SYSTEM BOOST STATS: Time:450.46 ProcessNumber:6 RRDUpdates:1812654
2022-10-17 18:37:32 - SYSTEM BOOST STATS: Time:404.17 ProcessNumber:5 RRDUpdates:1880442
2022-10-17 16:50:46 - SYSTEM BOOST STATS: Time:243.47 RRDUpdates:8267334
2022-10-17 16:50:45 - SYSTEM BOOST STATS: Time:236.56 ProcessNumber:3 RRDUpdates:817806
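To find windows like this one without reading the log by eye, the gaps between boost runs can be computed from the `SYSTEM BOOST STATS` entries. A rough sketch, assuming GNU `date` (for `date -d`), one log entry per line as in the real log file, and timestamps in the first 19 characters of each line. The name `boost_gaps` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Cacti log lines on stdin and report gaps between
# consecutive "SYSTEM BOOST STATS" entries that exceed a limit (in minutes).
# Assumes GNU date and timestamps of the form "YYYY-MM-DD HH:MM:SS".
boost_gaps() {
    local limit_min="${1:-30}"
    grep 'SYSTEM BOOST STATS' | cut -c1-19 | sort | {
        prev="" prev_ts=""
        while IFS= read -r ts; do
            # Skip any line whose leading text is not a parseable timestamp.
            epoch=$(date -u -d "$ts" +%s 2>/dev/null) || continue
            if [ -n "$prev" ]; then
                gap=$(( (epoch - prev) / 60 ))
                if [ "$gap" -gt "$limit_min" ]; then
                    echo "gap of $gap min between $prev_ts and $ts"
                fi
            fi
            prev=$epoch
            prev_ts=$ts
        done
    }
}

# Usage (the log path is an assumption about your install):
#   boost_gaps 30 < /var/www/html/cacti/log/cacti.log
```

The 30-minute default mirrors the boost interval described above; adjust it if your boost schedule differs.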
Please find the system specifications below: Cacti 1.2.18, Spine 1.2.19, MariaDB 10.3, OS: Linux, Browser: Chrome.
Could you please help us debug and fix this issue? We are seeing it quite often in our live environment.
Many Thanks
Thank you for your response. We are running in a live environment, so outages are not acceptable, and an upgrade has to be planned and takes some time. Is there any workaround you can suggest until we upgrade to 1.2.22?
Also, can this issue be caused by particular devices or data sources (the last two times, we observed the above warning for the same devices and DS), or is this a common issue with boost?
Upgrade is simple. Here is the trick:
cd /var/www/html
git clone -b 1.2.x https://github.com/cacti/cacti.git cacti-develop
# Don't do these last two steps this week.
cp -rp cacti-develop cacti
chown -R apache:apache cacti
Refresh your browser. This is true unless you are running with a bunch of hacks. This would take you to 1.2.23, which I don't recommend right now as we are doing some major plumbing this week. Instead, you can download the release tarball, remove the log and rra directories from it, and then do the same thing. Then simply copy lib/boost.php from the cacti-develop clone that you downloaded above.
The other option is to take poller_boost.php and lib/boost.php from that git clone, and apply them between boost runs. I'm pretty sure that there are no incompatible calls. Then you can force run boost to ensure it's running.
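For the two-file route, it helps to keep the old files around so the swap is easy to undo. A minimal sketch, assuming both trees use the standard layout (`poller_boost.php` at the top level, `boost.php` under `lib/`). The function name `swap_boost_files` and the backup naming are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy poller_boost.php and lib/boost.php from a fresh
# checkout into a live Cacti tree, keeping timestamped backups of the originals.
# Both directory arguments are assumptions about your layout.
swap_boost_files() {
    local src="$1" dest="$2" stamp f
    stamp=$(date +%Y%m%d%H%M%S)
    # Refuse to touch anything unless both source files are present.
    for f in poller_boost.php lib/boost.php; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
    done
    for f in poller_boost.php lib/boost.php; do
        # Keep the old file next to the new one so the change is easy to revert.
        if [ -f "$dest/$f" ]; then
            cp -p "$dest/$f" "$dest/$f.bak-$stamp" || return 1
        fi
        cp -p "$src/$f" "$dest/$f" || return 1
    done
}

# Usage, with the paths from the steps above:
#   swap_boost_files /var/www/html/cacti-develop /var/www/html/cacti
```

Run it between boost runs, then force a boost run and watch the log. Check `php poller_boost.php --help` for the exact force option in your version rather than assuming it.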
Thank you for the response. Will it work if we just put poller_boost.php and lib/boost.php from that develop branch into 1.2.18? Or do we need to upgrade to 1.2.22/1.2.23 first and then put in the latest boost files?
Also, could you please share the issue numbers of any similar issues that were seen in this version and resolved by upgrading? This is for decision-making from a management perspective.
Just search for boost in the open and closed issue lists; there is one open at the moment. Boost went through a major redesign, as my system has in excess of 2 million data sources. One new regression was introduced in 1.2.22, which is why lib/boost.php from the 1.2.x branch (and NOT develop) is important. But at 1.2.18, you'll need both files.
To view status from utilities.php properly, doing the full 1.2.22 upgrade is going to be better, not forgetting to pull lib/boost.php too.
I ended up making a few more changes to lib/boost.php today. All good and tested now. So, if you waited, it was a good idea.
Last commit here: https://github.com/Cacti/cacti/commit/caf0b2d10ab248b9ae2d76ae6e9f11cac1fb102a
You can keep discussing things here, even when the ticket is closed.
If we take just these two files into 1.2.18, will simply copying them between boost runs work, or do we need to take any specific precautions? For example: disable the remote pollers, copy the two files, and then re-enable the remote pollers.
Our boost records per half hour are ~8-10 M, poller output is ~2-3 M, DS ~450 K, and graphs ~250 K.
Does this issue require any poller.php files from 1.2.22? Also, we saw the "Poller Output Table not Empty" warning for the DS of two devices (in the error log). Could this issue be caused by those devices/DS?
Well, there is more going on here than meets the eye.
1) Whose version of Weathermap?
2) What version of spine?
3) How many cores and how much memory on the Cacti main poller?
4) Anything front-ending the database (MaxScale, etc.)?
Whose version of Weathermap? 0.98a for the latest Cacti versions, Weathermap by Howard Jones.
What version of spine? 1.2.19.
How many cores and how much memory on the Cacti main poller? 36 cores, 45 GB memory, 3.5T HD for DB and RRD.
Anything front-ending the database (MaxScale, etc.)? No redundancy/scalability for the DB; standalone MariaDB version 10.3.32.
@TheWitness @bmfmancini Do you have any other inputs or suggestions (since you asked the above questions)?
Hi,
We faced the issue again today: the poller output table is not being emptied and boost is not running. We are planning an upgrade, but before that we would like to check whether anything needs to be changed in server.cnf (for example, an increase in heap size or InnoDB buffer pool size) to mitigate this issue. At the time of the issue, the poller output tables are filled with lots of local IDs and their data, and these are not getting flushed.
Regards, Sreedha
You should be able to cleanly pull those two files now. Everything has passed QA.
But you may also be having problems with Weathermap, and no one here is really an expert at the moment. We will bring in something more suitable to Cacti by maybe the end of the year. If you know it well though you should be okay.
Will it work if we take both files from 1.2.21? We already have a lot of issues and are a little skeptical, as the latest changes would not have been used by many (especially huge deployments), and we don't want to get stuck with new issues that no one has faced yet.
The 1.2.22 changes are working fine on a 20k-host deployment. Backporting should be OK, but it should be lab tested first.
The latest lib/boost.php fixes a regression that introduced gaps in graphs, and it introduces a new sorting algorithm to enhance the old one. I would grab the latest one for your testing.
We took both boost files as suggested and used them in a 1.2.18 test environment. It looks like there are compatibility issues.
The graphs were not visible, and we got a few undefined-function errors:
2022-10-22 21:43:51 - CMDPHP PHP ERROR Backtrace: (CactiShutdownHandler())
2022-10-22 21:43:51 - ERROR PHP ERROR: Uncaught Error: Call to undefined function cacti_system_zone_set() in /cacti/lib/boost.php:682
Stack trace:
#0 /cacti/lib/boost.php(387): boost_process_poller_output('22898', '')
#1 /cacti/lib/rrd.php(1257): boost_graph_cache_check(17179, '0', '', Array, false)
#2 /cacti/graph_json.php(159): rrdtool_function_graph(17179, '0', Array, '', Array, '1')
#3 {main}
thrown in file: /cacti/lib/boost.php on line: 682
2022-10-22 21:20:22 - CMDPHP PHP ERROR Backtrace: (CactiShutdownHandler())
2022-10-22 21:20:22 - ERROR PHP ERROR: Uncaught Error: Call to undefined function boost_debug() in /cacti/poller_boost.php:443
Stack trace:
#0 /cacti/poller_boost.php(135): boost_time_to_run(false, 1666434022, '1666433120', '1666434920')
#1 {main}
thrown in file: /cacti/poller_boost.php on line: 443
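Errors like these can be caught before copying files into an older tree by checking whether the functions the new files call are defined there. The sketch below is a plain-text `grep`, not real PHP parsing, so treat it as a rough pre-flight check only. The name `missing_functions` is hypothetical; the two function names in the usage line come from the errors above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named PHP function that has no
# "function name(" definition anywhere under the given directory tree.
missing_functions() {
    local tree="$1" fn
    shift
    for fn in "$@"; do
        # -r: recurse, -q: quiet, -E: extended regex matching a definition.
        grep -rqE "function[[:space:]]+${fn}[[:space:]]*\(" "$tree" || echo "$fn"
    done
}

# Usage (the path is an assumption about your install):
#   missing_functions /var/www/html/cacti cacti_system_zone_set boost_debug
```

Any name it prints is called by the newer boost files but not defined in the target tree, which suggests that more than the two files would be needed for that version.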