Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0

Cacti v 1.2.0 issue #2377

Closed: fouadmfam closed this issue 5 years ago

fouadmfam commented 5 years ago

Dear support team, thank you as always for your support. I upgraded Cacti from v0.8.8h to v1.0.0 and then to the latest 1.2.0, but the upgrade was done on a new server, not on the same server as the old installation. I am now facing several critical problems:

1- The application is much slower than the old one, and it gets even slower when more than one user accesses it.
2- Some graphs do not plot anything, and this happens on more than one device (on the same device, some graphs draw and others do not).
3- Boost takes a long time: 26 minutes 48 seconds (89.00 percent of the update frequency).

I tried to tune the system and the database, but without success. Please help me troubleshoot these issues.

Thank you.

netniV commented 5 years ago

Please post the boost statistics from the Cacti log.

fouadmfam commented 5 years ago
Current Boost Status
Boost On-demand Updating:   Running
Total Data Sources: 66348
Pending Boost Records:  65371
Archived Boost Records: 291123
Total Boost Records:    356494
===================================================
Boost Storage Statistics
Database Engine:    MEMORY
Current Boost Table(s) Size:    796.56 MBytes
Avg Bytes/Record:   2 KBytes
Max Record Length:  18 Bytes
Max Allowed Boost Table Size:   2.16 GBytes
Estimated Maximum Records:  992.197 K Records
==================================================
Runtime Statistics
Last Start Time:    2019-02-06 13:45:25
Last Run Duration:  26 minutes 32 seconds (88.00 percent of update frequency)
RRD Updates:    373420
Peak Poller Memory: 58 MBytes
Detailed Runtime Timers:    RRDUpdates:373420 TotalTime:1592 get_records:2.64 results_cycle:1585.66 rrd_filename_and_template:57.01 rrd_lastupdate:373.15 rrdupdate:1108.76 delete:1.43 timer_overhead:~11
Max Poller Memory Allowed:  1024 MBytes
===================================================
Run Time Configuration
Update Frequency:   30 Minutes
Next Start Time:    2019-02-06 13:41:02
Maximum Records:    1000000 Records
Maximum Allowed Runtime:    20 Minutes
===================================================
Image Caching
Image Caching Status:   Disabled
Cache Directory:    /var/www/html/cacti/cache/boost/
Cached Files:   1903 Files
Cached Files Size:  31.23 MBytes
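The "Detailed Runtime Timers" line is easier to read as shares of TotalTime. A minimal sketch follows; the function name `timer_shares` is hypothetical, and it only assumes the `name:value` format shown in the status output above. Some timers enclose others (results_cycle is nearly the whole run), so the percentages overlap and should not be summed across all rows.

```python
import re

def timer_shares(timer_line):
    """Parse 'name:value' pairs from a boost timer line and return each
    timed phase's share of TotalTime as a percentage, largest first."""
    values = {}
    for name, num in re.findall(r"(\w+):~?([\d.]+)", timer_line):
        values[name] = float(num)
    total = values.pop("TotalTime")
    values.pop("RRDUpdates", None)  # a count of updates, not a duration
    shares = {name: 100.0 * secs / total for name, secs in values.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)

# Timer line copied from the boost status above.
line = ("RRDUpdates:373420 TotalTime:1592 get_records:2.64 results_cycle:1585.66 "
        "rrd_filename_and_template:57.01 rrd_lastupdate:373.15 rrdupdate:1108.76 "
        "delete:1.43 timer_overhead:~11")

for name, pct in timer_shares(line):
    print(f"{name:28s} {pct:5.1f}%")
```

On these numbers, rrdupdate (about 70%) and rrd_lastupdate (about 23%) account for most of the run, while the database-side steps (get_records, delete) are well under 1%, which points at time spent on the RRDfiles rather than on the boost table itself.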

cigamit commented 5 years ago

First thing, you may want to use a 1-hour update frequency. Second, I suspect that your RRDfiles are located on NFS. That choice is death to a system without some enhancements. If they are not on NFS and this system is a VM, you had better undo that sooner rather than later. We have several users with much larger systems who are having no problems.

If you are running on NFS and there is no way to undo that, you might want to consider the rrdtool Python wrapper, which allows for some parallelization. We have received a boost parallelization enhancement from @browniebraun, but it's not in place yet.

Tell us more about your system.

fouadmfam commented 5 years ago

Date: Wed, 06 Feb 2019 19:32:46 +0200
Cacti Version: 1.2.0
Cacti OS: unix
RSA Fingerprint: 92:fd:c9:19:43:07:5f:3a:b5:ee:dc:f5:d7:3c:8c:ec
NET-SNMP Version: 5.7.2
RRDtool Version Configured: 1.4.0+
RRDtool Version Found: 1.4.8
Devices: 2442
Graphs: 33936
Data Sources: SNMP Get: 3685, SNMP Query: 32428, Script Query: 91, Script Query - Script Server: 13, Total: 36217
===================================================
Concurrent Processes: 16
Max Threads: 8
PHP Servers: 5
Script Timeout: 30
Max OID: 10
Last Run Statistics: Time:35.2537 Method:spine Processes:10 Threads:8 Hosts:2346 HostsPerProcess:235 DataSources:66352 RRDsProcessed:0
===================================================
MemTotal: 18.99 G
MemFree: 8.55 G
MemAvailable: 14.20 G
Buffers: 0.00
Cached: 6.44 G
Active: 6.41 G
Inactive: 3.30 G
SwapTotal: 9.77 G
SwapFree: 9.73 G
===================================================
PHP Version: 5.4.16 (PHP Version 5.5.0+ is recommended due to strong password hashing support.)
PHP OS: Linux
PHP uname: Linux Network-Cacti 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64
PHP SNMP: Not Installed
max_execution_time: 60
memory_limit: 3524M
===================================================
version: 10.1.37-MariaDB
collation_server: utf8_general_ci
character_set_client: utf8
max_connections: 500
max_allowed_packet: 33554432
max_heap_table_size: 2357M
tmp_table_size: 311M
join_buffer_size: 622M
innodb_file_per_table: ON
innodb_buffer_pool_size: 6144M
innodb_doublewrite: OFF
innodb_lock_wait_timeout: 50
innodb_flush_log_at_timeout: 5
innodb_read_io_threads: 32
innodb_write_io_threads: 16
innodb_buffer_pool_instances: 49

fouadmfam commented 5 years ago

And the system is a VM.

fouadmfam commented 5 years ago

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Model name: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
Stepping: 5
CPU MHz: 2926.000
BogoMIPS: 5852.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K

fouadmfam commented 5 years ago

Linux 3.10.0-957.1.3.el7.x86_64 (Network-Cacti) 02/06/2019 _x86_64_ (8 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           5.61   0.00     2.63    12.80    0.00  78.96

Device:  tps     kB_read/s  kB_wrtn/s  kB_read    kB_wrtn
sda      139.78  56.12      882.60     30862062   485360684
scd0     0.00    0.00       0.00       1028       0
dm-0     140.64  56.06      882.49     30828169   485303437
dm-1     0.02    0.01       0.08       3536       42180

fouadmfam commented 5 years ago

Here are the last three boost log entries:

2019/02/06 19:52:56 - SYSTEM BOOST STATS: Time:1638.0400 RRDUpdates:445710
2019/02/06 19:22:54 - SYSTEM BOOST STATS: Time:1927.2100 RRDUpdates:503182
2019/02/06 19:20:50 - CMDPHP DEBUG: Found lock, so another boost process is running
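Each SYSTEM BOOST STATS line gives the run time in seconds and the number of RRD updates, which is enough to derive throughput and how much of the update frequency a run used. A small sketch; the function name `boost_runs`, and the assumption that the stats line is written when a run finishes, are illustrative assumptions rather than documented Cacti behavior.

```python
import re
from datetime import datetime, timedelta

STATS_RE = re.compile(
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - SYSTEM BOOST STATS: "
    r"Time:([\d.]+) RRDUpdates:(\d+)"
)

def boost_runs(log_text, frequency_minutes):
    """Summarize each SYSTEM BOOST STATS line found in log_text."""
    runs = []
    for stamp, secs, updates in STATS_RE.findall(log_text):
        end = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        secs = float(secs)
        runs.append({
            # Assumption: the stats line is logged at the end of the run.
            "start": end - timedelta(seconds=secs),
            "end": end,
            "seconds": secs,
            "updates_per_s": int(updates) / secs,
            "pct_of_frequency": 100.0 * secs / (frequency_minutes * 60),
        })
    return runs

# The three log lines posted above.
sample_log = """2019/02/06 19:52:56 - SYSTEM BOOST STATS: Time:1638.0400 RRDUpdates:445710
2019/02/06 19:22:54 - SYSTEM BOOST STATS: Time:1927.2100 RRDUpdates:503182
2019/02/06 19:20:50 - CMDPHP DEBUG: Found lock, so another boost process is running"""

for r in boost_runs(sample_log, frequency_minutes=30):
    print(f"{r['start']:%H:%M:%S}-{r['end']:%H:%M:%S}  {r['seconds']:7.1f}s  "
          f"{r['updates_per_s']:6.1f} upd/s  {r['pct_of_frequency']:5.1f}%")
```

With the 30-minute frequency from the earlier status output, the run logged at 19:22:54 used about 107% of the interval, and the 19:20:50 lock message falls inside its estimated window, which fits a new boost run starting while the previous one was still going. Both runs processed roughly 261 to 272 updates per second.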

fouadmfam commented 5 years ago

I changed boost to one hour, but it still takes 26 minutes to complete. I also checked the RRD files: on the old Cacti version we modified the RRA definitions to store very long historical data, so each RRD file is 2.4 MB; now we are using the default RRA definitions. Is it possible that these large RRD files are causing the boost problem? Also, how can I resize the existing RRD files to use the default RRA definitions without losing data? I also enabled the slow query log on the database, and here is the result:

Query_time: 28.382710  Lock_time: 0.000035  Rows_sent: 9  Rows_examined: 10162518
Rows_affected: 0
use cacti;
SET timestamp=1549481585;
SELECT host_snmp_cache.field_name
FROM (data_template_data,data_local,host_snmp_cache)
WHERE data_template_data.local_data_id=data_local.id
AND data_local.snmp_query_id=host_snmp_cache.snmp_query_id
AND data_template_data.id = '73685'
GROUP BY host_snmp_cache.field_name;

Please HELP!

cigamit commented 5 years ago

Run this query and see how long it runs...

explain SELECT DISTINCT hsc.field_name 
FROM data_template_data AS dtd
INNER JOIN data_local AS dl
ON dtd.local_data_id = dl.id
INNER JOIN host_snmp_cache AS hsc
ON dl.host_id = hsc.host_id
AND dl.snmp_query_id = hsc.snmp_query_id
WHERE dtd.id = '73685';

Then remove the "explain", run it again, and send us the timing as well as the EXPLAIN output.
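The key difference between the two queries is the join condition: the slow-logged query matches host_snmp_cache only on snmp_query_id, so every host's cache rows for that SNMP query are pulled in, while the suggested query also requires dl.host_id = hsc.host_id. The sketch below uses Python's built-in sqlite3 with hypothetical toy data (tables cut down to the columns the queries touch, 50 made-up hosts). It runs on SQLite, not MariaDB, so it illustrates the join fan-out rather than reproducing MariaDB's Rows_examined figure.

```python
import sqlite3

def build_demo_db(hosts=50, rows_per_host=200):
    """Hypothetical toy data: tables reduced to the columns used by the two queries."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE data_template_data (id INTEGER PRIMARY KEY, local_data_id INTEGER);
        CREATE TABLE data_local (id INTEGER PRIMARY KEY, host_id INTEGER, snmp_query_id INTEGER);
        CREATE TABLE host_snmp_cache (host_id INTEGER, snmp_query_id INTEGER, field_name TEXT);
    """)
    db.execute("INSERT INTO data_local VALUES (1, 1, 1)")  # data source on host 1, SNMP query 1
    db.execute("INSERT INTO data_template_data VALUES (73685, 1)")
    fields = ["ifIndex", "ifName", "ifAlias", "ifDescr"]
    db.executemany(  # every host has cached rows for the same SNMP query
        "INSERT INTO host_snmp_cache VALUES (?, 1, ?)",
        ((h, fields[i % len(fields)]) for h in range(1, hosts + 1) for i in range(rows_per_host)),
    )
    db.execute("INSERT INTO host_snmp_cache VALUES (2, 1, 'ifHighSpeed')")  # only host 2 has this
    return db

# Join as in the slow query log: cache rows matched on snmp_query_id only.
OLD_JOIN = """
    FROM data_template_data AS dtd, data_local AS dl, host_snmp_cache AS hsc
    WHERE dtd.local_data_id = dl.id
      AND dl.snmp_query_id = hsc.snmp_query_id
      AND dtd.id = ?"""

# Join as in the suggested query: also restricted to the data source's host.
NEW_JOIN = """
    FROM data_template_data AS dtd
    INNER JOIN data_local AS dl ON dtd.local_data_id = dl.id
    INNER JOIN host_snmp_cache AS hsc
      ON dl.host_id = hsc.host_id AND dl.snmp_query_id = hsc.snmp_query_id
    WHERE dtd.id = ?"""

def joined_rows(db, join_sql, dtd_id=73685):
    """Rows produced by the join before GROUP BY / DISTINCT collapses them."""
    return db.execute("SELECT COUNT(*)" + join_sql, (dtd_id,)).fetchone()[0]

def field_names(db, join_sql, dtd_id=73685):
    rows = db.execute("SELECT DISTINCT hsc.field_name" + join_sql + " ORDER BY 1", (dtd_id,))
    return [r[0] for r in rows]

db = build_demo_db()
for label, sql in (("old", OLD_JOIN), ("new", NEW_JOIN)):
    print(label, joined_rows(db, sql), field_names(db, sql))
```

In this toy case the host-agnostic join produces 50 times as many rows and can also return field names that exist only on other hosts. Scaled up to thousands of devices, that fan-out is one plausible explanation for the logged query examining about 10 million rows to return 9.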

cigamit commented 5 years ago

I have made a code update based on your last post: that query has been removed. It was horribly written, and until now no one had reported it. Thanks.

Now, as to your very large RRDfile problem: yes, having 57 GB of RRDs is going to take more time. I would invest in flash (SSD) drives instead of spinning disks; that will help a lot. Then, once we release multi-process RRDfile updating, things will get progressively better. We don't generally recommend VMs, though, due to the noisy-neighbor issue.
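To see how much of a 2.4 MB file is consolidated data, the row counts can be read from `rrdtool info`, which prints lines such as `rra[0].rows = 600` and `ds[traffic_in].type = "COUNTER"`. The rough sketch below assumes each stored value is an 8-byte double and ignores the file headers. The function names and the sample output are hypothetical, not taken from this system.

```python
import re
import subprocess

def rra_data_bytes(info_text):
    """Approximate RRA data size: total rows x data sources x 8 bytes (headers ignored)."""
    ds_names = set(re.findall(r"^ds\[([^\]]+)\]\.type = ", info_text, re.M))
    rows = [int(n) for n in re.findall(r"^rra\[\d+\]\.rows = (\d+)", info_text, re.M)]
    return sum(rows) * len(ds_names) * 8

def rrd_info(path):
    """Run `rrdtool info` on one file (requires rrdtool on PATH)."""
    return subprocess.run(["rrdtool", "info", path], capture_output=True,
                          text=True, check=True).stdout

# Hypothetical `rrdtool info` excerpt: 2 data sources, one short and one very long RRA.
sample = "\n".join([
    'filename = "example.rrd"',
    'ds[traffic_in].type = "COUNTER"',
    'ds[traffic_out].type = "COUNTER"',
    'rra[0].cf = "AVERAGE"',
    'rra[0].rows = 600',
    'rra[1].cf = "AVERAGE"',
    'rra[1].rows = 150000',
])
print(rra_data_bytes(sample))
```

Comparing the estimate for an old file with one created from the current default RRA definitions shows how much of the size comes from the extended history. The sketch only reads the files. Shrinking an RRA, for example with `rrdtool resize`, necessarily discards the rows that no longer fit.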

fouadmfam commented 5 years ago

Many thanks for your help. I will wait for the update, and in the meantime I will tune the system and the database again.

fouadmfam commented 5 years ago

Hi cigamit. While tracing the logs I found this: "2019/02/09 13:00:11 - SPINE: Poller[Main Poller] ERROR: SQL Failed! Error:'1114', Message:'The table 'poller_output_boost' is full', SQL Fragment:'INSERT INTO poller_output_boost". At that time, the graphs were not drawn.

cigamit commented 5 years ago

You need to convert your boost table to InnoDB or, if you want to continue using a MEMORY table (very fast), you need to either reduce the width of the output column or increase your max_heap_table_size variable to provide more memory for the memory table.

fouadmfam commented 5 years ago

My @@max_heap_table_size = 2471493632 (2357M). Is that good?
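A MEMORY table stores rows at a fixed length, so its capacity is roughly max_heap_table_size divided by the row size, while the number of rows boost must hold grows with the update frequency. The following is a back-of-the-envelope sketch. The 2 KB row size comes from the "Avg Bytes/Record" line in the boost status earlier in this thread, the 66348 data sources from "Total Data Sources", and the 5-minute poller interval is an assumption not stated in the thread. The helper names are hypothetical.

```python
MIB = 1024 * 1024

def memory_table_capacity(max_heap_table_size, row_bytes):
    """Rough upper bound on rows a MEMORY table can hold (ignores index and allocation overhead)."""
    return max_heap_table_size // row_bytes

def records_per_boost_run(data_sources, poller_interval_s, boost_interval_s):
    """Rows accumulated between boost runs if every data source is polled each cycle."""
    return data_sources * (boost_interval_s // poller_interval_s)

heap = 2471493632                          # @@max_heap_table_size from the thread
print(heap / MIB)                          # size in MiB
print(memory_table_capacity(heap, 2048))   # assumed 2 KB per row
print(memory_table_capacity(heap, 1024))   # same memory, half the row width
for minutes in (30, 60):
    # 300 s polling is an assumption; 66348 is "Total Data Sources" from the boost status.
    print(minutes, records_per_boost_run(66348, 300, minutes * 60))
```

Under these assumptions, a one-hour interval accumulates about 796 K rows, roughly two thirds of the ~1.2 M-row ceiling computed here before any overhead. Halving the row width doubles that ceiling. Cacti's own "Estimated Maximum Records" is lower than this simple division, so treat these figures as upper bounds.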