
Cacti MySQL queries are slow #1831

Closed vishnubraj closed 6 years ago

vishnubraj commented 6 years ago

Cacti Version: 1.1.38
Spine Version: 1.1.38
PHP Version: 7.1.14
MySQL Version: 10.2.16-MariaDB
Number of hosts: 373
Number of data sources: 20386

Below is my my.cnf file

[client-server]

[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
#sql-mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
secure-auth=on
#old_passwords=1
max_connections=500
sql-mode=NO_ENGINE_SUBSTITUTION
collation-server=utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4
max_heap_table_size=2G
tmp_table_size=2G
join_buffer_size=256M
innodb_buffer_pool_size=6G
innodb_buffer_pool_instances=32
innodb_doublewrite=OFF
innodb_flush_log_at_timeout=3
innodb_read_io_threads=32
innodb_write_io_threads=32
innodb_fast_shutdown=0
innodb_log_file_size=5M
innodb_data_file_path = ibdata1:10M:autoextend
#custom_config
query_cache_type = 1
query_cache_limit = 256K
query_cache_min_res_unit = 2k
query_cache_size = 512M
slow-query-log = 1
slow-query-log-file = /var/lib/mysql/mysql-slow.log
long_query_time = 1
max_allowed_packet = 67108864
skip-name-resolve
#
# include all files from the config directory
#
[mysqld_safe]
log-error=/var/log/mysqld.log 

Cacti is very slow when we make any changes in the GUI or browse the graphs. I have a 12-core processor running at 3.50 GHz.
When I enabled the MySQL slow query log (it's attached), I got a lot of entries. Can someone help me with this? mysql_slow_query_log.txt
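For reference, a quick way to summarize a slow query log like the attached one is mysqldumpslow, which ships with MySQL/MariaDB (a sketch; the log path is the one configured in my.cnf above and may differ per install):

# Top 10 statement patterns, sorted by query time
mysqldumpslow -s t -t 10 /var/lib/mysql/mysql-slow.log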

netniV commented 6 years ago

Check Utilities -> System Utilities -> Technical Support -> Summary (tab) and scroll down to the bottom where the MySQL tuning recommendations are. Do all the left-hand values meet or exceed the right-hand ones?
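The same values can also be read straight from a MySQL client, for example (a sketch; the root credentials are only an assumption, use whatever account has access):

# Compare the running values against the recommendations on the Summary tab
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';"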

vishnubraj commented 6 years ago

Yes, all the values exceed the recommended values.

Variable Current value Recommended Value
version 10.2.16-MariaDB-log >= 5.6
collation_server utf8mb4_unicode_ci utf8mb4_unicode_ci
character_set_client utf8mb4 utf8mb4
max_connections 500 >= 100
max_heap_table_size 2048M >=1176M
max_allowed_packet 67108864 >= 16777216
tmp_table_size 2048M >= 64M
join_buffer_size 256M >= 64M
innodb_file_per_table ON ON
innodb_buffer_pool_size 8192M >=5878M
innodb_doublewrite OFF OFF
innodb_lock_wait_timeout 50 >= 50
innodb_flush_log_at_timeout 3 >= 3
innodb_read_io_threads 32 >= 32
innodb_write_io_threads 32 >= 16

netniV commented 6 years ago

Then this will be either a disk I/O or a CPU usage problem. If the CPU usage in top looks normal, that suggests to me it's purely down to the speed of reading from and writing to the hard drives.

Others, like @cigamit and @jhonnyx82, may have more insight as they have come across larger installations before.
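To narrow down disk I/O versus CPU while the poller is running, something like this helps (a sketch; iostat comes from the sysstat package and device names will differ per system):

# Per-device utilisation and wait times, refreshed every 5 seconds
iostat -x 5
# Per-core CPU usage and load average
top
# Memory, swap and block I/O summary once per second
vmstat 1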

cigamit commented 6 years ago

Ya, my first guess would be disk I/O. Normally I would also think it could be that the MySQL server is separate from the Cacti server with a bit of latency in between, but they appear to be on the same server. It would help to see the system stats from the log too.

g1augusto commented 6 years ago

I had a problem with disk I/O for the RRD files; MySQL always ran smoothly. I fixed it with dedicated SSD storage.

cigamit commented 6 years ago

Search on "SYSTEM STATS:" and post the top 30 results from your Cacti log, then also post the output of 'free -g', my guess as @jhonnyx82 suggests is not enough memory for the number of datasources. The other solution is to goto Console > Settings > Performance and enable on demand updating.

vishnubraj commented 6 years ago

Hi @cigamit, please find the output below.

2018/07/17 00:30:59 - SYSTEM STATS: Time:45.2208 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:30:49 - SYSTEM STATS: Time:98.4006 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:28:17 - SYSTEM STATS: Time:14.8632 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:27:16 - SYSTEM STATS: Time:14.8062 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:26:16 - SYSTEM STATS: Time:14.8888 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:25:16 - SYSTEM STATS: Time:14.8264 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:24:15 - SYSTEM STATS: Time:13.9624 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:23:15 - SYSTEM STATS: Time:13.8801 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:22:17 - SYSTEM STATS: Time:15.0667 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:21:16 - SYSTEM STATS: Time:15.0020 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:20:16 - SYSTEM STATS: Time:14.9877 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:19:15 - SYSTEM STATS: Time:14.0204 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:18:16 - SYSTEM STATS: Time:14.7021 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:17:16 - SYSTEM STATS: Time:14.8035 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:16:16 - SYSTEM STATS: Time:14.8367 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:15:16 - SYSTEM STATS: Time:14.8251 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:14:16 - SYSTEM STATS: Time:14.8294 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:13:15 - SYSTEM STATS: Time:13.7998 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:12:17 - SYSTEM STATS: Time:15.3310 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:11:16 - SYSTEM STATS: Time:14.9507 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:10:43 - SYSTEM STATS: Time:37.1559 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:10:07 - SYSTEM STATS: Time:54.7628 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:09:05 - SYSTEM STATS: Time:58.1734 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:07:15 - SYSTEM STATS: Time:13.7874 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:06:17 - SYSTEM STATS: Time:15.0668 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:05:15 - SYSTEM STATS: Time:13.8646 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:04:16 - SYSTEM STATS: Time:15.0310 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:03:17 - SYSTEM STATS: Time:14.9998 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:02:16 - SYSTEM STATS: Time:15.0138 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:01:18 - SYSTEM STATS: Time:16.6330 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
              total        used        free      shared  buff/cache   available
Mem:             23           4           4           1          14          17
Swap:            15           0          15
[FREE::root@cacti1a.ams2 ~]#

On-demand updating is already enabled. I am also checking for any hardware-related issues.

netniV commented 6 years ago
2018/07/17 00:11:16 - SYSTEM STATS: Time:14.9507 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:10:43 - SYSTEM STATS: Time:37.1559 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:10:07 - SYSTEM STATS: Time:54.7628 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:09:05 - SYSTEM STATS: Time:58.1734 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:07:15 - SYSTEM STATS: Time:13.7874 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0

The middle lines seem interesting, as something caused them to take quite a bit longer than normal. Like 4x as long.
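One way to spot those outliers is to pull the Time field out of the poller lines and sort it, for example (a sketch; same log path caveat as above):

# Longest poller runs first
grep "SYSTEM STATS:" /var/log/cacti/cacti.log | grep -o "Time:[0-9.]*" | sort -t: -k2 -rn | head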

cigamit commented 6 years ago

The issue here is that you likely have close to 500 threads hitting the database. You need to reduce the concurrent processes. How many cores does this system have? With 500 threads, the level of context switching has to be through the roof. If you are going to run that many threads, or near that many, and you want the web interface to keep performing, you should split the database and the web server and make sure both have 20+ cores each. Honestly, you should not need more than, say, 4 concurrent processes.
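As a back-of-the-envelope check of the concurrency involved (a sketch; 40 threads per process is taken from the SYSTEM STATS lines above, and 4 processes is just the suggestion here):

# concurrent spine threads hitting MySQL = poller processes x threads per process
echo $(( 12 * 40 ))   # current setup: 480 threads
echo $((  4 * 40 ))   # suggested 4 processes: 160 threads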

vishnubraj commented 6 years ago

It's a 12-CPU system and each CPU has 6 cores with 2 threads per socket... so it can handle 72 cores and 144 threads...

[FREE:vishnu:admin3@cacti1a.ams2 ~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
Stepping:              4
CPU MHz:               3599.941
CPU max MHz:           3900.0000
CPU min MHz:           1200.0000
BogoMIPS:              6999.57
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-11

So I need to calculate the threads used by Cacti as Spine processes (12) * Threads (40) = 480, correct?

netniV commented 6 years ago

Yes

g1augusto commented 6 years ago

@vishnubraj I see you have 1-minute polling; did you move from 5 minutes, or was it 1 minute from the start?

vishnubraj commented 6 years ago

@jhonnyx82 I moved it from 5 min to 1 min 3 months ago.

cigamit commented 6 years ago

No, you are reading the data wrong. The CPUs reported in the /proc/cpuinfo file are the actual cores, not the sockets. So what you have is a dual socket, 6 cores per socket, 2 threads per core. What that means is that you have a total of 24 threads, and you are killing your database.

Regardless, MySQL does not scale very well when there are literally hundreds of threads all querying the same table.

Turn down the heat and things will run better. I've had some thoughts about partitioning the poller_items table into multiple tables by site or some other category, but it's just a thought right now. That would definitely help from a scalability perspective.

cigamit commented 6 years ago

Last comment was for @vishnubraj.

cigamit commented 6 years ago

If you had a P9 PPC64 processor, you would have a dual socket system with 6 to 12 cores per socket and up to 8 threads per core. That translates at peak to a system with 192 threads. So, in that case, you would have a monster. Way faster than x86 in raw memory I/O bandwidth too. A perfect scale-out Cacti system, actually.

cigamit commented 6 years ago

I'm going to go ahead and close this. Look to the forums for more detail from the overall community.

vishnubraj commented 6 years ago

@cigamit @netniV @jhonnyx82 Thanks everyone. It looks like it's mostly down to the hardware. I will check by upgrading the hardware.

cigamit commented 6 years ago

Once you update to 1.2, if you turn on the slow query log in MySQL/MariaDB and find some slow queries, simply open a ticket. I found one yesterday that was brutally slow for template development. I was shocked, but after some optimization it went from 17 seconds to 0.17 seconds. These things happen from time to time.
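If it helps, the slow query log can also be toggled at runtime without a restart, for example (a sketch; the values mirror the my.cnf settings earlier in this thread, and the root credentials are only an assumption):

mysql -u root -p -e "SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 1;"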

vishnubraj commented 6 years ago

OK, will do.