Closed: vishnubraj closed this issue 6 years ago.
Check Utilities -> System Utilities -> Technical Support -> Summary (tab) and scroll down to the bottom where the MySQL tuning recommendations are. Do all the left-hand values meet or exceed the right-hand ones?
Yes, all the values exceed the recommended values.
Variable | Current value | Recommended Value |
---|---|---|
version | 10.2.16-MariaDB-log | >= 5.6 |
collation_server | utf8mb4_unicode_ci | utf8mb4_unicode_ci |
character_set_client | utf8mb4 | utf8mb4 |
max_connections | 500 | >= 100 |
max_heap_table_size | 2048M | >= 1176M |
max_allowed_packet | 67108864 | >= 16777216 |
tmp_table_size | 2048M | >= 64M |
join_buffer_size | 256M | >= 64M |
innodb_file_per_table | ON | ON |
innodb_buffer_pool_size | 8192M | >= 5878M |
innodb_doublewrite | OFF | OFF |
innodb_lock_wait_timeout | 50 | >= 50 |
innodb_flush_log_at_timeout | 3 | >= 3 |
innodb_read_io_threads | 32 | >= 32 |
innodb_write_io_threads | 32 | >= 16 |
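For reference, the same values can be read straight from the MySQL/MariaDB client; a minimal sketch, using the variable names from the table above:

```shell
# Check the current values of the tuned variables directly
mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN
          ('max_connections', 'max_heap_table_size', 'tmp_table_size',
           'join_buffer_size', 'innodb_buffer_pool_size',
           'innodb_read_io_threads', 'innodb_write_io_threads');"
```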
Then this will be either a disk I/O or a CPU usage problem. If the CPU usage in top looks normal, that suggests to me it's purely down to the speed of reading from and writing to the hard drives.
Others, like @cigamit and @jhonnyx82, may have more insight, as they have come across larger installations before.
Yeah, my first guess would be disk I/O. Normally I would also suspect that the MySQL server is separate from the Cacti server with a bit of latency in between, but they appear to be on the same server. It would be helpful to see his system stats from the log as well.
I had a problem with disk I/O for the RRD files; MySQL always ran smoothly. I fixed it with dedicated SSD storage.
Search on "SYSTEM STATS:" and post the top 30 results from your Cacti log, then also post the output of 'free -g', my guess as @jhonnyx82 suggests is not enough memory for the number of datasources. The other solution is to goto Console > Settings > Performance and enable on demand updating.
Hi @cigamit, please find the output below.
2018/07/17 00:30:59 - SYSTEM STATS: Time:45.2208 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:30:49 - SYSTEM STATS: Time:98.4006 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:28:17 - SYSTEM STATS: Time:14.8632 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:27:16 - SYSTEM STATS: Time:14.8062 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:26:16 - SYSTEM STATS: Time:14.8888 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:25:16 - SYSTEM STATS: Time:14.8264 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:24:15 - SYSTEM STATS: Time:13.9624 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:23:15 - SYSTEM STATS: Time:13.8801 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:22:17 - SYSTEM STATS: Time:15.0667 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:21:16 - SYSTEM STATS: Time:15.0020 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:20:16 - SYSTEM STATS: Time:14.9877 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:19:15 - SYSTEM STATS: Time:14.0204 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:18:16 - SYSTEM STATS: Time:14.7021 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:17:16 - SYSTEM STATS: Time:14.8035 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:16:16 - SYSTEM STATS: Time:14.8367 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:15:16 - SYSTEM STATS: Time:14.8251 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:14:16 - SYSTEM STATS: Time:14.8294 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:13:15 - SYSTEM STATS: Time:13.7998 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:12:17 - SYSTEM STATS: Time:15.3310 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:11:16 - SYSTEM STATS: Time:14.9507 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:10:43 - SYSTEM STATS: Time:37.1559 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:10:07 - SYSTEM STATS: Time:54.7628 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:09:05 - SYSTEM STATS: Time:58.1734 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:07:15 - SYSTEM STATS: Time:13.7874 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:06:17 - SYSTEM STATS: Time:15.0668 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:05:15 - SYSTEM STATS: Time:13.8646 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:04:16 - SYSTEM STATS: Time:15.0310 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:03:17 - SYSTEM STATS: Time:14.9998 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:02:16 - SYSTEM STATS: Time:15.0138 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
2018/07/17 00:01:18 - SYSTEM STATS: Time:16.6330 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
total used free shared buff/cache available
Mem: 23 4 4 1 14 17
Swap: 15 0 15
[FREE::root@cacti1a.ams2 ~]#
On-demand updating is already enabled. I am also checking for any hardware-related issues.
2018/07/17 00:11:16 - SYSTEM STATS: Time:14.9507 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49037 RRDsProcessed:0
2018/07/17 00:10:43 - SYSTEM STATS: Time:37.1559 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49048 RRDsProcessed:0
2018/07/17 00:10:07 - SYSTEM STATS: Time:54.7628 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49049 RRDsProcessed:0
2018/07/17 00:09:05 - SYSTEM STATS: Time:58.1734 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49051 RRDsProcessed:0
2018/07/17 00:07:15 - SYSTEM STATS: Time:13.7874 Method:spine Processes:12 Threads:40 Hosts:330 HostsPerProcess:28 DataSources:49043 RRDsProcessed:0
The middle lines seem interesting, as something caused them to take quite a bit longer than normal, roughly 4x as long.
The issue here is that you likely have close to 500 threads hitting the database. You need to reduce the concurrent processes. How many cores does this system have? With that many threads, the level of context switching has to be through the roof. If you are going to have that many threads, or near that many, and you want the web UI to stay responsive, you should split the database and the web server and make sure both have 20+ cores each. Honestly, you should not need more than, say, 4 concurrent processes.
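For reference, the effective spine concurrency is the product of the two poller settings (roughly, the concurrent poller processes and the threads per process under Console > Settings > Poller). A minimal sketch of the arithmetic using the values reported in the SYSTEM STATS lines above:

```shell
# Spine concurrency from the poller settings in the log (Processes:12, Threads:40)
processes=12
threads_per_process=40
echo $((processes * threads_per_process))   # 480 concurrent poller threads
```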
It's a 12-CPU machine, and each CPU has 6 cores with 2 threads per socket, so it's 72 cores and can handle 144 threads...
[FREE:vishnu:admin3@cacti1a.ams2 ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
Stepping: 4
CPU MHz: 3599.941
CPU max MHz: 3900.0000
CPU min MHz: 1200.0000
BogoMIPS: 6999.57
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-11
So I need to calculate the threads used by Cacti as spine processes (12) * threads (40) = 480, correct?
Yes
@vishnubraj I see you have 1-minute polling. Did you move from 5 minutes, or was it 1 minute from the start?
@jhonnyx82 Three months ago I moved it from 5 min to 1 min.
No, you are reading the data wrong. The CPUs reported in /proc/cpuinfo (and by lscpu) are the logical processors, not the sockets. What you actually have is a single socket with 6 cores and 2 threads per core. That means a total of 12 hardware threads, and you are killing your database.
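A quick cross-check against the lscpu output above; a minimal sketch (nproc reports the same count directly):

```shell
# Logical CPUs = sockets x cores per socket x threads per core
sockets=$(lscpu | awk -F: '/^Socket\(s\):/ {gsub(/ /, "", $2); print $2}')
cores=$(lscpu | awk -F: '/^Core\(s\) per socket:/ {gsub(/ /, "", $2); print $2}')
threads=$(lscpu | awk -F: '/^Thread\(s\) per core:/ {gsub(/ /, "", $2); print $2}')
echo $((sockets * cores * threads))   # 1 x 6 x 2 = 12 on this box
```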
Regardless, MySQL does not scale very well when there are literally hundreds of threads all querying the same table.
Turn down the heat and things will run better. I've had some thoughts about partitioning the poller_items table into multiple tables by site or some other category, but it's just a thought right now. That would definitely help from a scalability perspective.
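A rough way to see how many threads are actually hitting the database while the poller runs; a minimal sketch using standard MySQL/MariaDB status counters:

```shell
# Current vs. peak connection counts and running threads
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
          ('Threads_connected', 'Threads_running', 'Max_used_connections');"

# Snapshot of what those connections are doing right now
mysql -e "SHOW FULL PROCESSLIST;"
```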
Last comment was for @vishnubraj.
If you had a P9 PPC64 processor, you would have a dual-socket system with 6 to 12 cores per socket and up to 8 threads per core. That translates at peak to a system with 192 threads. So, in that case, you would have a monster, way faster than x86 in raw memory I/O bandwidth too. A perfect scale-out Cacti system, actually.
I'm going to go ahead and close this. Look to the forums for more detail from the overall community.
@cigamit @netniV @jhonnyx82 Thanks, everyone. It looks like it's mostly down to the hardware; I will check by upgrading the hardware.
Once you update to 1.2, if you turn on the slow query log in MySQL/MariaDB and find some slow queries, simply open a ticket. I found one yesterday that was brutally slow for template development. I was shocked, but after doing some optimization it went from 17 seconds to 0.17 seconds. These things happen from time to time.
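A minimal sketch of turning on the slow query log on MariaDB at runtime; the one-second threshold and log path here are just example values, and the settings should also go into my.cnf under [mysqld] to survive a restart:

```shell
# Enable the slow query log without a restart
mysql -e "SET GLOBAL slow_query_log = 'ON';
          SET GLOBAL long_query_time = 1;
          SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow-query.log';"

# Summarize the worst offenders once some entries have accumulated
mysqldumpslow -s t -t 10 /var/lib/mysql/slow-query.log
```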
OK, will do.
Cacti Version: 1.1.38
Spine Version: 1.1.38
PHP Version: 7.1.14
MySQL Version: 10.2.16-MariaDB
Number of hosts: 373
Number of data sources: 20386
Below is my my.cnf file
Cacti is very slow when we make any changes in the GUI or browse the graphs. I have a 12-core processor at 3.50 GHz.
When I enabled the MySQL slow query log (it's attached), I get a lot of entries. Can someone help me with this? mysql_slow_query_log.txt