vishnubraj closed this issue 6 years ago.
You need to be very careful with the number of processes and threads. Your setting was over 200 connections, and as high as 300. MySQL does not behave very well with that many connections. This is more of a support issue than a bug. If you are going to poll with 10 * 20 threads, you should verify that max_connections is around 1000, although that configuration is not recommended.
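The 10 * 20 figure is simple multiplication, and it helps to see how much of max_connections it uses up. Here is a minimal Python sketch. It assumes, purely for illustration, that each Spine thread holds one MySQL connection. `estimate_connections`, `headroom` and the `overhead` parameter are hypothetical helpers, not part of Cacti or MySQL.

```python
def estimate_connections(processes: int, threads: int, overhead: int = 0) -> int:
    """Rough connection count: processes * threads, plus an assumed fixed overhead.

    ASSUMPTION: one MySQL connection per Spine thread; `overhead` is a
    hypothetical allowance for the web UI, poller.php and plugins.
    """
    return processes * threads + overhead


def headroom(max_connections: int, expected: int) -> float:
    """Fraction of max_connections left unused at the expected load."""
    return 1 - expected / max_connections


expected = estimate_connections(processes=10, threads=20)
print(expected)                            # 200
print(round(headroom(1000, expected), 2))  # 0.8
```

Under the one-connection-per-thread assumption, max_connections=1000 leaves roughly 80% of the limit free for everything else that connects.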
With Cacti 1.2, you will be able to have multiple data collectors to spread out that load. I would definitely consider more data collectors to reduce the load on the central MySQL server once Cacti 1.2 is released.
With the same number of processes and threads, I have another instance running Cacti 0.86 on the same hardware, and it works better and faster. Before I upgraded to Cacti 1.2.0, this server ran Cacti 1.1.38 without any CPU issue. Cacti 0.86 is faster than 1.1.38, so I'm not sure how to proceed.
Adding to this, below are the htop outputs.
Sorted by CPU utilization:
Sorted by memory utilization:
Can someone please help me with this? I am also unable to downgrade to 1.1.38, because the install option is grayed out when I try. Please help.
Post your MySQL cnf file so we can see what settings you have used.
Also, try reducing the numbers as suggested, see if it makes a difference. Are there any errors appearing in your cacti/sql log files?
This almost sounds like some form of locking issue, or too many connections.
You might also want to try MySQLTuner to see if you should be adjusting your values.
Yes, I tried. My server has been running with only 1 process and 1 thread for the last 4 days, and it still ran out of memory again today.
[client-server]
[client]
default-character-set=utf8mb4
port=3306
socket=/data/mysql/mysql.sock
[mysql]
default-character-set=utf8mb4
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
#sql-mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
secure-auth=on
#old_passwords=1
max_connections=2000
sql-mode=NO_ENGINE_SUBSTITUTION
collation-server=utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4
max_heap_table_size=2G
tmp_table_size=2G
join_buffer_size=256M
innodb_buffer_pool_size=6G
innodb_buffer_pool_instances=32
innodb_doublewrite=OFF
innodb_flush_log_at_timeout=3
innodb_read_io_threads=32
innodb_write_io_threads=32
innodb_fast_shutdown=0
innodb_log_file_size=5M
innodb_data_file_path = ibdata1:10M:autoextend
#custom_config
query_cache_type = 1
query_cache_limit = 256K
query_cache_min_res_unit = 2k
query_cache_size = 512M
slow-query-log = 1
slow-query-log-file = /data/mysql/mysql-slow.log
long_query_time = 1
max_allowed_packet = 67108864
skip-name-resolve
#
# include all files from the config directory
#
[mysqld_safe]
log-error=/var/log/mysqld.log
!includedir /etc/my.cnf.d
Error message in the server console:
Server memory usage:
The server was upgraded to 1.2.0 on 23 October (week 43).
I see the following Cacti log entries from around 08:40 GMT today:
2018/11/13 08:48:05 - POLLER: Poller[1] WARNING: Poller Output Table not Empty. Issues: 92, DS[33210, 33211, 33212, 33213, 33214, 33214, 33214, 33214, 33214, 33214, 33215, 33215, 33215, 33215, 33215, 33215, 33216, 33216, 33217, 33217], Additional Issues Remain. Only showing first 20
2018/11/13 08:48:05 - SYSTEM WARNING: Primary Admin account notifications disabled! Unable to send administrative Email.
2018/11/13 08:48:05 - POLLER: Poller[1] WARNING: There are '1' detected as overrunning a polling cycle, please investigate
2018/11/13 08:48:05 - SYSTEM WARNING: Primary Admin account notifications disabled! Unable to send administrative Email.
2018/11/13 08:48:05 - SYSTEM WARNING: Primary Admin account notifications disabled! Unable to send administrative Email.
2018/11/13 08:48:05 - SYSTEM WARNING: Primary Admin account notifications disabled! Unable to send administrative Email.
2018/11/13 08:48:05 - POLLER: Poller[1] WARNING: Cron is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '60' second Cron, but 650.2 seconds have passed since the last poll!
2018/11/13 08:48:05 - POLLER: Poller[1] WARNING: Cron is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '60' second Cron, but 650.2 seconds have passed since the last poll!
2018/11/13 08:48:05 - POLLER: Poller[1] WARNING: Cron is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '60' second Cron, but 650.3 seconds have passed since the last poll!
2018/11/13 08:48:05 - SPINE: Poller[1] ERROR: Spine Timed Out While Waiting for Threads to End
2018/11/13 08:48:05 - SPINE: Poller[1] ERROR: Spine Timed Out While Processing Devices Internal
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:04 - SPINE: Poller[1] ERROR: SS[0] Script Server did not start properly return message was: 'U'
2018/11/13 08:48:02 - DSSTATS WARNING: File '/var/www/cacti/rra/415/33639.rrd' Does not exist
2018/11/13 08:48:02 - DSSTATS WARNING: File '/var/www/cacti/rra/415/33639.rrd' Does not exist
2018/11/13 08:48:02 - DSSTATS WARNING: File '/var/www/cacti/rra/415/33639.rrd' Does not exist
2018/11/13 08:48:02 - DSSTATS WARNING: File '/var/www/cacti/rra/415/33639.rrd' Does not exist
2018/11/13 08:48:02 - DSSTATS WARNING: File '/var/www/cacti/rra/415/33639.rrd' Does not exist
2018/11/13 08:48:02 - DSSTATS WARNING: File '/var/www/cacti/rra/415/33639.rrd' Does not exist
2018/11/13 08:47:45 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:47:16 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:46:49 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:46:21 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:45:49 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:45:22 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:44:53 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:44:24 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:43:58 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:43:31 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:43:03 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:42:37 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:42:10 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:41:42 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:41:14 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:40:47 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
2018/11/13 08:40:20 - SPINE: Poller[1] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
I don't have any MySQL-related logs.
I ran MySQLTuner when I was using Cacti 1.1.38 and modified the my.cnf file based on its output.
MySQLTuner output:
>> MySQLTuner 1.7.13 - Major Hayden <major@mhtx.net>
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with '--help' for additional options and output filtering
[--] Skipped version check for MySQLTuner script
Please enter your MySQL administrative login: root
Please enter your MySQL administrative password:
[OK] Currently running supported MySQL version 10.2.16-MariaDB-log
[OK] Operating on 64-bit architecture
-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /data/mysql/cacti1a.ams2.err(1B)
[OK] Log file /data/mysql/cacti1a.ams2.err exists
[OK] Log file /data/mysql/cacti1a.ams2.err is readable.
[OK] Log file /data/mysql/cacti1a.ams2.err is not empty
[OK] Log file /data/mysql/cacti1a.ams2.err is smaller than 32 Mb
[OK] /data/mysql/cacti1a.ams2.err doesn't contain any warning.
[OK] /data/mysql/cacti1a.ams2.err doesn't contain any error.
[--] 0 start(s) detected in /data/mysql/cacti1a.ams2.err
[--] 0 shutdown(s) detected in /data/mysql/cacti1a.ams2.err
-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +Aria +CSV +InnoDB +MEMORY +MRG_MyISAM +MyISAM +PERFORMANCE_SCHEMA +SEQUENCE
[--] Data in InnoDB tables: 590.7M (Tables: 139)
[--] Data in MEMORY tables: 156.3M (Tables: 1)
[OK] Total fragmented tables: 0
-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.
-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] User 'cacti@%' does not specify hostname restrictions.
[!!] There is no basic password file list!
-------- CVE Security Recommendations --------------------------------------------------------------
[--] Skipped due to --cvefile option undefined
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 32m 40s (5M q [2K qps], 12K conn, TX: 2G, RX: 1G)
[--] Reads / Writes: 96% / 4%
[--] Binary logging is disabled
[--] Physical Memory : 23.5G
[--] Max MySQL memory : 516.0G
[--] Other process memory: 21.1G
[--] Total buffers: 10.8G global + 258.7M per thread (2000 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[!!] Maximum reached memory usage: 29.7G (126.37% of installed RAM)
[!!] Maximum possible memory usage: 516.0G (2194.50% of installed RAM)
[!!] Overall possible memory usage with other process exceeded memory
[OK] Slow queries: 0% (69/5M)
[OK] Highest usage of available connections: 3% (75/2000)
[OK] Aborted connections: 0.02% (3/12952)
[!!] Query cache may be disabled by default due to mutex contention.
[OK] Query cache efficiency: 38.8% (3M cached / 8M selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (33 temp sorts / 26K sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 3% (1K on disk / 34K total)
[OK] Thread cache hit rate: 99% (75 created / 12K connections)
[OK] Table cache hit rate: 60% (157 open / 261 opened)
[OK] Open file limit used: 0% (32/16K)
[OK] Table locks acquired immediately: 99% (17K immediate / 17K locks)
-------- Performance schema ------------------------------------------------------------------------
[--] Performance schema is disabled.
[--] Memory used by P_S: 0B
[--] Sys schema isn't installed.
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 12 thread(s).
[--] Using default value is good enough for your version (10.2.16-MariaDB-log)
-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 18.2% (24M used / 134M cache)
[OK] Key buffer size / total MyISAM indexes: 128.0M/2.4M
[!!] Read Key buffer hit rate: 50.0% (8 cached / 4 reads)
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 8.0G/590.7M
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (0.1220703125 %): 5.0M * 2/8.0G should be equal 25%
[!!] InnoDB buffer pool instances: 32
[--] Number of InnoDB Buffer Pool Chunk : 64 for 32 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 100.00% (289755880 hits/ 289763709 total)
[OK] InnoDB Write log efficiency: 97.85% (5254903 hits/ 5370626 total)
[OK] InnoDB log waits: 0.00% (0 waits / 115723 writes)
-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/1B
[!!] Aria pagecache hit rate: 91.0% (11K cached / 1K reads)
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: MIXED
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated
[--] This is a standalone server
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
Restrict Host for user@% to user@SpecificDNSorIp
MySQL was started within the last 24 hours - recommendations may be inaccurate
Reduce your overall MySQL memory footprint for system stability
Dedicate this server to your database for highest performance.
Performance schema should be activated for better diagnostics
Consider installing Sys schema from https://github.com/mysql/mysql-sys
Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: http://bit.ly/2wgkDvS
Variables to adjust:
*** MySQL's maximum memory usage is dangerously high ***
*** Add RAM before increasing MySQL buffer variables ***
query_cache_size (=0)
query_cache_type (=0)
performance_schema = ON enable PFS
innodb_log_file_size should be (=1G) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
innodb_buffer_pool_instances(=8)
[FREE::root@cacti1a.ams2 ~]#
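The two headline warnings in this report, maximum possible memory and the redo log ratio, are simple arithmetic on values MySQLTuner already printed. This small Python sketch reproduces them from the report's own numbers: 10.8G global buffers, 258.7M per thread, 2000 max threads, two 5M log files and the 8.0G buffer pool the report shows. It assumes binary units (1G = 1024M), and the function names are illustrative only.

```python
def max_possible_memory_gib(global_gib: float, per_thread_mib: float, max_threads: int) -> float:
    """Global buffers plus per-thread buffers multiplied by max_connections."""
    return global_gib + per_thread_mib * max_threads / 1024  # ASSUMPTION: 1G = 1024M


def log_to_pool_pct(log_file_mib: float, files_in_group: int, pool_mib: float) -> float:
    """Total InnoDB redo log size as a percentage of the buffer pool."""
    return 100 * log_file_mib * files_in_group / pool_mib


def log_file_mib_for_pct(pool_mib: float, files_in_group: int, target_pct: float = 25.0) -> float:
    """Per-file innodb_log_file_size (MiB) so all log files total target_pct of the pool."""
    return pool_mib * target_pct / 100 / files_in_group


print(round(max_possible_memory_gib(10.8, 258.7, 2000)))  # 516
print(log_to_pool_pct(5, 2, 8 * 1024))                    # 0.1220703125
print(log_file_mib_for_pct(8 * 1024, 2))                  # 1024.0
```

The per-thread term dominates the first figure. With the same per-thread size, 200 connections would still give roughly 61G, which is more than the 23.5G of installed RAM. The last line matches the tuner's innodb_log_file_size=1G suggestion.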
[FREE::root@cacti1a.ams2 ~]# free -m
total used free shared buff/cache available
Mem: 24075 23652 201 3 221 83
Swap: 16383 8489 7894
[FREE::root@cacti1a.ams2 ~]# ps -ef | grep php
apache 2986 1 57 09:46 ? 00:03:13 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 4208 1 77 09:34 ? 00:13:38 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 5579 1 81 09:24 ? 00:22:38 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 7604 1 71 09:47 ? 00:03:13 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 8841 1 62 09:48 ? 00:02:14 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 11232 1 78 09:26 ? 00:20:02 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 12625 1 74 09:37 ? 00:10:49 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 13922 13912 0 09:50 ? 00:00:00 /bin/sh -c php /var/www/cacti/poller.php > /dev/null 2>&1
apache 13930 13922 8 09:50 ? 00:00:08 php /var/www/cacti/poller.php
apache 15949 1 74 09:39 ? 00:09:24 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 16664 16646 0 09:51 ? 00:00:00 /bin/sh -c php /var/www/cacti/poller.php > /dev/null 2>&1
apache 16667 16664 3 09:51 ? 00:00:01 php /var/www/cacti/poller.php
apache 16708 1 5 09:51 ? 00:00:01 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 17281 17014 0 09:51 ? 00:00:00 /usr/bin/php -q /var/www/cacti/script_server.php spine 1
apache 17474 17014 0 09:51 ? 00:00:00 /usr/bin/php -q /var/www/cacti/script_server.php spine 1
apache 18957 1 0 09:51 ? 00:00:00 /usr/bin/php -q /var/www/cacti/plugins/monitor/poller_monitor.php
root 19051 17046 0 09:51 pts/0 00:00:00 grep php
apache 19736 1 78 09:29 ? 00:17:48 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 21460 1 70 09:41 ? 00:07:32 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 25911 1 74 09:31 ? 00:15:24 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 29854 1 71 09:44 ? 00:05:23 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 31310 1 79 09:21 ? 00:24:12 /usr/bin/php -q /var/www/cacti/poller_boost.php
[FREE::root@cacti1a.ams2 ~]#
[FREE::root@cacti1a.ams2 ~]# free -m
total used free shared buff/cache available
Mem: 24075 2116 21229 3 730 21580
Swap: 16383 716 15667
[FREE::root@cacti1a.ams2 ~]# ps -ef | grep php
apache 27474 27458 0 09:55 ? 00:00:00 /bin/sh -c php /var/www/cacti/poller.php > /dev/null 2>&1
apache 27481 27474 11 09:55 ? 00:00:03 php /var/www/cacti/poller.php
apache 27507 27487 0 09:55 ? 00:00:00 /usr/bin/php -q /var/www/cacti/script_server.php spine 1
apache 27511 27487 0 09:55 ? 00:00:00 /usr/bin/php -q /var/www/cacti/script_server.php spine 1
root 30025 17046 0 09:55 pts/0 00:00:00 grep php
[FREE::root@cacti1a.ams2 ~]#
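The first `ps -ef` listing above shows many poller_boost.php instances running at once, each started at a different time. This minimal Python sketch counts them from pasted `ps -ef` output. The helper name is hypothetical, and the sketch assumes the standard eight-column `ps -ef` layout shown here: UID, PID, PPID, C, STIME, TTY, TIME, CMD.

```python
from __future__ import annotations


def boost_processes(ps_output: str, script: str = "poller_boost.php") -> list[tuple[int, str, str]]:
    """Return (pid, start_time, cpu_time) for each `ps -ef` line running `script`."""
    found = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[7].startswith("grep"):
            continue  # skip blank/short lines and the grep command itself
        if script in fields[7]:
            found.append((int(fields[1]), fields[4], fields[6]))
    return found


sample = """\
apache 2986 1 57 09:46 ? 00:03:13 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 13930 13922 8 09:50 ? 00:00:08 php /var/www/cacti/poller.php
apache 17281 17014 0 09:51 ? 00:00:00 /usr/bin/php -q /var/www/cacti/script_server.php spine 1
root 19051 17046 0 09:51 pts/0 00:00:00 grep php
apache 31310 1 79 09:21 ? 00:24:12 /usr/bin/php -q /var/www/cacti/poller_boost.php
"""

procs = boost_processes(sample)
print(len(procs))  # 2
# Comparing HH:MM strings only works for same-day starts; ps shows a date for older ones.
print(min(procs, key=lambda p: p[1])[0])  # 31310
```

When given the complete first listing above, the sketch finds 14 poller_boost.php instances. The oldest started at 09:21 and has over 24 minutes of CPU time.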
Are you using spine? Have you updated it if so?
Yes, I am using Spine, version 1.1.38. There is no Spine 1.2.0 release, so I kept using 1.1.38.
It's VERY important that Spine is at the same version as Cacti, or there will be issues.
You have to download and compile it yourself, as we don't provide OS-specific releases, just sources.
See Testing Environment: Spine for more information on how to compile it.
I upgraded Spine to the latest version, and the memory issue appears to be resolved. The CPU issue is not resolved: CPU usage is always high because of the poller_boost.php processes.
top - 12:05:14 up 3:06, 2 users, load average: 72.87, 61.29, 37.98
Tasks: 249 total, 19 running, 229 sleeping, 0 stopped, 1 zombie
%Cpu0 : 93.0 us, 6.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu1 : 91.4 us, 8.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu2 : 85.4 us, 14.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 95.0 us, 5.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 77.7 us, 16.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 5.7 si, 0.0 st
%Cpu5 : 93.4 us, 6.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu6 : 91.4 us, 7.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu7 : 90.0 us, 9.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu8 : 88.0 us, 12.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 95.3 us, 4.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 94.0 us, 5.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu11 : 93.7 us, 6.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem : 24653708 total, 201784 free, 23942612 used, 509312 buff/cache
KiB Swap: 16777212 total, 2538680 free, 14238532 used. 312544 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1817 mysql 20 0 16.296g 1.658g 2004 S 158.1 7.1 81:33.81 mysqld
15580 apache 20 0 448824 27248 552 R 67.8 0.1 13:10.09 php
2836 apache 20 0 442680 51860 532 R 61.5 0.2 5:16.48 php
23573 apache 20 0 446776 54532 532 R 59.8 0.2 12:04.14 php
22295 apache 20 0 430392 53084 8380 R 59.1 0.2 0:22.94 php
10910 apache 20 0 440632 53508 536 R 53.2 0.2 5:00.26 php
13357 apache 20 0 463160 30760 532 R 52.8 0.1 20:53.68 php
6016 apache 20 0 467256 37596 532 R 52.5 0.2 22:29.67 php
6634 apache 20 0 452920 54204 556 R 52.5 0.2 15:45.64 php
26926 apache 20 0 446776 56848 532 R 52.2 0.2 11:37.28 php
8911 apache 20 0 463160 38748 532 R 51.8 0.2 23:28.83 php
24412 apache 20 0 457016 32780 528 R 51.8 0.1 19:28.45 php
32430 apache 20 0 471352 36532 536 R 48.2 0.1 25:09.49 php
21400 apache 20 0 459064 34828 528 R 44.5 0.1 18:31.89 php
31926 apache 20 0 457016 31128 536 R 41.5 0.1 16:20.83 php
11585 apache 20 0 2711016 1.743g 836 S 21.6 7.4 2:38.38 rrdtool
20551 apache 20 0 2233448 1.557g 836 S 13.6 6.6 2:03.32 rrdtool
[FREE::root@cacti1a.ams2 ~]# ps -ef | grep php
apache 1208 1201 0 12:06 ? 00:00:00 /bin/sh -c php /var/www/cacti/poller.php > /dev/null 2>&1
apache 1212 1208 0 12:06 ? 00:00:00 php /var/www/cacti/poller.php
apache 1221 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 1222 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 1223 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 1227 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_dsstats.php
apache 1228 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_dsstats.php
apache 1233 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_dsstats.php
apache 1234 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_reports.php
apache 1237 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_reports.php
apache 1238 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_reports.php
apache 1240 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_spikekill.php
apache 1242 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_spikekill.php
apache 1245 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_spikekill.php
apache 1246 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_automation.php -M
apache 1248 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_automation.php -M
apache 1251 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_maintenance.php
apache 1252 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_automation.php -M
apache 1253 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_maintenance.php
apache 1255 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/poller_maintenance.php
apache 1258 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/plugins/monitor/poller_monitor.php
apache 1259 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/plugins/monitor/poller_monitor.php
apache 1261 1 0 12:06 ? 00:00:00 /usr/bin/php -q /var/www/cacti/plugins/monitor/poller_monitor.php
root 1275 16189 0 12:07 pts/0 00:00:00 grep php
apache 2836 1 46 11:52 ? 00:06:52 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 6016 1 65 11:31 ? 00:23:48 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 6634 1 67 11:42 ? 00:16:56 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 8911 1 70 11:32 ? 00:24:55 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 10910 1 54 11:55 ? 00:06:25 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 13357 1 64 11:33 ? 00:22:10 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 15580 1 65 11:45 ? 00:14:25 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 21400 1 63 11:36 ? 00:19:51 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 22294 1 1 12:00 ? 00:00:04 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 22295 1 29 12:00 ? 00:02:01 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 23573 1 70 11:48 ? 00:13:37 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 24412 1 68 11:37 ? 00:20:42 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 26926 1 72 11:49 ? 00:13:13 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 31926 1 62 11:39 ? 00:17:26 /usr/bin/php -q /var/www/cacti/poller_boost.php
apache 32430 1 68 11:29 ? 00:26:22 /usr/bin/php -q /var/www/cacti/poller_boost.php
[FREE::root@cacti1a.ams2 ~]#
After I killed the poller_boost.php processes, CPU usage came down, but it goes high again within a few hours.
top - 12:11:15 up 3:12, 2 users, load average: 4.99, 30.63, 32.53
Tasks: 201 total, 1 running, 200 sleeping, 0 stopped, 0 zombie
%Cpu0 : 14.3 us, 1.7 sy, 0.0 ni, 78.4 id, 5.6 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 13.9 us, 1.7 sy, 0.0 ni, 71.9 id, 12.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu2 : 12.3 us, 2.0 sy, 0.0 ni, 75.0 id, 10.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu3 : 9.4 us, 1.3 sy, 0.0 ni, 77.4 id, 11.4 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu4 : 19.5 us, 3.0 sy, 0.0 ni, 47.8 id, 27.9 wa, 0.0 hi, 1.7 si, 0.0 st
%Cpu5 : 15.9 us, 2.0 sy, 0.0 ni, 73.8 id, 8.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu6 : 9.0 us, 0.7 sy, 0.0 ni, 89.0 id, 1.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 9.3 us, 1.0 sy, 0.0 ni, 79.1 id, 10.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu8 : 13.7 us, 1.0 sy, 0.0 ni, 80.0 id, 5.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 13.7 us, 0.3 sy, 0.0 ni, 85.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 8.4 us, 0.3 sy, 0.0 ni, 91.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 4.7 us, 0.7 sy, 0.0 ni, 84.7 id, 10.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 24653708 total, 20829436 free, 2531504 used, 1292768 buff/cache
KiB Swap: 16777212 total, 15267148 free, 1510064 used. 21716568 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1817 mysql 20 0 16.289g 1.698g 1660 S 57.8 7.2 83:32.44 mysqld
15676 apache 20 0 1565328 14808 3228 S 39.5 0.1 0:04.23 spine
15513 apache 20 0 409908 35204 10812 S 9.6 0.1 0:02.10 php
15402 apache 20 0 231496 3384 2524 D 4.7 0.0 0:05.43 rrdtool
15388 apache 20 0 426296 51040 10796 S 3.0 0.2 0:03.72 php
602 root 20 0 0 0 0 S 0.7 0.0 0:24.40 jbd2/sdb1-8
655 root 20 0 6472 0 0 S 0.3 0.0 0:08.00 rngd
12363 root 20 0 157724 2316 1544 R 0.3 0.0 0:00.24 top
15780 apache 20 0 1611956 15396 3296 S 0.3 0.1 0:01.20 spine
1 root 20 0 52256 2420 1396 S 0.0 0.0 0:04.42 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.34 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0
Have you applied the various recommendations from MySQLTuner?
Also, can you post the MySQL technical support tab that Cacti shows on the Utilities page?
I am looking at the tuner recommendations now and will update you. Below is the screenshot from the Technical Support page.
I've applied the tuner recommendations. Below is the latest screenshot from the Cacti Technical Support page.
I still see unfinished poller_boost.php processes that are not releasing the CPU:
[FREE::root@cacti1a.ams2 ~]# ps -eo comm,pid,etimes,cmd | grep poller_boost.php
php 5010 385 /usr/bin/php -q /var/www/cacti/poller_boost.php
php 9764 267 /usr/bin/php -q /var/www/cacti/poller_boost.php
php 15303 148 /usr/bin/php -q /var/www/cacti/poller_boost.php
php 20877 28 /usr/bin/php -q /var/www/cacti/poller_boost.php
grep 23642 0 grep poller_boost.php
[FREE::root@cacti1a.ams2 ~]#
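In `ps -eo comm,pid,etimes,cmd`, the etimes column gives the number of seconds since each process started. The differences between successive values therefore show how often a new instance was launched. Here is a minimal Python sketch over the four processes above; the helper name is illustrative.

```python
from __future__ import annotations


def launch_gaps(ps_output: str, script: str = "poller_boost.php") -> list[int]:
    """Seconds between consecutive launches of `script`, oldest first."""
    ages = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)  # comm, pid, etimes, cmd
        if len(fields) == 4 and fields[0] != "grep" and script in fields[3]:
            ages.append(int(fields[2]))
    ages.sort(reverse=True)  # largest elapsed time = started first
    return [older - newer for older, newer in zip(ages, ages[1:])]


sample = """\
php 5010 385 /usr/bin/php -q /var/www/cacti/poller_boost.php
php 9764 267 /usr/bin/php -q /var/www/cacti/poller_boost.php
php 15303 148 /usr/bin/php -q /var/www/cacti/poller_boost.php
php 20877 28 /usr/bin/php -q /var/www/cacti/poller_boost.php
grep 23642 0 grep poller_boost.php
"""

print(launch_gaps(sample))  # [118, 119, 120]
```

The gaps are about 120 seconds each. In this snapshot, a new poller_boost.php was being started roughly every two minutes while the earlier ones were still running.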
In that case, it may be worth enabling Metadata Lock Performance Instrumentation to see if there are any locks holding things up:
https://dev.mysql.com/doc/refman/8.0/en/metadata-locks-table.html
Also, are you seeing any of these warnings: WARNING: RRD On Demand Updater Exceeded Runtime Limits. Continuing to Process!
Also, what value do you have for boost_rrd_update_interval in the settings table?
Do you have any of the boost stats from the logs to show how long it's running on average?
I just enabled metadata lock performance instrumentation, but I don't see any output so far:
MariaDB [cacti]> USE INFORMATION_SCHEMA;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [INFORMATION_SCHEMA]> SELECT * FROM INNODB_LOCK_WAITS;
Empty set (0.00 sec)
MariaDB [INFORMATION_SCHEMA]>
MariaDB [INFORMATION_SCHEMA]>
MariaDB [INFORMATION_SCHEMA]> SELECT *
-> FROM INNODB_LOCKS
-> WHERE LOCK_TRX_ID IN (SELECT BLOCKING_TRX_ID FROM INNODB_LOCK_WAITS);
Empty set (0.00 sec)
MariaDB [INFORMATION_SCHEMA]> SELECT INNODB_LOCKS.*
-> FROM INNODB_LOCKS
-> JOIN INNODB_LOCK_WAITS
-> ON (INNODB_LOCKS.LOCK_TRX_ID = INNODB_LOCK_WAITS.BLOCKING_TRX_ID);
Empty set (0.00 sec)
MariaDB [INFORMATION_SCHEMA]> SELECT TRX_ID, TRX_REQUESTED_LOCK_ID, TRX_MYSQL_THREAD_ID, TRX_QUERY
-> FROM INNODB_TRX
-> WHERE TRX_STATE = 'LOCK WAIT';
Empty set (0.00 sec)
MariaDB [INFORMATION_SCHEMA]>
I am not seeing any "WARNING: RRD On Demand Updater Exceeded Runtime Limits" messages.
The RRD update interval is 1 hour.
Boost stats:
2018/11/13 12:50:24 - SYSTEM BOOST STATS: Time:18.5100 RRDUpdates:50000
2018/11/13 12:52:44 - SYSTEM BOOST STATS: Time:39.7000 RRDUpdates:203150
2018/11/13 12:54:10 - SYSTEM BOOST STATS: Time:8.3600 RRDUpdates:107209
2018/11/13 12:56:09 - SYSTEM BOOST STATS: Time:8.3700 RRDUpdates:107368
2018/11/13 12:58:10 - SYSTEM BOOST STATS: Time:8.9000 RRDUpdates:107502
2018/11/13 13:00:11 - SYSTEM BOOST STATS: Time:9.5500 RRDUpdates:107252
2018/11/13 13:02:13 - SYSTEM BOOST STATS: Time:11.7300 RRDUpdates:107640
2018/11/13 13:04:11 - SYSTEM BOOST STATS: Time:9.3600 RRDUpdates:107696
2018/11/13 13:06:11 - SYSTEM BOOST STATS: Time:9.7700 RRDUpdates:107662
2018/11/13 13:08:11 - SYSTEM BOOST STATS: Time:10.5600 RRDUpdates:107610
2018/11/13 13:10:12 - SYSTEM BOOST STATS: Time:11.7400 RRDUpdates:107596
2018/11/13 13:13:14 - SYSTEM BOOST STATS: Time:13.3500 RRDUpdates:161083
2018/11/13 13:15:31 - SYSTEM BOOST STATS: Time:28.4900 RRDUpdates:107672
2018/11/13 13:17:57 - SYSTEM BOOST STATS: Time:44.6000 RRDUpdates:145987
2018/11/13 13:20:36 - SYSTEM BOOST STATS: Time:30.6000 RRDUpdates:122209
2018/11/13 13:22:55 - SYSTEM BOOST STATS: Time:52.2000 RRDUpdates:107838
2018/11/13 13:24:28 - SYSTEM BOOST STATS: Time:20.1500 RRDUpdates:138236
2018/11/13 13:28:01 - SYSTEM BOOST STATS: Time:49.3700 RRDUpdates:148124
2018/11/13 13:30:55 - SYSTEM BOOST STATS: Time:44.3900 RRDUpdates:159930
2018/11/13 13:34:24 - SYSTEM BOOST STATS: Time:65.3100 RRDUpdates:150120
2018/11/13 13:37:17 - SYSTEM BOOST STATS: Time:65.2300 RRDUpdates:157569
2018/11/13 13:38:48 - SYSTEM BOOST STATS: Time:42.4600 RRDUpdates:107840
2018/11/13 13:41:17 - SYSTEM BOOST STATS: Time:59.7800 RRDUpdates:141898
2018/11/13 13:43:18 - SYSTEM BOOST STATS: Time:50.0900 RRDUpdates:127526
2018/11/13 13:46:06 - SYSTEM BOOST STATS: Time:51.1900 RRDUpdates:148444
2018/11/13 13:48:54 - SYSTEM BOOST STATS: Time:38.0500 RRDUpdates:161373
2018/11/13 13:52:35 - SYSTEM BOOST STATS: Time:63.7200 RRDUpdates:164810
2018/11/13 13:55:18 - SYSTEM BOOST STATS: Time:54.6700 RRDUpdates:118542
2018/11/13 13:56:33 - SYSTEM BOOST STATS: Time:51.7500 RRDUpdates:94746
2018/11/13 14:07:12 - SYSTEM BOOST STATS: Time:223.7300 RRDUpdates:150636
2018/11/13 14:07:52 - SYSTEM BOOST STATS: Time:13.4200 RRDUpdates:50000
Your boost stats do not suggest a one-hour interval, but rather random intervals. This suggests that either the interval is not what you believe it is, or something is triggering the boost run. Since the screenshots back up your one-hour statement, can you double-check the setting in the settings table for me? I think they will correspond to each other, but I'd rather be sure.
select * from settings where name like 'boost_%';
I've added an extra log message to poller_boost.php. If you download the latest version, it will show whether the boost run is being forced for some reason. Failing that, the list of settings requested above will hopefully give me a pointer.
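To check the spacing of the boost runs without eyeballing timestamps, the "SYSTEM BOOST STATS" lines can be parsed and the gaps between consecutive runs measured. This is a minimal sketch, assuming the log line format shown above; the sample lines are copied from that log, and the helper names `parse_boost_stats` and `gaps_in_minutes` are hypothetical (in practice you would feed it the relevant lines from your Cacti log).

```python
import re
from datetime import datetime

# Sample lines copied from the boost stats posted above.
LOG = """\
2018/11/13 12:50:24 - SYSTEM BOOST STATS: Time:18.5100 RRDUpdates:50000
2018/11/13 12:52:44 - SYSTEM BOOST STATS: Time:39.7000 RRDUpdates:203150
2018/11/13 12:54:10 - SYSTEM BOOST STATS: Time:8.3600 RRDUpdates:107209
2018/11/13 12:56:09 - SYSTEM BOOST STATS: Time:8.3700 RRDUpdates:107368
2018/11/13 12:58:10 - SYSTEM BOOST STATS: Time:8.9000 RRDUpdates:107502
"""

# Assumed format: "<date> <time> - SYSTEM BOOST STATS: Time:<secs> RRDUpdates:<n>"
PATTERN = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - SYSTEM BOOST STATS: "
    r"Time:([\d.]+) RRDUpdates:(\d+)"
)


def parse_boost_stats(text):
    """Return (timestamp, runtime_seconds, rrd_updates) for each boost stats line."""
    runs = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
            runs.append((ts, float(m.group(2)), int(m.group(3))))
    return runs


def gaps_in_minutes(runs):
    """Minutes elapsed between each pair of consecutive boost runs."""
    return [
        (later[0] - earlier[0]).total_seconds() / 60
        for earlier, later in zip(runs, runs[1:])
    ]


if __name__ == "__main__":
    runs = parse_boost_stats(LOG)
    print([round(g, 1) for g in gaps_in_minutes(runs)])  # prints [2.3, 1.4, 2.0, 2.0]
```

With a 60-minute update interval, each gap would be expected to be close to 60; gaps of around two minutes mean something other than the timer is triggering the runs.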
Here is the output:
MariaDB [cacti]> select * from settings where name like 'boost_%';
+-----------------------------------------+-----------------------------------------+
| name | value |
+-----------------------------------------+-----------------------------------------+
| boost_last_run_time | 2018-11-13 14:57:17 |
| boost_max_output_length | 1532419590:16 |
| boost_next_run_time | 2018-11-13 15:26:02 |
| boost_parallel | 1 |
| boost_peak_memory | 33718712 |
| boost_png_cache_directory | /var/www/cacti/cache/boost/ |
| boost_png_cache_enable | on |
| boost_poller_mem_limit | 1024 |
| boost_poller_status | complete - end time:2018-11-13 14:58:17 |
| boost_redirect | on |
| boost_rrd_update_enable | on |
| boost_rrd_update_interval | 60 |
| boost_rrd_update_max_records | 100000 |
| boost_rrd_update_max_records_per_select | 50000 |
| boost_rrd_update_max_runtime | 1200 |
| boost_rrd_update_string_length | 2000 |
| boost_rrd_update_system_enable | on |
+-----------------------------------------+-----------------------------------------+
17 rows in set (0.00 sec)
I just updated the poller_boost.php file. I will post here as soon as I see any output.
Also, I see the lock below in the MySQL SHOW ENGINE INNODB STATUS output.
Also, I am not able to disable the On-demand RRD Updating setting; it gets re-enabled automatically.
Yes, RRD on demand will always be enabled for multiple pollers. Can you tell me how many data sources you have?
I have around 54k data sources:
2018/11/14 12:11:00 - SYSTEM STATS: Time:58.9669 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54185 RRDsProcessed:0
2018/11/14 12:10:00 - SYSTEM STATS: Time:59.0099 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54191 RRDsProcessed:0
2018/11/14 12:09:00 - SYSTEM STATS: Time:59.3029 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54190 RRDsProcessed:0
2018/11/14 12:08:00 - SYSTEM STATS: Time:59.0923 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54199 RRDsProcessed:0
2018/11/14 12:07:00 - SYSTEM STATS: Time:59.5549 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54189 RRDsProcessed:0
2018/11/14 12:06:00 - SYSTEM STATS: Time:59.3808 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54185 RRDsProcessed:0
2018/11/14 12:05:00 - SYSTEM STATS: Time:59.3571 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54191 RRDsProcessed:0
2018/11/14 12:04:00 - SYSTEM STATS: Time:58.2520 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54190 RRDsProcessed:0
2018/11/14 12:03:00 - SYSTEM STATS: Time:58.7466 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54199 RRDsProcessed:0
2018/11/14 12:02:00 - SYSTEM STATS: Time:58.6976 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54189 RRDsProcessed:0
2018/11/14 12:01:00 - SYSTEM STATS: Time:58.9698 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54185 RRDsProcessed:0
2018/11/14 12:00:00 - SYSTEM STATS: Time:57.8632 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54191 RRDsProcessed:0
2018/11/14 11:59:00 - SYSTEM STATS: Time:57.9896 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54190 RRDsProcessed:0
2018/11/14 11:58:00 - SYSTEM STATS: Time:58.7867 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54199 RRDsProcessed:0
2018/11/14 11:57:00 - SYSTEM STATS: Time:58.6846 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54189 RRDsProcessed:0
2018/11/14 11:56:00 - SYSTEM STATS: Time:58.2684 Method:spine Processes:10 Threads:10 Hosts:395 HostsPerProcess:40 DataSources:54185 RRDsProcessed:0
I think you need to update the Maximum Records option. Currently you have it set to 100000, while the default is 1000000, which is 10x as much. Effectively, every polling cycle you are likely getting up to 54k updates into the poller_output_boost table, so it only takes two cycles before boost decides that it needs to run an update.
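The arithmetic can be made explicit with a simplified model. This sketch assumes boost fires on whichever comes first: the update interval elapsing, or the pending rows in poller_output_boost exceeding Maximum Records. It also assumes roughly one row per data source per one-minute polling cycle. The function `cycles_until_boost` is hypothetical and is not Cacti's code. With about 54k rows per cycle, two cycles give roughly 108k rows, which matches the boost stats above: runs about two minutes apart, each reporting roughly 107k RRDUpdates.

```python
def cycles_until_boost(rows_per_cycle, max_records, interval_minutes, poll_minutes=1):
    """Polling cycles until a boost run under a simplified, assumed model.

    Boost is assumed to run on the first cycle where pending rows exceed
    max_records, or when interval_minutes have elapsed, whichever is sooner.
    """
    # First cycle where rows_per_cycle * n > max_records.
    by_records = max_records // rows_per_cycle + 1
    # Cycles needed for the configured update interval to elapse.
    by_interval = interval_minutes // poll_minutes
    return min(by_records, by_interval)


if __name__ == "__main__":
    # ~54k data sources, 1-minute polling, 60-minute boost interval.
    print(cycles_until_boost(54185, 100000, 60))   # prints 2
    print(cycles_until_boost(54185, 1000000, 60))  # prints 19
```

Under this model, raising Maximum Records to 1000000 stretches the gap between boost runs from about 2 cycles to about 19. Only a threshold above roughly 60 × 54k rows would leave the 60-minute interval as the trigger.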
Thanks @netniV. I updated the Maximum Records option to 1000000 and am observing the performance. I will update you.
So it seems to be running better now?
Yes, it's fine now. Thanks @netniV :)
Please refer to https://github.com/Cacti/cacti/issues/2146: when I increase the processes and threads, the CPU usage becomes very high and the server goes down. It was working fine with version 1.1.38. Please help me.