codership / galera

Synchronous multi-master replication library
GNU General Public License v2.0
447 stars 177 forks source link

Slow queries are not evenly distributed along the cluster members after Linux update #658

Open ukyanj0 opened 4 months ago

ukyanj0 commented 4 months ago

Hello our team has recently updated our WAS server Linux version from CentOS 7 to Rocky Linux 8 and faced a new warning that might be generated between WAS proxySQL and DB server MariaDBs(in a Galera cluster). From the MariaDB slow query log below, it seems like to be the user defined variable "@" that is the main cause of the problem that we will change the codes as soon as possible.

However the problem is, if I can ask here, 1) why were the MariaDB slow queries only generated on DB 4,5 in the cluster whereas it was barely generated on DB 1,2,3? For example, DB 4,5 keep continuously generating the slow query log for our entire service time, from 9am to 20pm. 2) from the picture below, DB 4,5 uses up to few million K swap memories, while 0 swap memory is used for DB 1,2,3 swap memory

The only possibilities I can come up with are 1) proxySQL don't parse the @ queries successfully, then it puts the @ queries off to DBs coming as late as possible. 2) DB 4,5 is 32 core Dell server(8 core HP server for DB 1,2,3). proxySQL catches that DB 4,5 have better specification then it pushes heavy @ queries into better servers 3) Could you tell me what could be some possible Galera cluster features related to that? 4) Just in case, wondering if set_query_lock_on_hostgroup = 1 variable can affect it even tho there is no SET in the query... image

Thank you and sorry for the mistakes in English in advance.

WAS server(updated) ProxySQL version of WAS 2.3.2-10 MariaDB version of DB 10.11 Rocky Linux 8

DB server(not updated) MariaDB version of DB 10.0.38 CentOS

proxySQL error log -> " 2024-04-22 11:39:52 7f53f0df7700 InnoDB: Warning: difficult to find free blocks in InnoDB: the buffer pool (338 search iterations)! InnoDB: 0 failed attempts to flush a page! Consider InnoDB: increasing the buffer pool size. InnoDB: It is also possible that in your Unix version InnoDB: fsync is very slow, or completely frozen inside InnoDB: the OS kernel. Then upgrading to a newer version InnoDB: of your operating system may help. Look at the InnoDB: number of fsyncs in diagnostic info below. InnoDB: Pending flushes (fsync) log: 0; buffer pool: 0 InnoDB: 3043627 OS file reads, 760460 OS file writes, 169905 OS fsyncs InnoDB: Starting InnoDB Monitor to print further InnoDB: diagnostics to the standard output.

2024-04-22 17:59:19 7f98b03908c0 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html. ''

MariaDB slow query log -> " Time: 240422 9:21:11 User@Host: vegas[vegas] @ [192.168.0.67] Thread_id: 33768 Schema: h00317 QC_hit: No Query_time: 18.048330 Lock_time: 0.001120 Rows_sent: 30 Rows_examined: 133935 use h00317; SET timestamp=1713745271; SELECT FROM ( SELECT , @rownum:=@rownum+1 RANK FROM ( SELECT , (SUM(ORDERAMT)+SUM(INTAMT)) TOTAL, SUM(ORDERAMT) ORDERTOTAL, SUM(INTAMT) INTTOTAL FROM ( SELECT P., ORDERAMT, INTAMT, (SELECT COUNT(A.SCHEDULEID) FROM TCUSTOMERSCHEDULE A WHERE A.CUSTOMERID = P.CUSTOMERID AND A.SCHEDULESTATUS > 1 AND A.SCHEDULEDATE BETWEEN '20240122' AND '20240422') AS VISITCOUNT ... "