codership / galera-manager-support

Galera Manager Support Repository

Node runs into "too many open files" and restarts mysql service (same reason as issue #66) #67

Open dbucher-datasport opened 1 year ago

dbucher-datasport commented 1 year ago

I keep running into unwanted restarts of mysql.

Steps to reproduce:

  1. Create a new cluster in Galera Manager (Ubuntu 20.04 with MySQL 8.0.34-26.15)
  2. Deploy 1-n nodes on Ubuntu 20.04
  3. Create a database and run some operations on it (the more operations you run, the faster it happens)
  4. Get the PID of mysqld (e.g. ps -ef | grep 'mysql')
  5. Use lsof to check open files (sudo lsof -p 511030)

Issue: mysqld keeps the file wsrep_status.json open even after it has been deleted:

mysqld  511030 mysql 4917u      REG              253,0       1408  4727578 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4918u      REG              253,0       1408  4727579 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4919u      REG              253,0       1408  4727580 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4920u      REG              253,0       1408  4727581 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4921u      REG              253,0       1408  4727582 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4922u      REG              253,0       1408  4727583 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4923u      REG              253,0       1408  4727584 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4924u      REG              253,0       1408  4727585 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4925u      REG              253,0       1408  4727586 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4926u      REG              253,0       1408  4727587 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4927u      REG              253,0       1408  4727561 /var/lib/mysql/wsrep_status.json (deleted)
mysqld  511030 mysql 4928u      REG              253,0       1408  4727589 /var/lib/mysql/wsrep_status.json
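To track how fast the leak grows, the listing above can be reduced to a single number. The following is a minimal sketch, assuming the output of `sudo lsof -p <pid>` has been captured as text. The function name `count_deleted_handles` is made up for illustration and is not part of any Galera or MySQL tooling.

```python
def count_deleted_handles(lsof_output, filename="wsrep_status.json"):
    """Count lsof rows where `filename` is still open but marked '(deleted)'."""
    count = 0
    for line in lsof_output.splitlines():
        line = line.rstrip()
        # lsof appends "(deleted)" to the path when the file was unlinked
        # while the process still holds a descriptor for it.
        if line.endswith("(deleted)") and filename in line:
            count += 1
    return count
```

Running this periodically against fresh lsof output should show the count rising steadily on an affected node and staying near zero on a healthy one.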

These descriptors pile up until the server reaches the default open_files_limit = 10000, at which point it restarts. The restart releases the files, and the server runs for another day or so until it hits the same issue again.
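How close a node is to the limit can be checked without lsof. This is a rough sketch, assuming a Linux host where `/proc/<pid>/fd` and `/proc/<pid>/limits` are readable (usually as root or the mysql user), and assuming the soft "Max open files" limit shown there matches the effective open_files_limit. Both helper names are hypothetical.

```python
import os

def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text, or None."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return int(fields[3]) if fields[3].isdigit() else None
    return None

def fd_headroom(pid):
    """Return (open_fds, soft_limit) for a running process (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft_limit = parse_soft_nofile(fh.read())
    return open_fds, soft_limit
```

For example, `fd_headroom(511030)` would return the current descriptor count and the limit for the mysqld process from the report, which makes it possible to alert before the restart happens.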

I tracked down some information about this. Here are my assumptions:

wsrep_status_file was introduced in 8.0.26-26.8

Setting my own value, such as wsrep_status_file='', in the custom config in Galera Manager does not change the variable

Galera cluster seems to set it in /etc/mysql/wsrep/conf.d/99.galera.cnf by default to: loose_wsrep_status_file = "wsrep_status.json"

If I remove it from there and restart the mysql service directly on the server rather than through Galera Manager, wsrep_status.json is not created and the issue does not occur. But once I restart mysql through Galera Manager, 99.galera.cnf is reset and the issue appears again.
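Since the file is rewritten on every restart through Galera Manager, it can help to check whether the setting is back. The sketch below assumes the config text has already been read from /etc/mysql/wsrep/conf.d/99.galera.cnf. The function name and the regular expression are illustrative only. It accepts the `loose_` prefix and either dashes or underscores in the option name.

```python
import re

# Matches e.g.  loose_wsrep_status_file = "wsrep_status.json"
_PATTERN = re.compile(
    r"^\s*(?:loose[_-])?wsrep[_-]status[_-]file\s*=\s*(.*?)\s*$"
)

def find_status_file_setting(cnf_text):
    """Return every value assigned to wsrep_status_file in the given config text."""
    values = []
    for line in cnf_text.splitlines():
        # Skip commented-out lines.
        if line.lstrip().startswith(("#", ";")):
            continue
        match = _PATTERN.match(line)
        if match:
            values.append(match.group(1).strip("\"'"))
    return values
```

An empty list means the option is absent. A non-empty value such as `wsrep_status.json` suggests the file has been reset and the leak is likely to reappear.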

ayurchen commented 1 year ago

Hi, a workaround for this file descriptor leak in the MySQL-wsrep patch was released with GM 1.8.0.