Closed: jenlampton closed this issue 4 years ago
I found this on Stack Overflow:

> Sometimes the process that has locked the table shows up as sleeping in the processlist! I was tearing my hair out until I killed all the sleeping threads that were open in the database in question (none were active at the time). That finally unlocked the table and let the update query run.
Nate killed a hanging process, but the next time we tried to save we got the `504 Gateway Time-out` error again. The save after that one returned `SQLSTATE[HY000]: General error: 2006 MySQL server has gone away` again.
```ini
max_connections = 100
connect_timeout = 5
wait_timeout = 600
max_allowed_packet = 32M
thread_cache_size = 128
sort_buffer_size = 4M
bulk_insert_buffer_size = 16M
tmp_table_size = 32M
max_heap_table_size = 32M
query_cache_limit = 128K
query_cache_size = 64M
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 10
innodb_buffer_pool_size = 256M
innodb_log_buffer_size = 8M
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 400
innodb_flush_method = O_DIRECT
innodb_file_format = Barracuda
innodb_file_per_table = on
innodb_large_prefix = 1
innodb_file_format_max = Barracuda
```
@jenlampton, let's see how these work out.
@quicksketch said a lot of those were not in place when he logged in, so he set most of them up. He thinks MySQL has been using the default settings (perhaps ever since the server move?). Did you find his changes from today? For example, he added `innodb_flush_method = O_DIRECT`, and he had `innodb_buffer_pool_size` up at 2G (instead of 256M).
I just tried to save the node again; still getting the `504 Gateway Time-out` error.
Hey @larsdesigns, I set some updated MySQL variables in `/etc/mysql/mariadb.conf.d/50-server.cnf`, as it seemed like we were running a mostly default configuration. Things did seem to run a bit better once I set the following:
```ini
innodb_buffer_pool_size = 2000M
innodb_log_file_size = 250M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
```
I put MySQLTuner (https://github.com/major/MySQLTuner-perl) in the root home folder to run an analysis of our general configuration.
Other things we tried included restarting the entire server, and backing up, dropping, and reimporting the entire database for backdropcmsorg. The problem of not being able to update the status of that node persists.
To test, you can try going to https://backdropcms.org/node/2797/edit, setting the status to published, and then trying to save. On the server, you can then run `SHOW FULL PROCESSLIST` and you'll see that the database is stuck trying to `UPDATE` the `node` table.
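For anyone debugging this kind of hang: a sketch of how you might find which connection is holding the lock, assuming MariaDB (or MySQL 5.x) where the InnoDB lock tables live in `information_schema`. The thread id in the `KILL` statement is a placeholder, not a real value from our server:

```sql
-- List transactions waiting on a lock, paired with the
-- transactions that are blocking them.
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;

-- If the blocker turns out to be one of those "sleeping" connections,
-- kill it by its thread id (123 is illustrative):
KILL 123;
```

This matches the Stack Overflow observation above: the blocking connection can show as `Sleep` in `SHOW FULL PROCESSLIST` while its open transaction still holds row locks.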
@quicksketch I have an idea: why don't we turn off PHP, use a MySQL query to update the node status, then turn PHP back on?
@larsdesigns MySQL won't start anymore; I'm getting the error `mysql: unknown variable 'max_connections=100'`. Since the config file `/etc/mysql/mariadb.conf.d/50-server.cnf` still shows that value at 80, I'll need to know where you put your config.
@larsdesigns I found your duplicate config file at `/etc/mysql/conf.d/mysql.cnf` and was able to remove all the undefined variables from it. The only two variables that remain in there now are `connect_timeout` and `max_allowed_packet`. All the others prevented MySQL from running on the command line.

After cleaning up that file I was able to disable PHP (thanks to the hint from @quicksketch), use mysql on the command line to manually alter the `node` and `node_revision` tables so `status` is now `1`, and then restart PHP.
I got the node published.
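For reference, the manual update described above would look something like this; a sketch assuming nid 2797 (from the test link earlier) and the stock Backdrop `node`/`node_revision` schema, where `status = 1` means published:

```sql
-- Publish the node directly, bypassing PHP entirely.
UPDATE node SET status = 1 WHERE nid = 2797;
UPDATE node_revision SET status = 1 WHERE nid = 2797;
```

Updating both tables keeps the current revision's status in sync with the node record.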
I think I found the source of the problem, and why it affected Security Announcements but not other types of content.
Our custom module `borg_mailers_security` has this code in `hook_node_update()`:
```php
function borg_mailers_security_node_update($node) {
  _borg_mailers_security_worker($node);
}
```
That function mails all subscribers when a security announcement is published. Separately, the SMTP account BackdropCMS.org was using from MailJet was apparently terminated, so BackdropCMS.org was unable to send email. When the site tried to send the security email, it couldn't. Because the emails were sent in the middle of the node-saving process, the node would never completely save: the page timed out first.
For the time being, I've disabled the SMTP module (again), since our site can send email fine without it now that we have re-set up DKIM, reverse-proxy, SPF, DMARC, etc.
I'm going to work up some small code changes to our custom module to prevent it from blocking the node save. Ideally we'd switch this to either a queue process, or stop having BackdropCMS.org send the email directly and instead use a third-party mailing list.
Can we go ahead and close this one?
Only if we already have another issue open for the "small code changes to our custom module" work. If we close this without that, we'll never remember to make those changes.
I filed a PR at https://github.com/backdrop-ops/backdropcms.org/pull/672 that should fix this problem by switching to sending the emails from a queue during cron runs.
@serundeputy reviewed the PR and gave it the thumbs-up. Thanks, Geoff!
Today when we went to edit and then save the SA for today's security release, the browser hung for about 60 seconds, then the server threw a `SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded` error. We tried again on the same node, and it happened again. We tried to publish the node using Bulk Operations on the content list; that also took 60 seconds, then failed.

Other nodes on the site seem to save okay, but this one, for some reason, does not.
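The roughly 60-second hang lines up with InnoDB's lock wait timeout (50 seconds by default), which suggests the save was stuck waiting on a row lock held by another transaction. A diagnostic sketch for confirming that on the server:

```sql
-- Confirm the configured timeout (default is 50 seconds); a save that
-- hangs about a minute and fails with error 1205 is waiting on a lock.
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- The TRANSACTIONS section of this output shows which transactions
-- currently hold locks and which are waiting on them:
SHOW ENGINE INNODB STATUS\G
```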
We then restarted MySQL on the server.
After that, we were able to create a duplicate node with the content we needed. It saved initially, but unpublished. We edited it, set the status to published, and tried to save. It would not save a second time.

The server hung for 60 seconds, then threw a `504 Gateway Time-out` error. The second time we tried to save it, we got the same `SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded` error. This new node also could not be published via Bulk Operations on the content list.