craftcms / cms

Build bespoke content experiences with Craft.
https://craftcms.com

[4.x]: Colliding database mutex locks between environments #15313

Closed mattgrayisok closed 2 weeks ago

mattgrayisok commented 1 month ago

What happened?

Description

We were seeing "Could not acquire a mutex lock for the queue" errors on a project. They occurred randomly, sometimes between jobs (when fetching the next job to run) and sometimes in the middle of a job (for example, when updating job progress). This led to various problems, including jobs getting stuck.

A single queue processing daemon was running for each environment (production and staging) at the time.

Whilst trying to identify what was grabbing the queue locks I went down the rabbit hole of trying to determine if different environments could be impacting one another, and I believe they can.

MySQL's GET_LOCK() acquires a server-wide named lock, irrespective of the selected database or the connected user (I confirmed this with a bit of manual testing).

Yii/Craft does not include the environment name in the mutex lock name for the queue (and likely other stuff but I haven't checked).

This can therefore lead to mutex locks in different logical environments colliding with one another if they are using a shared database instance, as is common outside of larger scale infrastructure setups.
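
The collision described above can be illustrated with a small, self-contained Python model. This is only a sketch: `LockServer`, the session names, and the lock name `"queue"` are hypothetical stand-ins, not Craft's or MySQL's actual code. It assumes, as reported above, that lock names live in a single namespace per database server.

```python
import threading


class LockServer:
    """Toy stand-in for one database server (hypothetical, not MySQL itself).

    Lock names share a single server-wide namespace, regardless of which
    database a client has selected or which user it connected as.
    """

    def __init__(self):
        self._owners = {}  # lock name -> owning session id
        self._guard = threading.Lock()

    def get_lock(self, session, name):
        """Non-blocking acquire; returns True on success, False if held elsewhere."""
        with self._guard:
            owner = self._owners.get(name)
            if owner is None or owner == session:
                self._owners[name] = session
                return True
            return False

    def release_lock(self, session, name):
        """Release the lock if this session holds it; returns True if released."""
        with self._guard:
            if self._owners.get(name) == session:
                del self._owners[name]
                return True
            return False


server = LockServer()

# Two environments on the same server, using different databases but the
# same (assumed) lock name.
production_got = server.get_lock("production-session", "queue")
staging_got = server.get_lock("staging-session", "queue")

print(production_got, staging_got)  # True False
```

In this model staging fails to acquire the lock purely because production holds a lock with the same name, even though the two never touch each other's data.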

I'm wondering whether including the current environment name in mutex names might be a useful addition to avoid this, or whether changing the APP_ID between environments is a preferred approach (although that feels a bit weird to me).

Steps to reproduce

  1. Run two copies of Craft, sharing a single MySQL DB instance but using different databases to store data.
  2. Mutex collisions will occur between environments, especially on the queue if both environments are busy doing things.

Craft CMS version

4.9.7

PHP version

8.1

Operating system and version

Alpine 3.18

Database type and version

MariaDB 11 but I believe MySQL 8 acts in the same way

Image driver and version

-

Installed plugins and versions

-

brandonkelly commented 1 month ago

Craft sets a keyPrefix on the mutex config, set to the app ID, precisely to avoid this issue:

https://github.com/craftcms/cms/blob/00bf0ea4dfff9ddf812053224244f196804e65b1/src/helpers/App.php#L1012-L1018

App IDs are typically assigned on install, but it’s possible that multiple projects share the same ID if someone is creating multiple projects from the same starter project, with an app ID already defined.

thupsi commented 2 weeks ago

Hello, and sorry to intrude; I just saw this by chance. Since @mattgrayisok is talking about instances of the same project here (e.g. staging and production), isn't it logical, @brandonkelly, that they share the same app ID?

brandonkelly commented 2 weeks ago

As in two environments?

If two separate Craft installs are sharing the same database instance, they should have separate keyPrefix values, if not separate app IDs.

thupsi commented 2 weeks ago

Exactly. If I understand correctly, that's the issue here:

Two environments are using the same database instance and they have the same app ID, which, as you showed, also sets the keyPrefix. But when we are talking about staging and production, it's kind of expected that they stem from the same install, so the dev has to be aware of this issue in order to generate a new app ID.

It's easily solved, but maybe not so clear to everyone? On the other hand, if Craft includes the environment name in the keyPrefix, such collisions won't happen, without requiring further action.

brandonkelly commented 2 weeks ago

> On the other hand, if Craft includes the environment name in the keyPrefix such collisions won't happen, without requiring further action.

Good point. Just made that change for Craft 4.12 and 5.4.
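
The idea behind that change can be sketched in Python as follows. The exact prefix format Craft uses is not shown in this thread, so the `mutex_name` helper, its separator, and the example app ID are all hypothetical; the sketch only shows why adding the environment name keeps lock names distinct when the app ID is shared.

```python
def mutex_name(app_id, environment, name):
    """Build a lock name from an app ID, an environment name, and a base name.

    Hypothetical format for illustration only; not Craft's actual keyPrefix.
    """
    return f"{app_id}:{environment}:{name}"


# Assumed example values: both environments stem from the same install,
# so they share one app ID.
shared_app_id = "CraftCMS--example-app-id"

production_name = mutex_name(shared_app_id, "production", "queue")
staging_name = mutex_name(shared_app_id, "staging", "queue")

# The app ID alone would give both environments the same lock name;
# with the environment included, the names differ.
names_are_distinct = production_name != staging_name
print(production_name)
print(staging_name)
print(names_are_distinct)
```

With distinct names, two environments on a shared database server no longer compete for the same named lock, while each environment still contends with itself as intended.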