craftcms / cms

Build bespoke content experiences with Craft.
https://craftcms.com

[3.x]: After adding a switch on dev and applying the yaml files site is ruined: all matrix blocks are now doubled #11609

Open bencresty opened 2 years ago

bencresty commented 2 years ago

What happened?

Description

After adding a switch field to one of the matrix blocks of a matrix field that's used for content on each page and applying the yaml files to the staging server the full site is ruined by some 'propagating' action Craft does; all matrix blocks on each page on each site are doubled.

I don't even know how such a thing might be explained, but it's obviously pretty frustrating.

For some reason restoring the database doesn't even work (even after clearing all caches and running garbage collection). And now I don't even know how to restore this, or whether that's even possible without changing all 440 pages by hand :( All of them contained new data. I have regularly created backups of these, but restoring a backup doesn't fix this issue; or it does fix it, only for Craft to destroy it again with this 'propagation...' action?

Steps to reproduce

  1. Add a switch to one of the matrix blocks in a matrix field on local DEV environment
  2. Deploy the code to the staging server, which updates the yaml files
  3. Craft does some magic making all matrix blocks now doubled on each page, on each site (also the many pages which aren't even using this block)

Expected behavior

No change on the existing entries, only adding the new switch to where this block is used and set it to the default value.

Actual behavior

See above

Craft CMS version

3.7.48

PHP version

8.x

Operating system and version

Apache

Database type and version

No response

Image driver and version

No response

Installed plugins and versions

-

brandonkelly commented 2 years ago

For some reason restoring the database doesn't even work (even when removing all cache and do a garbage collect).

Restoring a database backup should absolutely work. In what way is it not? Do you get an error somewhere? Or perhaps Craft is just continuing to apply the same incoming project config changes, which is causing the problem to repeat itself?

Did you happen to see what changes were going to be applied before applying them? My guess is that if you compare the incoming project config YAML with the previously-loaded project config data, they had diverged a bit more than simply adding a single new custom field to a Matrix block type. Most likely the Matrix field’s Propagation Method had changed, which would have triggered the “Applying new propagation method to Matrix blocks” queue job.
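The comparison suggested above can be sketched roughly as follows. This is a minimal illustration, not a Craft tool: it assumes both configs are available as folders of `.yaml` files (the folder paths are hypothetical), and it assumes the Matrix setting appears under a key named `propagationMethod`, which should be checked against your own YAML.

```python
import difflib
from pathlib import Path


def diff_project_config(old_dir, new_dir):
    """Return {relative_path: [added/removed lines]} between two YAML folders."""
    old_root, new_root = Path(old_dir), Path(new_dir)
    old_files = {p.relative_to(old_root).as_posix() for p in old_root.rglob("*.yaml")}
    new_files = {p.relative_to(new_root).as_posix() for p in new_root.rglob("*.yaml")}
    changes = {}
    for rel in sorted(old_files | new_files):
        # A file missing on one side is treated as empty on that side.
        old_lines = (old_root / rel).read_text().splitlines() if rel in old_files else []
        new_lines = (new_root / rel).read_text().splitlines() if rel in new_files else []
        diff = list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))
        # Drop the two file-header lines and the "@@" hunk markers.
        delta = [line for line in diff[2:] if not line.startswith("@@")]
        if delta:
            changes[rel] = delta
    return changes


def propagation_changes(changes, key="propagationMethod"):
    """Keep only changed lines that mention the (assumed) propagation key."""
    flagged = {}
    for rel, lines in changes.items():
        hits = [line for line in lines if key in line]
        if hits:
            flagged[rel] = hits
    return flagged
```

Calling `propagation_changes(diff_project_config("old-config", "config/project"))` would then list only the files where that key was added, removed, or changed; an empty result would suggest the propagation method is not what diverged.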

ben-gladeye commented 2 years ago

Experiencing this with Matrix, Super Table and Neo fields

Working on replicating - pretty sure happened sometime after 3.7.20

bencresty commented 2 years ago

Restoring a database backup should absolutely work. In what way is it not? Do you get an error somewhere? Or perhaps Craft is just continuing to apply the same incoming project config changes, which is causing the problem to repeat itself?

@brandonkelly

I seriously have no clue. This issue resulted in lots of problems here since friday and two sites being offline. Thank god these weren't production sites, but still I cannot continue my work since and missed an important deadline because of one of these sites.

Normally restoring a backup works just fine, but MySQL on the whole remote shared hosting has been stuck since Friday, so the sites' frontends and CP are stuck in some process in the back and keep showing spinning wheels. That is problematic, as it's shared hosting which I can't restart, and the hosting company hasn't reacted to my mails so far. For me this makes it practically impossible to debug.

I've got the impression the backup database is fine, but Craft messes it up again and even does something that overloads the MySQL server until it gets stuck on both memory and CPU. Because of this Craft cannot even reach the MySQL server anymore, so there is no way to test this here anymore.

When looking at the hosting's system diagnostics, the CPU and memory are overloaded, which is most probably why Craft cannot reach MySQL anymore. Overloaded, even though these two sites are the only ones on that hosting AND cannot be reached publicly, only from my IP (as set in .htaccess). So it cannot be caused by a frontend process. I'm the only one who can reach this hosting and log into the CMS. The overload persists even after days of not even having the CP or site open. So it's some process in the back that is stuck, or something Craft keeps triggering in the background as some endless loop. Either way, this is worrying to me.

On local host, this issue is never seen. On neither of those two sites. But that makes sense as the new switch field was added locally in the CP on both of these sites, so no yaml files needed to be applied.

My guess so far is that one of the last updates of Craft does something like this:

Some extra info:

This is what the log file now says (after all issues) (only location and timezone obfuscated):

[15-Jul-2022 20:59:02 {TIMEZONE}] An Error occurred while handling another error:
PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away in {LOCATION}/vendor/yiisoft/yii2/db/Command.php:1302
Stack trace:
#0 {LOCATION}/vendor/yiisoft/yii2/db/Command.php(1302): PDOStatement->execute()
#1 {LOCATION}/vendor/yiisoft/yii2/db/Command.php(1168): yii\db\Command->internalExecute()
#2 {LOCATION}/vendor/yiisoft/yii2/db/Command.php(410): yii\db\Command->queryInternal()
#3 {LOCATION}/vendor/yiisoft/yii2/db/Query.php(249): yii\db\Command->queryAll()
#4 {LOCATION}/vendor/craftcms/cms/src/db/Query.php(152): yii\db\Query->all()
#5 {LOCATION}/vendor/craftcms/cms/src/services/Announcements.php(111): craft\db\Query->all()
#6 {LOCATION}/vendor/craftcms/cms/src/web/assets/cp/CpAsset.php(411): craft\services\Announcements->get()
#7 {LOCATION}/vendor/craftcms/cms/src/web/assets/cp/CpAsset.php(340): craft\web\assets\cp\CpAsset->_announcements()
#8 {LOCATION}/vendor/craftcms/cms/src/web/assets/cp/CpAsset.php(100): craft\web\assets\cp\CpAsset->_craftData()
#9 {LOCATION}/vendor/yiisoft/yii2/web/View.php(274): craft\web\assets\cp\CpAsset->registerAssetFiles()
#10 {LOCATION}/vendor/craftcms/cms/src/web/View.php(1953): yii\web\View->registerAssetFiles()
#11 {LOCATION}/vendor/yiisoft/yii2/web/View.php(168): craft\web\View->registerAssetFiles()
#12 {LOCATION}/vendor/craftcms/cms/src/web/View.php(1161): yii\web\View->endBody()
#13 {LOCATION}/storage/runtime/compiled_templates/94/944bbd7adcd9bf7e497cdbb7075fd8a8.php(97): craft\web\View->endBody()
#14 {LOCATION}/vendor/twig/twig/src/Template.php(405): __TwigTemplate_adc0767554b538fe81c3e8f302a673dc->doDisplay()
#15 {LOCATION}/vendor/twig/twig/src/Template.php(378): Twig\Template->displayWithErrorHandling()
#16 {LOCATION}/storage/runtime/compiled_templates/c6/c601b81868dd58ec634a2d9b9b1d75c7.php(48): Twig\Template->display()
#17 {LOCATION}/vendor/twig/twig/src/Template.php(405): __TwigTemplate_dad00842c6c98136b469aee57fd13c7c->doDisplay()
#18 {LOCATION}/vendor/twig/twig/src/Template.php(378): Twig\Template->displayWithErrorHandling()
#19 {LOCATION}/storage/runtime/compiled_templates/34/3407d5fbc9117ef255156df824810729.php(46): Twig\Template->display()
#20 {LOCATION}/vendor/twig/twig/src/Template.php(405): __TwigTemplate_f89583f380c595f596d464f1bc556e04->doDisplay()
#21 {LOCATION}/vendor/twig/twig/src/Template.php(378): Twig\Template->displayWithErrorHandling()
#22 {LOCATION}/vendor/twig/twig/src/Template.php(390): Twig\Template->display()
#23 {LOCATION}/vendor/twig/twig/src/TemplateWrapper.php(45): Twig\Template->render()
#24 {LOCATION}/vendor/twig/twig/src/Environment.php(318): Twig\TemplateWrapper->render()
#25 {LOCATION}/vendor/craftcms/cms/src/web/View.php(408): Twig\Environment->render()
#26 {LOCATION}/vendor/craftcms/cms/src/web/View.php(461): craft\web\View->renderTemplate()
#27 {LOCATION}/vendor/craftcms/cms/src/web/Controller.php(201): craft\web\View->renderPageTemplate()
#28 {LOCATION}/vendor/craftcms/cms/src/controllers/TemplatesController.php(233): craft\web\Controller->renderTemplate()
#29 [internal function]: craft\controllers\TemplatesController->actionRenderError()
#30 {LOCATION}/vendor/yiisoft/yii2/base/InlineAction.php(57): call_user_func_array()
#31 {LOCATION}/vendor/yiisoft/yii2/base/Controller.php(178): yii\base\InlineAction->runWithParams()
#32 {LOCATION}/vendor/yiisoft/yii2/base/Module.php(552): yii\base\Controller->runAction()
#33 {LOCATION}/vendor/craftcms/cms/src/web/Application.php(293): yii\base\Module->runAction()
#34 {LOCATION}/vendor/yiisoft/yii2/web/ErrorHandler.php(109): craft\web\Application->runAction()
#35 {LOCATION}/vendor/craftcms/cms/src/web/ErrorHandler.php(192): yii\web\ErrorHandler->renderException()
#36 {LOCATION}/vendor/yiisoft/yii2/base/ErrorHandler.php(135): craft\web\ErrorHandler->renderException()
#37 {LOCATION}/vendor/craftcms/cms/src/web/ErrorHandler.php(71): yii\base\ErrorHandler->handleException()
#38 [internal function]: craft\web\ErrorHandler->handleException()
#39 {main}

I will now try to restore to an old Craft version. And hopefully the hosting company will react finally to restart the hosting container :(

Most likely the Matrix field’s Propagation Method had changed, which would have triggered the “Applying new propagation method to Matrix blocks” queue job.

Please change it back to make it work fine again ;)

bencresty commented 2 years ago

Experiencing this with Matrix, Super Table and Neo fields

Working on replicating - pretty sure happened sometime after 3.7.20

@ben-gladeye Thanks for your comment. Very happy to see I'm not the only one seeing and reporting this, so it won't become a 'we can't replicate'. Hopefully this will lead to a fix soon!

BTW @brandonkelly I can confirm it happens in multi site setups here too. Both of these sites here are multisite setups, where we use multisite to have different languages on the same domain

bencresty commented 2 years ago

@brandonkelly another piece of information that just came to mind;

The database I tried to restore to the hosting (including the new switch field) works fine when restored to localhost in the dev environment with the same Craft version (where no yaml files need to be applied after restoring the db!!).

brandonkelly commented 2 years ago

The database I tried to restore to the hosting (including the new switch field) works fine when restored to localhost in the dev environment with the same Craft version (where no yaml files need to be applied after restoring the db!!).

The database backup already contained the new field? In other words the backup was from after the new field was added via the project config, on production?

Do you have a production backup from before the new field was added?

brandonkelly commented 2 years ago

Experiencing this with Matrix, Super Table and Neo fields

Working on replicating - pretty sure happened sometime after 3.7.20

Excellent, let me know if you can come up with steps to reproduce.

bencresty commented 2 years ago

Do you have a production backup from before the new field was added?

Yes. Also tried that one

bencresty commented 2 years ago

@brandonkelly

BTW, thank god I had already setup the production environment for one of these sites with an earlier version of Craft (but without the switch and perhaps some other late changes)

I just restored the database from right before I added the switch: the database I backed up from staging. So I overwrote the production database with the most recent staging database from before I added the switch field to the matrix block.

The production site is on another server (but from the same hosting company and with the exact same settings). After restoring the database there, from staging (without the switch), it also works without issue.

I didn't dare to try to add the switch again (can't afford that right now, especially because another production site is on the same hosting), but I could add new entries and assets without a problem and with that loaded database.

The Craft version there is the same: 3.7.48. So the difference is that no switch field was added / no yaml files were applied / no process was triggered after yaml apply or whatever.

brandonkelly commented 2 years ago

Can you please send the older production DB backup (before the field was added), as well as the composer.json + composer.lock + the config/project/ folder that broke the site, over to support@craftcms.com? Then I can try to reproduce by applying the new YAML changes to it locally.

bencresty commented 2 years ago

@brandonkelly sorry, we never send project-specific files. What I can tell you is that I just did a check in the gitlog and perhaps this helps:

All below is on the DEV server (so not the staging hosting where the problems were seen). When deploying to the staging hosting composer always does a composer install on the hosting with the sent over composer and yaml project files. The database is never copied or changed when deploying, other than by craft itself applying yaml files etc. After each install on the hosting the cache gets cleared and the garbage collector runs just to make sure everything is using the new stuff.

brandonkelly commented 2 years ago

Can you at least share the project config data? That should be safe so long as you don’t have any sensitive data (e.g. S3 access keys) in the project config, and are using environment variables instead.

If so, please send in the following:

You can download your production project config data from Utilities → Project Config.

proimage commented 2 years ago

I've also seen doubled-up Matrix or Neo blocks here and there lately (in Craft 3.7.x sites), mainly after Project Config changes (i.e. when switching git branches). I can't be much more specific than that, but if I encounter a repeatable scenario, I will let you know. For now, just take this as a +1. :)

brandonkelly commented 2 years ago

There is a scenario where this is expected behavior: when changing a Matrix field’s propagation method to something less restrictive than it used to be (e.g. Only save blocks to the site they were created in → Save blocks to all sites the owner element is saved in). In that case, the existing blocks are likely to need to be propagated to additional sites they didn’t exist on before, which could cause those other sites to appear to suddenly have a bunch of duplicated blocks.
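To make that scenario concrete, here is a toy model of what a move to a less restrictive propagation method could look like on a two-site install. It is only an illustration under simplifying assumptions (the site handles and block IDs are made up, and this is not Craft's actual propagation code): each site starts with its own blocks, and afterwards every block exists on every site.

```python
def propagate_to_all_sites(blocks_by_site):
    """Toy model: after the change, every site holds every site's blocks.

    blocks_by_site maps a site handle to the block IDs that were created on
    that site and, under the old method, saved only to that site.
    """
    every_block = [block for site in blocks_by_site for block in blocks_by_site[site]]
    return {site: list(every_block) for site in blocks_by_site}


# Hypothetical entry with two blocks per site before the change.
before = {"en": ["en-1", "en-2"], "nl": ["nl-1", "nl-2"]}
after = propagate_to_all_sites(before)

for site in before:
    print(site, len(before[site]), "->", len(after[site]))
# en 2 -> 4
# nl 2 -> 4
```

With two sites, each page ends up with twice as many blocks, which would look like every block being doubled even though no single block was copied within its own site.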

bencresty commented 2 years ago

There is a scenario where this is expected behavior: when changing a Matrix field’s propagation method

We didn't change any propagation method. The only thing we did was adding 1 field to a matrix block.

brandonkelly commented 2 years ago

@bencresty I know that’s what you think you did, but the way to be 100% positive that’s the only thing that changed would be to compare the old project config data (exported from Utilities → Project Config) with the incoming config/project/ folder.

bencresty commented 2 years ago

@brandonkelly yeah, let's move things around and shift the responsibility to the client now. As if I didn't check that first in the git commit log... and again and again to be 100% sure... and told you about it.

Issues like this don't get solved by mistrusting users/clients. Nor by not respecting that clients can't send project data because of company policy. It's not our responsibility to debug Craft and not our fault that there's an issue in one of the latest versions. I try to report issues as they occur so you know about them, and help you by posting as much info as I can without sending project data.

This issue is open for 11 days now. Perhaps, instead of moving responsibilities around, it would be an idea to check Craft on changes in the git history of the last versions or whatever that can fix this? I'm obviously not the only one having this nasty issue now. Right now projects are on hold here until it's fixed.

brandonkelly commented 2 years ago

We have not been able to reproduce internally, and there haven’t been any significant changes to Matrix propagation or project config in Craft 3 lately that would have likely introduced a bug like this.

It’s not that I mistrust you; I’m just trying to reproduce, so we can get to the bottom of it.

The issue has only been active for 11 days because you didn’t respond to my previous request 8 days ago.

bencresty commented 2 years ago

The issue has only been active for 11 days because you didn’t respond to my [previous request]

No, it's open because it's not fixed AND you didn't respect the fact that I already answered your question before you asked the same question again. I don't see any point in re-answering the same question just because you won't accept it.

brandonkelly commented 2 years ago

You said you couldn’t send the entire database. I’m trying to find a middle ground, and just asking for the project config data (before and after).

bencresty commented 2 years ago

You said you couldn’t send the entire database.

No, I wrote: "sorry, we never send project-specific files".

brandonkelly commented 2 years ago

If we can’t even get the project config data, please try to reproduce from a fresh Craft installation. If you are able to reproduce, send in the specific steps, and we will look into it from there.

RandomJo commented 2 years ago

We had an issue with repeating Neo fields on two sites recently. We logged an issue with Neo, and they suspect it may be related to this. Unfortunately, we have not been able to replicate so we don't have anything actionable to add outside of an additional report of the issue.

These sites were not multi-site installs, and the problem did not happen on every page. We believe it was isolated to only one or two pages per site, but our clients fixed the issues manually before we were able to investigate fully because in each case it appeared to have happened right after we launched the sites. We launched a third site recently, and we did not notice any duplication issues.

Our account managers also believe this happened once right after we launched our company's website on July 22, 2021. So, it may not be a new bug.

RandomJo commented 2 years ago

We will definitely keep an eye out for this, and if we can find a dependable way to replicate it, we will chime in with more info! I know it's incredibly hard to fix something you can't see. Thank you for the hard work you put into making your platform usable.

bencresty commented 2 years ago

If we can’t even get the project config data, please try to reproduce from a fresh Craft installation. If you are able to reproduce, send in the specific steps, and we will look into it from there.

Sorry, I tried to help as much as I can by supplying all the findings and info I can give you, and I tried everything here that I could with the setup, as far as our code and debugging possibilities go. If you have a concrete setting to check, I am happy to take a look so you can debug further. But it's very busy here now, and debugging Craft itself, or doing things like building a site from the ground up because you don't have any clue where to look to fix an issue in Craft, is not something I have the time for or consider our work, tbh. For one, because this issue already took a lot of time that couldn't be spent on our work.

Perhaps you could try this yourself with all the info I provided above or you could ask somebody else here in the thread having the same issues.

brandonkelly commented 2 years ago

Perhaps you could try this yourself with all the info I provided above or you could ask somebody else here in the thread having the same issues.

We have, and are unable to reproduce. And while I do understand that multiple people are here seeing similar behavior, there would be an avalanche of people chiming in if this was a common problem when adding new sub-fields to Matrix, something that is done likely hundreds of times a day.

That’s not to say we don’t care about the issue; just that it’s very likely more nuanced than “add a sub-field to Matrix on a multi-site install and deploy”, which we’ve verified generally works as expected, and it will take additional info if we are going to get to the bottom of it.

bencresty commented 2 years ago

We have, and are unable to reproduce. And while I do understand that multiple people are here seeing similar behavior, there would be an avalanche of people chiming in if this was a common problem when adding new sub-fields to Matrix, something that is done likely hundreds of times a day.

I understand your point about nuance. However, I wonder whether it's really done hundreds of times a day once you take into account that this seems to happen only in a multisite setup (which not everybody uses) AND only after applying yaml files, and perhaps even only when the propagation method (if that's what it's called?) on the matrix block is set to a particular mode. I don't know which propagation mode others here use, but apart from that, everyone in this comment section seems to have this same base setup in common. Also, I don't believe everybody facing this kind of issue would take the trouble to mention it on GitHub, especially when there's already an issue like this one open. My experience is also that a lot of developers just create a quick workaround (in this case, for instance, not applying yaml files but editing the database by hand) and leave the issue alone without letting you guys know. So there are all kinds of reasons GitHub isn't flooded with reports of this.

Propagation mode on the Matrix Field here is: [image]

[edit] Translation method on the Lightswitch field in the matrix block field is: [image]

If everybody in this thread would tell you what their propagation mode is that might give you another hint on where to look perhaps? Because if it's all using the same propagation mode I think that could be helpful information.

My guess is that the issue started with a change somewhere in the code that handles applying yaml changes to the database, or else in one of the things that can be triggered after that and is able to change the database, like the propagation functions. You probably already have, but just to try to help: if I were you I would read back through the commit list to see what has been changed in that region of code that could cause this.

So we know a few things:

We also know now that it probably didn't start with the latest version, so perhaps you could check a few versions back in the logs. I'm sure it's in there somewhere, because it was working before on the exact same setup but a different Craft version. The only question is where. Easier said than done, I know.

Hope this helps a little.

RandomJo commented 2 years ago

Just wanted to clarify—although I know this makes the issue harder to find. The sites on which this happened to us were not multi-sites, and we believe it also happened once a year ago so the cause of the problem may have been there for a long time. There could also be more than one cause.

bencresty commented 2 years ago

@RandomJo would be easier if it wasn't unfortunately ;), but that changes the perspective on this issue. Good to know.

There is another thing I just realized that might be important to debug this and could perhaps explain why some of us have this issue and others (or in other situations we) don't:

I changed the Default Value of the switch when adding it to the matrix field block

Both this matrixfield AND this particular matrixfieldblock were already present and in use for a long time on entries and contained data before adding the new lightswitch field to the matrixfieldblock

The matrix field, where we added the switch to in one of its blocks, normally has a state for 'Default value' of 'OFF' (default of Craft) [image]

But for this switch I changed the state of 'Default Value' to 'ON', because it should be 'ON' on each use per default. (while adding the switch) [image]

Both this matrix field, as well as this particular block the switch was added to, were already in use in entries before and already there was existing content for these entries, matrix field and this particular block, before adding this new Lightswitch field to this matrixfieldblock.

This, in my understanding and expectation, would trigger a change to all existing entries using this Matrixfield and this particular matrixblock to a) add the switch value and b) set this value to 'ON'.

I don't know the complete inner workings of Craft, but I can imagine that adding a lightswitch field to an existing matrixfieldblock that is already in use and contains data would trigger changes that don't happen in all scenarios, which could perhaps explain why some of us have this issue and some of us (or we, in different situations) don't. The same goes for changing the default value vs keeping the original default value. Plus the fact that it's a lightswitch AND the fact that we're applying yaml files to a database, which starts all this updating...

Guess this is the reason we learn never to have duplicate data. Having data in both yaml files and the database is great while it works, for versioning and all, but when things like this happen it gets difficult to debug pretty quickly, as there's another system doing things to apply yaml files to an already existing database. Not sure what the best solution is, as I like the system with project yaml files too, but I just wanted to put that out there to think about for later maybe.

If the system worked with only a database (nothing like yaml to be applied), then we would've spotted this issue while developing in our dev environment and would be absolutely sure it would work in another environment too, because all data would be statically copied to the other environment by just copying the database. That has downsides perhaps, but wouldn't make things this complicated when things go wrong. But most of all, I'm pretty sure this issue wouldn't have happened without the yaml files system. For one, because everybody would always use the same workflow, which makes debugging way easier and leaves fewer systems active that could cause issues like this; but also, we would spot issues way sooner, as there are way fewer parameters to mix and match while debugging.

The fact that we don't even know how to replicate this after days, with multiple people having this issue, says a lot IMO. Again, I just wanted to put that out here as we've now bumped into it, and I think it's worth at least rethinking whether that idea is still the best and isn't causing more complexity than it solves. I honestly don't know the answer to that myself yet, although because of this experience I'm starting to lean towards yes.

brandonkelly commented 2 years ago

This, in my understanding and expectation, would trigger a change to all existing entries using this Matrixfield and this particular matrixblock to a) add the switch value and b) set this value to 'ON'.

When a new Lightswitch field is added, existing blocks’ Lightswitch values in the DB will just be set to null (like any other field type), and Lightswitch fields will treat the null values as if they were set to the default value. The new field won’t directly trigger a resave for existing blocks.
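The behavior described above (a stored null falling back to the field's default at read time, without any resave) can be sketched like this. The function and the `showTitle` handle are hypothetical illustrations, not Craft's API.

```python
from typing import Optional


def effective_lightswitch(stored: Optional[bool], default: bool) -> bool:
    """Resolve a lightswitch value: a stored None means 'use the default'."""
    return default if stored is None else stored


# Existing blocks get no stored value when the field is added (hypothetical rows).
existing_blocks = [{"id": 1, "showTitle": None}, {"id": 2, "showTitle": None}]

# Reading them with a default of ON yields True without touching the rows.
values = [effective_lightswitch(block["showTitle"], default=True) for block in existing_blocks]
```

In this model an explicitly saved OFF still wins over the default, and nothing needs to be written to existing blocks for the default to take effect.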

So again, something else must have changed to trigger the resaves. The only way to know for sure is by comparing the production project config to the deployed project config. They don’t contain any content, just your project structure, so I’m a little confused on why sharing them is such a concern. The main thing to watch out for is that they don’t contain any sensitive security keys, etc., which should all be defined with environment variables regardless, so they’re not being shared in your repo.
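Before sharing project config files, a quick scan for hard-coded secrets could look something like the sketch below. It is a rough heuristic under stated assumptions: it treats any key whose name contains "key", "secret", "password", or "token" as sensitive, and it assumes environment variable references are written with a leading `$` (e.g. `$S3_KEY`). It will produce false positives and can miss things, so it does not replace reading the files.

```python
import re

# Assumed naming heuristic for sensitive settings; adjust to your own config.
SENSITIVE_KEY = re.compile(r"(key|secret|password|token)", re.IGNORECASE)


def find_hardcoded_secrets(yaml_text):
    """Return (line_number, key) pairs for sensitive-looking keys with literal values."""
    findings = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        key, sep, value = line.strip().partition(":")
        value = value.strip().strip("'\"")
        if not sep or not value:
            continue  # not a simple "key: value" line
        if SENSITIVE_KEY.search(key) and not value.startswith("$"):
            findings.append((number, key))
    return findings


sample = "keyId: $S3_KEY\nsecret: abc123\nname: Uploads\n"
print(find_hardcoded_secrets(sample))  # [(2, 'secret')]
```

Any line it reports is worth replacing with an environment variable reference before the files leave your machine.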

bencresty commented 1 year ago

And yet again bumping into this issue. Still not fixed in Craft 3.7.57.

What I did is extremely simple:

on local development environment:

During deployment to server Craft applies the new yaml files to the server WRONG and/or fires 'applying new propagation' wrongly!! It doesn't matter how we apply the changes from yaml files. It's faulty if we hit 'changes only' and it's faulty when we hit 'Reapply everything'. The result is the same and as months before: After applying Craft starts 'Applying new propagation' which causes again to duplicate EACH AND EVERY block in EACH matrix field on ALL entries using this matrix field.

After restoring the old database everything is fine again WITH THE OLD VERSION. But clearly this doesn't fix the issue and we are now stuck again as we want to have the new matrix block on the server. And that just isn't possible with yaml files.

This whole yaml-system just cannot be trusted. Please fix this and make it stable, or get rid of this double data source system as it's way too sensitive and extremely prone to errors.

:(

brandonkelly commented 1 year ago

@bencresty Once again asking you to please work with us on this. We have tried numerous things and yet are still unable to reproduce. It’s impossible for us to know where to look, or whether a potential fix actually works, if we can’t reproduce the bug in the first place.

If you have a database backup from before the most recent deployment, please share it with us, along with whatever changes (project config or otherwise) were deployed that caused the duplications. Happy to sign an NDA or whatever is needed to make that happen.