backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 40 forks source link

Server 500 error on image resource for Promoted Card in view. #5631

Open stpaultim opened 2 years ago

stpaultim commented 2 years ago

Description of the bug

Occasionally after installing a new site the front page does not render properly. See:

image

Clearing the cache resolves the problem AND I'm quite sure that it doesn't happen every time.

NOTE: This occurred when spinning up the site in Lando. Immediately afterwards, I destroyed the site and rebuilt it and it loaded as expected No problem.

Steps To Reproduce

I can not provide reliable steps to reproduce.

Proposed Solutions

Do we need to add another clearing of the cache to the install process. It seems like maybe something is not loading fast enough, under some uncertain conditions.

WILD SPECULATION: The fact that the third image is not displaying properly suggests to me that maybe the page is rendering before that image is ready to be rendered.

Additional Information

I believe that I have only seen this when spinning up a site with Lando. Could that be a factor?

In Zulip @argiepiano mentioned having seen this problem before as well. But, if this is not a regular problem or even a frequent problem, I'm not sure if we should worry about it.

indigoxela commented 2 years ago

The problem might be limited to Lando. Didn't see it in a Tugboat demo I just spun up, and also didn't see that on local installs (I'm not using Lando).

Let's wait a bit to see if someone confirms, but I'm confident everything's OK.

argiepiano commented 2 years ago

I just saw this in a non-lando deploy (I use MAMP and create my dev installations manually). In this case it was a view (preview). It did not show one of the images and got a 500 error (see image). Upon reloading the image shows up. Very hard to reproduce.

I saw the same thing yesterday on the front page in the same installation.

Screen Shot 2022-05-19 at 7 03 35 AM
argiepiano commented 2 years ago

This has been extremely difficult to reproduce. The last time I saw this error was:

  1. Clean install of Backdrop using MAMP
  2. Created a new view (page) of nodes (cards) that shows a list of rendered nodes
  3. Visited that page, and one of the cards showed a broken image icon and the console showed the 500. The rest of the page loaded fine.

Upon reload of the page the error was gone and did not come back.

This happened on Chrome, MacOS Monterey. In total I've seen this 3 or 4 times in the last week.

ghost commented 2 years ago

I've seen issues previously with images that use an image style but when the image style hasn't generated the image yet. There's a function you can call to generate the image style, can't recall what it is ATM though. Perhaps that needs to be added to the profile...

ghost commented 2 years ago

I think its: image_style_create_derivative()

argiepiano commented 2 years ago

There's a function you can call to generate the image style

That sounds very promising. I don't know much about image styles. I did check the files folder and noticed that files/styles/large/public/field/image was missing one of the cards image files (I think the one that threw the error, but not sure). When I revisited the view, the missing card file was created automatically. So it may be possible that, under some circumstances, the file creation is only completed after the rendering, and somehow the server returns a 500 for that image that hasn't been fully moved to that folder.

indigoxela commented 2 years ago

Now I saw the problem, too!

In Dblog right after install:

Error: Failed to create style directory: public://styles/card/public/field/image
Notice: Unable to generate the derived image located at public://styles/card/public/field/image/card2-card_1.png

Reloading the home page seems to create everything missing and the problem is gone.

But looking at the images filenames card2-card_1.png - note the "1" proves that something didn't work smoothly. It took several attempts to create images. Might be a race condition when creating the file directory for card images.

argiepiano commented 2 years ago

Yay... I guess. Was there a line number and function in the error?

indigoxela commented 2 years ago

Was there a line number and function in the error?

No line numbers or function names. Just two entries in dblog.

But if the creation of the file directory fails, it probably has something to do with standard_install() around line 421 (file_prepare_directory()).

klonos commented 2 years ago

I confirm that I see that happening for me too, although for me it's the opposite: the first 2 images/card tiles are not being rendered properly, whereas the last one is:

image

I can not provide reliable steps to reproduce. ... In Zulip @argiepiano mentioned having seen this problem before as well. But, if this is not a regular problem or even a frequent problem, I'm not sure if we should worry about it.

That has been consistent for me (both the problem itself, as well as the fact that only the last card tile is rendered properly). I have seen it enough times for me to raise #5742 (which I now have closed as a duplicate - thanks @olafgrabienski for pointing that out).

I believe that I have only seen this when spinning up a site with Lando. Could that be a factor?

Not sure if a factor, but I am also seeing this when using Lando.

Clearing the cache resolves the problem...

Upon reload of the page the error was gone...

Confirming that for me, all it takes is a page refresh to fix the issue.

klonos commented 2 years ago

...in case that matters, in my Lando configurations I'm using php 7.4 (7.4.29 to be exact) and apache 2.4.53, with MariaDB 5.5.5-10.3.27.

stpaultim commented 2 years ago

I'm inclined to make this a priority for the next bug fix release, because while it is not a consistent problem, it's a pretty bad first impression for folks evaluating BackdropCMS.

I go a long time without seeing this problem, then suddenly I see it again. I've seen it locally with Lando and on BackdropCMS.org demo sites. So, it's not a Lando problem.

It just happened to me now and I found these errors in log (same as @indigoxela):

image

image

argiepiano commented 2 years ago

I think the main hurdle here is that we don't have a reliable way to reproduce this problem. I've been trying different options locally and on tugboat without success.

One thing I did notice (which may or may not be related) is that, when I use Chrome to manually install a dev site locally that uses the same URL for a previous (deleted) dev installation, Chrome uses the disk cache to retrieve the three card images of the new site. Since Chrome has those images locally, it doesn't request them from the server, and therefore the server folder styles/card/public/field/images is not created and the images are not placed there. This folder stores the "card" styled images, and it's created the first time those styled images are requested by the browser.

The above does not actually reproduce the problem. If I visit the site with a different browser (or if I clear the browser cache), then the folder is created and the styled images are put there... so, this probably is not relevant.

From the error messages @stpaultim and @indigoxela posted above, it looks like the issue is that image_style_create_derivative() cannot create the folder where the styled cards should be placed. The question here is why that is happening. This function is invoked by image_style_deliver() if the styled file doesn't exist (when viewing the styled image for the first time).

argiepiano commented 2 years ago

Here's some progress. I've been able to reproduce the problem with a success rate of 90%.

To start with, be sure to clear the browser cache EVERY TIME before refreshing the page with the card images.

In Chrome you can do this at chrome://settings/clearBrowserData and leaving 'Cached images and files' checked (you can uncheck the rest of the options). Then click Clear data

Please test on a clean install.

  1. Go to files/styles in the install directory and REMOVE all folders inside. You'll need to remove these folders every time you want to reproduce the problem, so that Backdrop creates fresh styled files every time.
  2. Create a view of Content -> Cards. Display: fields
  3. Add the Image field and select any Image style (except the original image)
  4. Click "Auto preview"

This last step doesn't always work to reproduce the problem, but I've been able to reproduce it 90% of the time. If it doesn't work, you will need to

  1. clear the browser cache again,
  2. and remove the folder inside files/styles and try again.

After being able to reproduce this, I was curious to see what the problem was. The issue was the creation of the directory.

  1. In file_prepare_directory() remove the @ in front of @backdrop_mkdir(($directory, NULL, TRUE); . This will display any PHP errors when creating the directory
  2. Reproduce the problem again and check the dblog

I see the following error in the watchdog log:

Warning: mkdir(): File exists in BackdropLocalStreamWrapper->mkdir() (line 791 of /Users/alejandrocremaschi/Sites/localhost/bdentity/core/includes/stream_wrappers.inc).

This is very strange. The backdrop_mkdir() function is only reached after the if statement if(!is_dir($directory)) {}, which means that $directory is not a directory... but then we get the other error "File exists" thrown by mkdir.

I then inserted some watchdog statements in two places:

  1. watchdog('test prepare directory', $directory); as the first line of file_prepare_directory()
  2. watchdog('test create derivative', $source . ' ' . $directory); right after $directory = backdrop_dirname($destination); in function image_style_create_derivative()

The function file_prepare_directory() is invoked by image_style_create_derivative(), so you would expect to see the following sequence of watchdog messages (alternating calls):

For the first image:

For the second:

For the third:

But instead I'm getting 2, or sometimes 3 calls of test create derivative in a row, and then two or three calls to test prepare directory.

See this snapshot, for example. The first image to reach test create derivative is card-3, and the last one to reach test prepare directory is also card-3 (which fails since the directory has already been created).

Screen Shot 2022-08-31 at 7 30 36 PM

I don't know enough about server requests and threads, but it seems to me that my server is getting requests to serve the 3 images in a row, and processing them before finishing the process of the previous image. Does this make sense?

I have to stop now, but hopefully someone with more experience can pick it from here.

stpaultim commented 2 years ago

@argiepiano - Thanks for this progress. Hopefully this is clear, but we have until Sept 15th to get a bug fix into the next release. Tomorrow is only the deadline for new features to be ready. Anyway, this is not an overnight rush, but still something I hope we can get fixed soon.

I will not be having time to work on this for at least a few days.

argiepiano commented 2 years ago

Tested on tugboat. I was able to reproduce the problem, but the success rate (to reproduce) is smaller than in my local. (one out of 3). The issue with testing in Tugboat is that you can't remove the styled image folders, so you get three chances to test with the above (one for each image style in the View)

argiepiano commented 2 years ago

Reading a bit online. There are articles about race conditions with multiple mkdir. See for example this, which closely resembles the problem we have.

What I'm seeing in our case (most often when the problem happens):

There is a reversal of the backdrop_mkdir() call. The second call fails because the directory already exists and returns FALSE to the caller function, which interprets it as a failure to create the directory - thus the styled image for IMAGE 1 is not created.

I'm thinking that perhaps the Stackoverflow solution above may work here. Haven't tried it.

argiepiano commented 2 years ago

See also this article about this race condition

argiepiano commented 2 years ago

PR https://github.com/backdrop/backdrop/pull/4186 submitted. The fix is actually pretty simple. Instead of trusting the if statement and returning FALSE blindly, we do a second check to see if the directory has actually been created. The race condition still exists, but we check if the directory exists, which tells us whether the mkdir() failed because of permissions or other problems, or simply because the directory was already there.

indigoxela commented 2 years ago

Wow, great sleuthing! :tada: So that's a hidden problem that existed before and isn't actually directly related to the cards view.

I wonder, if there are more (hidden) race conditions with backdrop_mkdir() and multiple threads. I found some Drupal issues, mostly about functional tests running in parallel.

Didn't actually test the PR yet, but I do have a concern (PR comment). I hope that I'm not too pedantic.

argiepiano commented 2 years ago

Thanks @indigoxela. I'll address your comment soon.

I wanted to summarize the issue. I tend to write long explanations, so here's a short version for those who want to test/review.

The problem happens with the following lines in file_prepare_directory() when you have parallel processes racing to create a directory (for example when the browser requests the 3 card images for the first time):

  if (!is_dir($directory)) {
    if (($options & FILE_CREATE_DIRECTORY) && @backdrop_mkdir($directory, NULL, TRUE)) {
      return backdrop_chmod($directory);
    }
    return FALSE;
  }

In a nutshell, the issue is with the combination is is_dir() and backdrop_mkdir().

This is how the problem happens.

  1. Process 1 reaches if (!is_dir($directory)). Since the directory hasn't been created, it proceeds into the if statement
  2. Before process 1 reaches backdrop_mkdir(), process 2 reaches the if statement above. The directory doesn't yet exist, so it proceeds
  3. Process 2 reaches the backdrop_mkdir() BEFORE process 1 does so, and creates the directory
  4. Process 1 reaches the backdrop_mkdir(), which fails since the directory already exists (step 3), and returns FALSE

The PR basically changes the 4th step above. Instead of returning FALSE, it checks for the existence of the directory, and returns TRUE if it exists (telling the calling function that the directory has effectively been prepared).

argiepiano commented 2 years ago

@indigoxela, I updated the PR to address your comment. It now also checks if the directory is writable.

stpaultim commented 2 years ago

@klonos says that he experiences this problem every time he builds a new site on his local, so he would be a good person to test this PR. Hey @klonos

indigoxela commented 2 years ago

I updated the PR to address your comment...

That's great, many thanks. While looking at the code, ... when a new developer reads that, they may wonder, why this happens. A small code comment that this exists to prevent a race condition might help to avoid that someone tries to "fix this odd code". @argiepiano what do you think?

@klonos says that he experiences this problem every time ... so he would be a good person to test this PR.

That would be cool, because I have trouble to reproduce (only happens once in a while on my dev, even if I ruthlessly delete directories).

klonos commented 2 years ago

@argiepiano wow dude! ...awesome investigation/troubleshooting and research here ๐Ÿ‘๐Ÿผ ...and what a simple solution too! โค๏ธ

I have tested this by creating/destroying/recreating various instances of fresh Backdrop installations on my local (using Lando), and whereas before I could reproduce the problem 100% of the times, now every single time the issue was fixed! ๐Ÿ‘๐Ÿผ

This would be RTBC, but:

...when a new developer reads that, they may wonder, why this happens. A small code comment that this exists to prevent a race condition might help to avoid that someone tries to "fix this odd code".

@indigoxela you beat me to suggesting this ๐Ÿ™‚

@argiepiano can you please add an inline comment, either pointing to https://stackoverflow.com/questions/44322783/is-is-dir-unreliable-or-are-there-race-conditions-that-can-be-mitigated-here or this issue in our queue here? Thanks.

Since we now know that this known issue with php and the race condition exists, wouldn't we want to implement the if (!is_dir($dir) && !mkdir($dir) && !is_dir($dir)) { ... } thing that many people have reported that it works around the issue? (as odd as it may be) ...or are we already doing that? ...I mean this sorta kinda looks like it:

  // Check if directory exists.
  if (!is_dir($directory)) { // <-- this would be the first !is_dir($dir)
    // Let mkdir() recursively create directories and use the default directory
    // permissions.
    if (($options & FILE_CREATE_DIRECTORY) && @backdrop_mkdir($directory, NULL, TRUE)) { // <-- is this effectively achieving the same as !mkdir($dir)?
      return backdrop_chmod($directory);
    }
    return is_dir($directory) && is_writable($directory); // <-- and this here would be the last !is_dir($dir)
  }

Or am I reading things completely wrong there? ๐Ÿค”

herbdool commented 2 years ago

The code looks good now. I don't think we need to rewrite it completely at this point since this is just a bug fix. I'll add an inline comment linking to this issue in the code.

argiepiano commented 2 years ago

I agree about the comment line. @klonos, yes, the fix mentioned in those issues is actually adapted to the code in this situation - so yes, the return line is the equivalent to the second is_dir() mentioned in those online answer.

@herbdool already pushed the comment, so we are ready to go. Thanks everyone.

quicksketch commented 2 years ago

Great work here @argiepiano! Your explanation and reasoning seem spot-on to me as the most likely problem. Having read the explanation here and it's complexity, that seems that linking to this issue is the best kind of code comment we could have. There's just too much explanation required. I added just a tiny bit more explanation and wrapped the code comment at 80 characters.

Merged https://github.com/backdrop/backdrop/pull/4186 into 1.x and 1.22.x. Thanks @argiepiano, @stpaultim, @indigoxela, @BWPanda, @klonos, and @herbdool!

stpaultim commented 2 years ago

Huge thanks to @argiepiano for help on this one.

klonos commented 2 years ago

Indeed ...kudos @argiepiano ๐Ÿ™๐Ÿผ

stpaultim commented 1 year ago

I just spun up a new site using the DEV branch of Backdrop CMS and got this:

image

Is there any chance we did anything that might have caused a regression?

I will see if I can recreate this again later.

argiepiano commented 1 year ago

Can you open the dev tools of your browser and check the console output, to see if there is any error listed there?

argiepiano commented 1 year ago

I've tried the following, and am unable to reproduce what you are seeing

  1. Installed a dev version of Backdrop after clearing the browser image cache
  2. Installed a dev without having cleared the browser cache
  3. Create the view I used to originally reproduce the problem (see above)

None of these generated the problem. The correct images and folders were created normally inside files/styles as expected (this was the issue originally, when some of the styled images were not created because of the race condition in creating the folder that contained them).

So, I'm not sure what may have caused this. One guess is that you are using an old dev version of Backdrop that does not contain the fix. Can you try downloading the dev version again?

Can you try the steps I outlined in my comment to reproduce the problem and see if you see it again?

stpaultim commented 1 year ago

I spin up new sites locally using a script that clones the dev branch of the repo. My screenshot was a brand new site that I spun up yesterday using that repo.

I deleted the site today and ran the script again, this time I didn't have any problems.

I'll try spinning up a few more sites to see if I'm able to recreate it. Maybe it was a fluke?

I just wanted to pass along the experience in case anyone else is seeing it.

Atten: @argiepiano

argiepiano commented 1 year ago

In Zulip, @klonos said:

I've set up a demo of 1.x on my local, using the same process/scripts I always used before, and https://github.com/backdrop/backdrop-issues/issues/5631 happened again :thinking: ...if anyone else happens to notice that too, please raise a flag.

I'm looking at how we fixed this issue. The rationale was:

  1. When Backdrop loads images for the first time when visiting the Home page after a clean install, the directory /files/styles/card/public/field/image is created and the styled image is put here
  2. Since there are 3 images, there are 3 very quick server requests
  3. This creates a well-documented race condition, where one request creates the directory while the other is about to create it. This leads to backdrop_mkdir() to fail and return a FALSE, which prevents one of the images from being styled and saved in the new directory
  4. This is why one of the images appears broken - it doesn't exist in the expected directory.
  5. The solution was to do a check with is_dir() before returning FALSE, just to check if another request may have actually created the directory

So, apparently we are seeing a few (very rare) fails, as reported by @stpaultim and @klonos.

My guess is that:

  1. The fix included the following line: return is_dir($directory) && is_writable($directory);
  2. As you can see above, we are checking two things: is_dir() and is_writable().
  3. My guess is that we have a "double race condition" here. The other request has created the directory, BUT it hasn't yet made it writable (which happens in line 457)
  4. Therefore, the fix fails, since the is_dir($directory) produces TRUE, while is_writable($directory); produces FALSE

I believe the solution is to do some defensive coding and set the directory to writable again. Unlike backdrop_mkdir() (which fails if the directory already exists), backdrop_chmod() will not fail if the directory is already writable. My suggestion is to replace:

return is_dir($directory) && is_writable($directory);

with:

return is_dir($directory) && backdrop_chmod($directory);

I have no way to test this, as I'm not able to reproduce this problem with the current fix, but perhaps @klonos can throw some Lando magic and try to reproduce it reliably?

klonos commented 1 year ago

As I also mentioned in Zulip, I have no way to reproduce this consistently - it only happened once, and I cannot figure out what was different this time (I spin up and destroy multiple local Backdrop sites almost daily).

argiepiano commented 1 year ago

I've encountered this issue again:

  1. Spin a new site at https://backdropcms.org/demo
  2. Click "Visit your site" once the site is ready (don't click on the URL listed below). This will take you to the Dashboard
  3. Click the Home page
Screen Shot 2023-01-08 at 10 57 16 AM

The broken image changes.

I can't reproduce in my local (not even with the steps I originally was able to reproduce), so this is hard fix... but I'd suggest to make the changes I mentioned above

argiepiano commented 1 year ago

One of the things that make it difficult to test a fix is that, when you do a PR and github spins a new site for testing, you will access that site by going to the HOME page and then clicking Login. So, by visiting the Home page you are seeing the 3 images just fine - they don't fail.

The difference with spinning a new site at https://backdropcms.org/demo is that, when you click the button "Visit your site" you are being sent to the Dashboard as a logged in admin (for example, link: https://latest-v47nxi2nmqlwgf8xueaevg4kwn74vg9w.tugboatqa.com/user/reset/1/1673200607/Lyih6fMPGLZS9y32tgQMmIWMPh1uGiL2DbQyrd-hey4/login)

So, the only way to encounter this problem is by being directed FIRST to the dashboard of a demo site with /user/reset/1/PASS_HASH/login (which is a password reset link with immediate login).

I haven't figured out how to visit a tugboat trial site spun by a PR this way (in the URL, the PASS_HASH is needed - I need to figure out how to generate it)

klonos commented 1 year ago

So, the only way to encounter this problem is by being directed FIRST to the dashboard of a demo site...

But this happened to me on one of my local sites... and the way I saw it was because I navigated to the home page first/directly. So ๐Ÿคท๐Ÿผ ๐Ÿค”

argiepiano commented 1 year ago

But this happened to me on one of my local sites... and the way I saw it was because I navigated to the home page first/directly. So ๐Ÿคท๐Ÿผ ๐Ÿค”

Sorry, I meant: the only way I've been able to reproduce this problem now, is by spinning a site on tugboat and being directed to the Dashboard first, then clicking Home. If, instead, you go to the home page on a tugboat site and log in normally, this is not happening.

avpaderno commented 1 year ago

I was testing a PR using its preview site (which runs on Backdrop 1.25) and I noticed this error message in the log.

LOCATION https://pr4352-4gncpiqlvloaugf4xjv7kldgibfltghd.tugboatqa.com/files/styles/card/public/field/image/card3-organize_1.png REFERRER https://pr4352-4gncpiqlvloaugf4xjv7kldgibfltghd.tugboatqa.com/ MESSAGE Failed to create style directory:ย public://styles/card/public/field/image

stpaultim commented 1 year ago

I have also noticed this on occasion in the Backdrop CMS demo sites. But, I can not reliably recreate the issue. It only happens sporatically for me.

klonos commented 1 year ago

Ditto @stpaultim