Open stpaultim opened 2 years ago
The problem might be limited to Lando. Didn't see it in a Tugboat demo I just spun up, and also didn't see that on local installs (I'm not using Lando).
Let's wait a bit to see if someone confirms, but I'm confident everything's OK.
I just saw this in a non-lando deploy (I use MAMP and create my dev installations manually). In this case it was a view (preview). It did not show one of the images and got a 500 error (see image). Upon reloading the image shows up. Very hard to reproduce.
I saw the same thing yesterday on the front page in the same installation.
This has been extremely difficult to reproduce. The last time I saw this error was:
Upon reload of the page the error was gone and did not come back.
This happened on Chrome, MacOS Monterey. In total I've seen this 3 or 4 times in the last week.
I've seen issues previously with images that use an image style but when the image style hasn't generated the image yet. There's a function you can call to generate the image style, can't recall what it is ATM though. Perhaps that needs to be added to the profile...
I think its: image_style_create_derivative()
There's a function you can call to generate the image style
That sounds very promising. I don't know much about image styles. I did check the files folder and noticed that files/styles/large/public/field/image was missing one of the cards image files (I think the one that threw the error, but not sure). When I revisited the view, the missing card file was created automatically. So it may be possible that, under some circumstances, the file creation is only completed after the rendering, and somehow the server returns a 500 for that image that hasn't been fully moved to that folder.
Now I saw the problem, too!
In Dblog right after install:
Error: Failed to create style directory: public://styles/card/public/field/image
Notice: Unable to generate the derived image located at public://styles/card/public/field/image/card2-card_1.png
Reloading the home page seems to create everything missing and the problem is gone.
But looking at the images filenames card2-card_1.png
- note the "1" proves that something didn't work smoothly. It took several attempts to create images. Might be a race condition when creating the file directory for card images.
Yay... I guess. Was there a line number and function in the error?
Was there a line number and function in the error?
No line numbers or function names. Just two entries in dblog.
But if the creation of the file directory fails, it probably has something to do with standard_install() around line 421 (file_prepare_directory()).
I confirm that I see that happening for me too, although for me it's the opposite: the first 2 images/card tiles are not being rendered properly, whereas the last one is:
I can not provide reliable steps to reproduce. ... In Zulip @argiepiano mentioned having seen this problem before as well. But, if this is not a regular problem or even a frequent problem, I'm not sure if we should worry about it.
That has been consistent for me (both the problem itself, as well as the fact that only the last card tile is rendered properly). I have seen it enough times for me to raise #5742 (which I now have closed as a duplicate - thanks @olafgrabienski for pointing that out).
I believe that I have only seen this when spinning up a site with Lando. Could that be a factor?
Not sure if a factor, but I am also seeing this when using Lando.
Clearing the cache resolves the problem...
Upon reload of the page the error was gone...
Confirming that for me, all it takes is a page refresh to fix the issue.
...in case that matters, in my Lando configurations I'm using php 7.4 (7.4.29 to be exact) and apache 2.4.53, with MariaDB 5.5.5-10.3.27.
I'm inclined to make this a priority for the next bug fix release, because while it is not a consistent problem, it's a pretty bad first impression for folks evaluating BackdropCMS.
I go a long time without seeing this problem, then suddenly I see it again. I've seen it locally with Lando and on BackdropCMS.org demo sites. So, it's not a Lando problem.
It just happened to me now and I found these errors in log (same as @indigoxela):
I think the main hurdle here is that we don't have a reliable way to reproduce this problem. I've been trying different options locally and on tugboat without success.
One thing I did notice (which may or may not be related) is that, when I use Chrome to manually install a dev site locally that uses the same URL for a previous (deleted) dev installation, Chrome uses the disk cache to retrieve the three card images of the new site. Since Chrome has those images locally, it doesn't request them from the server, and therefore the server folder styles/card/public/field/images is not created and the images are not placed there. This folder stores the "card" styled images, and it's created the first time those styled images are requested by the browser.
The above does not actually reproduce the problem. If I visit the site with a different browser (or if I clear the browser cache), then the folder is created and the styled images are put there... so, this probably is not relevant.
From the error messages @stpaultim and @indigoxela posted above, it looks like the issue is that image_style_create_derivative()
cannot create the folder where the styled cards should be placed. The question here is why that is happening. This function is invoked by image_style_deliver()
if the styled file doesn't exist (when viewing the styled image for the first time).
Here's some progress. I've been able to reproduce the problem with a success rate of 90%.
To start with, be sure to clear the browser cache EVERY TIME before refreshing the page with the card images.
In Chrome you can do this at chrome://settings/clearBrowserData
and leaving 'Cached images and files' checked (you can uncheck the rest of the options). Then click Clear data
Please test on a clean install.
files/styles
in the install directory and REMOVE all folders inside. You'll need to remove these folders every time you want to reproduce the problem, so that Backdrop creates fresh styled files every time. This last step doesn't always work to reproduce the problem, but I've been able to reproduce it 90% of the time. If it doesn't work, you will need to
files/styles
and try again. After being able to reproduce this, I was curious to see what the problem was. The issue was the creation of the directory.
file_prepare_directory()
remove the @
in front of @backdrop_mkdir(($directory, NULL, TRUE);
. This will display any PHP errors when creating the directoryI see the following error in the watchdog log:
Warning: mkdir(): File exists in BackdropLocalStreamWrapper->mkdir() (line 791 of /Users/alejandrocremaschi/Sites/localhost/bdentity/core/includes/stream_wrappers.inc).
This is very strange. The backdrop_mkdir()
function is only reached after the if statement if(!is_dir($directory)) {}
, which means that $directory is not a directory... but then we get the other error "File exists" thrown by mkdir.
I then inserted some watchdog statements in two places:
watchdog('test prepare directory', $directory);
as the first line of file_prepare_directory()
watchdog('test create derivative', $source . ' ' . $directory);
right after $directory = backdrop_dirname($destination);
in function image_style_create_derivative()
The function file_prepare_directory()
is invoked by image_style_create_derivative()
, so you would expect to see the following sequence of watchdog messages (alternating calls):
For the first image:
test create derivative
(for the first image)test prepare directory
(for the first image)For the second:
test create derivative
(for the second image)test prepare directory
(for the second image)For the third:
test create derivative
(for the third image)test prepare directory
(for the third image)But instead I'm getting 2, or sometimes 3 calls of test create derivative
in a row, and then two or three calls to test prepare directory
.
See this snapshot, for example. The first image to reach test create derivative
is card-3, and the last one to reach test prepare directory
is also card-3 (which fails since the directory has already been created).
I don't know enough about server requests and threads, but it seems to me that my server is getting requests to serve the 3 images in a row, and processing them before finishing the process of the previous image. Does this make sense?
I have to stop now, but hopefully someone with more experience can pick it from here.
@argiepiano - Thanks for this progress. Hopefully this is clear, but we have until Sept 15th to get a bug fix into the next release. Tomorrow is only the deadline for new features to be ready. Anyway, this is not an overnight rush, but still something I hope we can get fixed soon.
I will not be having time to work on this for at least a few days.
Tested on tugboat. I was able to reproduce the problem, but the success rate (to reproduce) is smaller than in my local. (one out of 3). The issue with testing in Tugboat is that you can't remove the styled image folders, so you get three chances to test with the above (one for each image style in the View)
Reading a bit online. There are articles about race conditions with multiple mkdir. See for example this, which closely resembles the problem we have.
What I'm seeing in our case (most often when the problem happens):
image_style_create_derivative()
for image 1image_style_create_derivative()
for image 2file_prepare_directory()
for image 1file_prepare_directory()
for image 2backdrop_mkdir()
(line 456) for IMAGE 2 (notice the reversal)backdrop_mkdir()
(line 456) for IMAGE 1 -> FAILS and return FALSE to image_style_create_derivative()
There is a reversal of the backdrop_mkdir()
call. The second call fails because the directory already exists and returns FALSE to the caller function, which interprets it as a failure to create the directory - thus the styled image for IMAGE 1 is not created.
I'm thinking that perhaps the Stackoverflow solution above may work here. Haven't tried it.
See also this article about this race condition
PR https://github.com/backdrop/backdrop/pull/4186 submitted. The fix is actually pretty simple. Instead of trusting the if statement and returning FALSE blindly, we do a second check to see if the directory has actually been created. The race condition still exists, but we check if the directory exists, which tells us whether the mkdir()
failed because of permissions or other problems, or simply because the directory was already there.
Wow, great sleuthing! :tada: So that's a hidden problem that existed before and isn't actually directly related to the cards view.
I wonder, if there are more (hidden) race conditions with backdrop_mkdir() and multiple threads. I found some Drupal issues, mostly about functional tests running in parallel.
Didn't actually test the PR yet, but I do have a concern (PR comment). I hope that I'm not too pedantic.
Thanks @indigoxela. I'll address your comment soon.
I wanted to summarize the issue. I tend to write long explanations, so here's a short version for those who want to test/review.
The problem happens with the following lines in file_prepare_directory()
when you have parallel processes racing to create a directory (for example when the browser requests the 3 card images for the first time):
if (!is_dir($directory)) {
if (($options & FILE_CREATE_DIRECTORY) && @backdrop_mkdir($directory, NULL, TRUE)) {
return backdrop_chmod($directory);
}
return FALSE;
}
In a nutshell, the issue is with the combination is is_dir()
and backdrop_mkdir()
.
This is how the problem happens.
if (!is_dir($directory))
. Since the directory hasn't been created, it proceeds into the if statementbackdrop_mkdir()
, process 2 reaches the if statement above. The directory doesn't yet exist, so it proceedsbackdrop_mkdir()
BEFORE process 1 does so, and creates the directorybackdrop_mkdir()
, which fails since the directory already exists (step 3), and returns FALSEThe PR basically changes the 4th step above. Instead of returning FALSE, it checks for the existence of the directory, and returns TRUE if it exists (telling the calling function that the directory has effectively been prepared).
@indigoxela, I updated the PR to address your comment. It now also checks if the directory is writable.
@klonos says that he experiences this problem every time he builds a new site on his local, so he would be a good person to test this PR. Hey @klonos
I updated the PR to address your comment...
That's great, many thanks. While looking at the code, ... when a new developer reads that, they may wonder, why this happens. A small code comment that this exists to prevent a race condition might help to avoid that someone tries to "fix this odd code". @argiepiano what do you think?
@klonos says that he experiences this problem every time ... so he would be a good person to test this PR.
That would be cool, because I have trouble to reproduce (only happens once in a while on my dev, even if I ruthlessly delete directories).
@argiepiano wow dude! ...awesome investigation/troubleshooting and research here ๐๐ผ ...and what a simple solution too! โค๏ธ
I have tested this by creating/destroying/recreating various instances of fresh Backdrop installations on my local (using Lando), and whereas before I could reproduce the problem 100% of the times, now every single time the issue was fixed! ๐๐ผ
This would be RTBC, but:
...when a new developer reads that, they may wonder, why this happens. A small code comment that this exists to prevent a race condition might help to avoid that someone tries to "fix this odd code".
@indigoxela you beat me to suggesting this ๐
@argiepiano can you please add an inline comment, either pointing to https://stackoverflow.com/questions/44322783/is-is-dir-unreliable-or-are-there-race-conditions-that-can-be-mitigated-here or this issue in our queue here? Thanks.
Since we now know that this known issue with php and the race condition exists, wouldn't we want to implement the if (!is_dir($dir) && !mkdir($dir) && !is_dir($dir)) { ... }
thing that many people have reported that it works around the issue? (as odd as it may be) ...or are we already doing that? ...I mean this sorta kinda looks like it:
// Check if directory exists.
if (!is_dir($directory)) { // <-- this would be the first !is_dir($dir)
// Let mkdir() recursively create directories and use the default directory
// permissions.
if (($options & FILE_CREATE_DIRECTORY) && @backdrop_mkdir($directory, NULL, TRUE)) { // <-- is this effectively achieving the same as !mkdir($dir)?
return backdrop_chmod($directory);
}
return is_dir($directory) && is_writable($directory); // <-- and this here would be the last !is_dir($dir)
}
Or am I reading things completely wrong there? ๐ค
The code looks good now. I don't think we need to rewrite it completely at this point since this is just a bug fix. I'll add an inline comment linking to this issue in the code.
I agree about the comment line. @klonos, yes, the fix mentioned in those issues is actually adapted to the code in this situation - so yes, the return line is the equivalent to the second is_dir()
mentioned in those online answer.
@herbdool already pushed the comment, so we are ready to go. Thanks everyone.
Great work here @argiepiano! Your explanation and reasoning seem spot-on to me as the most likely problem. Having read the explanation here and it's complexity, that seems that linking to this issue is the best kind of code comment we could have. There's just too much explanation required. I added just a tiny bit more explanation and wrapped the code comment at 80 characters.
Merged https://github.com/backdrop/backdrop/pull/4186 into 1.x and 1.22.x. Thanks @argiepiano, @stpaultim, @indigoxela, @BWPanda, @klonos, and @herbdool!
Huge thanks to @argiepiano for help on this one.
Indeed ...kudos @argiepiano ๐๐ผ
I just spun up a new site using the DEV branch of Backdrop CMS and got this:
Is there any chance we did anything that might have caused a regression?
I will see if I can recreate this again later.
Can you open the dev tools of your browser and check the console output, to see if there is any error listed there?
I've tried the following, and am unable to reproduce what you are seeing
None of these generated the problem. The correct images and folders were created normally inside files/styles
as expected (this was the issue originally, when some of the styled images were not created because of the race condition in creating the folder that contained them).
So, I'm not sure what may have caused this. One guess is that you are using an old dev version of Backdrop that does not contain the fix. Can you try downloading the dev version again?
Can you try the steps I outlined in my comment to reproduce the problem and see if you see it again?
I spin up new sites locally using a script that clones the dev branch of the repo. My screenshot was a brand new site that I spun up yesterday using that repo.
I deleted the site today and ran the script again, this time I didn't have any problems.
I'll try spinning up a few more sites to see if I'm able to recreate it. Maybe it was a fluke?
I just wanted to pass along the experience in case anyone else is seeing it.
Atten: @argiepiano
In Zulip, @klonos said:
I've set up a demo of 1.x on my local, using the same process/scripts I always used before, and https://github.com/backdrop/backdrop-issues/issues/5631 happened again :thinking: ...if anyone else happens to notice that too, please raise a flag.
I'm looking at how we fixed this issue. The rationale was:
/files/styles/card/public/field/image
is created and the styled image is put herebackdrop_mkdir()
to fail and return a FALSE, which prevents one of the images from being styled and saved in the new directoryis_dir()
before returning FALSE, just to check if another request may have actually created the directory So, apparently we are seeing a few (very rare) fails, as reported by @stpaultim and @klonos.
My guess is that:
return is_dir($directory) && is_writable($directory);
is_dir()
and is_writable()
.is_dir($directory)
produces TRUE, while is_writable($directory);
produces FALSE I believe the solution is to do some defensive coding and set the directory to writable again. Unlike backdrop_mkdir()
(which fails if the directory already exists), backdrop_chmod()
will not fail if the directory is already writable. My suggestion is to replace:
return is_dir($directory) && is_writable($directory);
with:
return is_dir($directory) && backdrop_chmod($directory);
I have no way to test this, as I'm not able to reproduce this problem with the current fix, but perhaps @klonos can throw some Lando magic and try to reproduce it reliably?
As I also mentioned in Zulip, I have no way to reproduce this consistently - it only happened once, and I cannot figure out what was different this time (I spin up and destroy multiple local Backdrop sites almost daily).
I've encountered this issue again:
The broken image changes.
I can't reproduce in my local (not even with the steps I originally was able to reproduce), so this is hard fix... but I'd suggest to make the changes I mentioned above
One of the things that make it difficult to test a fix is that, when you do a PR and github spins a new site for testing, you will access that site by going to the HOME page and then clicking Login. So, by visiting the Home page you are seeing the 3 images just fine - they don't fail.
The difference with spinning a new site at https://backdropcms.org/demo is that, when you click the button "Visit your site" you are being sent to the Dashboard as a logged in admin (for example, link: https://latest-v47nxi2nmqlwgf8xueaevg4kwn74vg9w.tugboatqa.com/user/reset/1/1673200607/Lyih6fMPGLZS9y32tgQMmIWMPh1uGiL2DbQyrd-hey4/login
)
So, the only way to encounter this problem is by being directed FIRST to the dashboard of a demo site with /user/reset/1/PASS_HASH/login
(which is a password reset link with immediate login).
I haven't figured out how to visit a tugboat trial site spun by a PR this way (in the URL, the PASS_HASH is needed - I need to figure out how to generate it)
So, the only way to encounter this problem is by being directed FIRST to the dashboard of a demo site...
But this happened to me on one of my local sites... and the way I saw it was because I navigated to the home page first/directly. So ๐คท๐ผ ๐ค
But this happened to me on one of my local sites... and the way I saw it was because I navigated to the home page first/directly. So ๐คท๐ผ ๐ค
Sorry, I meant: the only way I've been able to reproduce this problem now, is by spinning a site on tugboat and being directed to the Dashboard first, then clicking Home. If, instead, you go to the home page on a tugboat site and log in normally, this is not happening.
I was testing a PR using its preview site (which runs on Backdrop 1.25) and I noticed this error message in the log.
LOCATION https://pr4352-4gncpiqlvloaugf4xjv7kldgibfltghd.tugboatqa.com/files/styles/card/public/field/image/card3-organize_1.png REFERRER https://pr4352-4gncpiqlvloaugf4xjv7kldgibfltghd.tugboatqa.com/ MESSAGE Failed to create style directory:ย public://styles/card/public/field/image
I have also noticed this on occasion in the Backdrop CMS demo sites. But, I can not reliably recreate the issue. It only happens sporatically for me.
Ditto @stpaultim
Description of the bug
Occasionally after installing a new site the front page does not render properly. See:
Clearing the cache resolves the problem AND I'm quite sure that it doesn't happen every time.
NOTE: This occurred when spinning up the site in Lando. Immediately afterwards, I destroyed the site and rebuilt it and it loaded as expected No problem.
Steps To Reproduce
I can not provide reliable steps to reproduce.
Proposed Solutions
Do we need to add another clearing of the cache to the install process. It seems like maybe something is not loading fast enough, under some uncertain conditions.
WILD SPECULATION: The fact that the third image is not displaying properly suggests to me that maybe the page is rendering before that image is ready to be rendered.
Additional Information
I believe that I have only seen this when spinning up a site with Lando. Could that be a factor?
In Zulip @argiepiano mentioned having seen this problem before as well. But, if this is not a regular problem or even a frequent problem, I'm not sure if we should worry about it.