WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/
Other
10.33k stars 4.12k forks source link

Post publish upload media dialog: race condition in uploads results in incorrect image replacements #64899

Closed sgomes closed 5 days ago

sgomes commented 2 weeks ago

Description

The issue

When publishing a post with external images, the upload media dialog in the post publish sidebar is shown. This dialog allows you to upload all the images in your post to the media library, and replace the image URLs in the post accordingly, the goal being to benefit from the various performance improvements that WordPress applies to media library images.

However, choosing to perform this upload can result in a race condition where some of the external images get replaced with the wrong local image (see screen recording).

The resulting attachments all have different IDs, but some of them end up pointing to the same file on disk.

The root cause

This took some effort to work out, but here's my understanding of the sequence of events:

The issue is that there is no locking mechanism used for the generated filenames, at any level. So if one process consults the filesystem in the time between another process generating the filename and actually creating the file, they'll both end up generating the same filename, and we'll end up with them pointing to the same file on disk, overwriting each other.

Potential fixes

I'll start by saying that I think we should fix this at the media upload dialog level, since even if a fix eventually lands in Core (e.g. by somehow using the DB to synchronise the filenames), this will still affect every prior version of WordPress and will thus still be a concern for a long time to come.

As such, I can think of several strategies to handle this in the media upload dialog:

Folks who may be interested

@Mamaduka , who's been looking at the media upload dialog with me and patiently helping me with code reviews @youknowriad , who's kindly been guiding me through the Gutenberg codebase @ellatrix , who likely has the most experience with the media upload dialog, having graciously contributed this great performance feature in the first place

I'm happy to work on a fix myself, but I'd love your opinions on the best approach to solving the issue!

Thank you! πŸ™

Step-by-step reproduction instructions

As a race condition, this can be difficult to reproduce. Some environments, such as the WordPress Playground / wp-now seem immune to it; I don't know the technical details there, but I suspect they don't have real concurrency and thus end up with a form of accidental, environment-level synchronisation.

This is how I'm able to reproduce, but your mileage may vary:

Screenshots, screen recording, code snippet

Each of the numbers shown is a different external .png image.

https://github.com/user-attachments/assets/522c8118-4b61-4571-93f9-865833762be2

Environment info

Please confirm that you have searched existing issues in the repo.

Please confirm that you have tested with all plugins deactivated except Gutenberg.

sgomes commented 1 week ago

Adding a few more folks who may have an opinion here: @ntsekouras @gziolo

Thank you! πŸ™

sgomes commented 1 week ago

FYI, I'm leaning towards generating a unique name for each upload on the client side, using a GUID. That seems like a reasonable thing to do, and it should fully resolve the issue.

ntsekouras commented 1 week ago

FYI, I'm leaning towards generating a unique name for each upload on the client side, using a GUID. That seems like a reasonable thing to do, and it should fully resolve the issue.

So right now when we upload from external link the name of the created file is always image? If generating the name client side avoids the WP generated name and resolves the problem, this looks like a good solution to me.

I'd also love some feedback from @swissspidy, since he's working a lot with media.

swissspidy commented 1 week ago

Thanks for looping me in!

I noticed the lack of proper names too when working on the import for just a single image, but didn't think much of it because I already solved it in my project anyway.

Forgot about the same functionality in the pre-publish panel for the whole post. I'll need to implement that in my plugin too :)

The problem is that no name is set at all. It just downloads the image as a Blob and doesn't explicitly set a name. Then image is just a fallback by the browser or WordPress.

In my case, I had solved this by using getFilename() from @wordpress/url to get a file name from the URL and then explicitly create a File object with a proper name. It falls back to unnamed if a file name can't be determined. Though here a UUID would be a more robust fallback.

So if you have https://example.com/some-awesome-image.jpeg in your blog post, it will be imported as some-awesome-image.jpeg. This greatly reduces the risk of race conditions and is also a nicer name than a UUID :). Additionally, my project limits the amount of parallel uploads through client-side thumbnail generation, further reducing that risk of collisions happening on the server.

A few related observations:

sgomes commented 1 week ago

Thank you for the feedback, everyone!

I'm happy to use getFilename() to get a meaningful filename instead, yes πŸ‘

We need to consider that it may still lead to collisions, however (/images/folder1/image.jpg, /images/folder2/image.jpg, /images/folder3/image.jpg). We could decide those are not much of a concern and move on, or we could try to be defensive about it on the client side in some way, in which case we'd need to go back to something like appending a GUID (for the cases in which we detect a name collision).

As for your comments, @swissspidy:

Why does this pre-publish upload dialog only support core/image? In my project I also cover core/media-text, core/cover, and so on. This is quite limiting. In my plugin I support all of those blocks too.

Implementing this is exactly how I came across this bug πŸ™‚ I have code sitting in a branch on my machine that will add support for core/media-text and core/cover, but I wanted to solve this issue first, lest it make things worse.

Please let me know if there are any other media-bearing core blocks I'm missing!

If I have multiple image blocks with the same image URL, this dialog downloads and uploads the same image multiple times.

I was hoping to implement this as a follow-up to the PR adding support for other blocks, but I can roll it all into one PR if you feel that's better.

In fact, I'd go one step further and say that ideally we'd also deduplicate images across posts, at least for things like theme images. If a user is fond of a theme pattern with an image and repeatedly uses it in their posts, they'll end up with a ton of copies in the media library if they upload every time. I'm not sure it's possible to implement this in a good way, however.

The MaybeUploadMedia component has a wrong name here

Happy to change that too, as a follow-up πŸ‘

sgomes commented 1 week ago

I decided to roll everything into one PR, since the changes were closely related in places: #65122. Hope this is okay!