Automattic / wp-calypso

The JavaScript and API powered WordPress.com
https://developer.wordpress.com
GNU General Public License v2.0

Import: Images are sometimes not imported or backfilled #17191

Closed dllh closed 2 years ago

dllh commented 7 years ago

Sometimes, when an import wraps up, not all of its images have been imported. Other times, the images appear to be imported, but references to them at their original paths are not updated to the new paths at wordpress.com. When the source site is taken offline, this can result in broken images, and indeed in irrevocably lost images if the content owner doesn't have backups of the images.

Historically, we've had a series of scripts we could run to try to fix up as many of these cases as possible, but we've never properly solved the root cause. This is more of an API issue than a calypso issue, and indeed it'll affect wp-admin imports too. Maybe it makes sense to move it to trac, but my worry is that trac isn't as visible as this issue tracking system, so I'm starting here.

Unfortunately, the issue can be caused by a number of things, from transient failures in our async jobs system to downtime on the source site, to intentional throttling by source service providers. I am skeptical that we'll be able to develop consistent reproduction steps. This issue is therefore designed to collect reports so that we can at least get a sense of how frequently it's coming up, and hopefully to get the issue escalated.

Some related background: p195om-3uN-p2

Some recent reports:

thebud15 commented 7 years ago

Hey there! Rachel recommended that I mention this issue here as it looks like what my user is experiencing. Any tips would be greatly appreciated so I can give the user some type of reply soon! Thank you!

p1503466102000019-slack-triage

thebud15 commented 7 years ago

Just to expand on a few more of the details here... The images down the home page of [redacted] are not appearing, such as under Services. This seems to be related to an issue David N. brought to my attention regarding Photon requests pointing to the incorrect domain and path.

rachelmcr commented 7 years ago

We received a user report in 709826-zd-woothemes. The img src referenced the old URL after import; the issue was fixed manually with a script in this case (ref: p1506710605000247-slack-triage).

ghost commented 6 years ago

ticket is 767612-zd-woothemes

Moved from no-tar-sands.org to WordPress.com site notarsandsblog.wordpress.com. Is waiting to use the no-tar-sands.org domain here, since many images are still at the old host, such as the first image here: https://notarsandsblog.wordpress.com/2016/12/09/stay-vigilant-keep-standing-with-standing-rock/

Do we have scripts to both make sure all media gets moved here (user said "there were 430 items in the media library on the old site and only 173 imported to the new one" ) and then update all media links in their content itself?

@rachelmcr seems like you are the best person to ping - thanks for any help!
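
For illustration only, here is a minimal sketch of how a gap like "430 items on the old site, 173 imported" could be itemized. It is not one of the existing scripts. The function name `missing_media` and the two input lists are hypothetical, and matching on filename alone is an assumption that can misfire when two uploads share a name.

```python
import posixpath
from urllib.parse import urlparse


def missing_media(source_urls, imported_urls):
    """Return source media URLs whose filename is absent from the imported library.

    Assumption: a file keeps its name across the import, so comparing
    lowercased filenames is a usable (if imperfect) match key.
    """
    def name(url):
        # Compare on the last path segment only, ignoring host and directories.
        return posixpath.basename(urlparse(url).path).lower()

    imported_names = {name(u) for u in imported_urls}
    return [u for u in source_urls if name(u) not in imported_names]


if __name__ == "__main__":
    # Placeholder URLs, not taken from any ticket.
    old = [
        "https://old.example/wp-content/uploads/2016/12/a.jpg",
        "https://old.example/wp-content/uploads/2016/12/b.jpg",
    ]
    new = ["https://new.example/2016/12/a.jpg"]
    print(missing_media(old, new))
```

The resulting list is what would still need to be fetched from the old host before it goes offline.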

chad1008 commented 6 years ago

Adding a note for the above from @spncrb - the customer is eager to change the name servers over asap to keep emails online. I've asked for some more detail there to see what is happening today that might impact email, in case that would also take the images they need offline

brezocordero commented 6 years ago

Adding another note to @spncrb's comment: their hosting expires next Tuesday and they need to have it fixed before that. I have asked them to import only the posts from 2016 again and check if the image on post https://notarsandsblog.wordpress.com/2016/12/09/stay-vigilant-keep-standing-with-standing-rock/ was moved over, in case breaking up the import helps.

jenlynnemc commented 6 years ago

Adding another note to 767612-zd. User is feeling stressed with their hosting running out tomorrow. Are we able to help before the hosting expires? @dllh Is there someone else we can ping, as I think these types of issues are handled by a different team now? Thanks

dllh commented 6 years ago

There's not another team handling the one-off fixes (we're not doing them anymore). We've shifted investigation of the underlying cause away from HG but I don't believe a team has picked it up yet.

jamiepalatnik commented 6 years ago

Adding this ticket: 789482-zen

SiobhyB commented 6 years ago

This came up again in 44039-hc.

Discussion in p195om-3BU#comment-15747 and p1511181091000127-slack-triage.

supernovia commented 6 years ago

884861-f

SiobhyB commented 6 years ago

This came up again in 824290-zen.

kevmarsden commented 6 years ago

Another report in p2EDhh-j6-p2

davoraltman commented 6 years ago

Another one where it seems not all the images got imported 825044-zen

ryansholin commented 6 years ago

Another ticket here, from a WordPress.com-to-WordPress.com import. 813456-zen

nellofn commented 6 years ago

Another ticket here in 825715-zen - As a workaround for this user, I clicked on the blank image in the editor and the image displayed in Edit mode. After clicking Update, it added the image back to the Editor, so even though the reference to the image was wrong, it did pull the image after all.

kevmarsden commented 6 years ago

This issue occurred in 843085-zen. User downgraded from Atomic to Simple and the uppercase letters in the image paths were not converted to lowercase. I was able to resolve this issue with the post-import-backfill-attachments.php script.
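
To make the path-case problem concrete, here is a minimal sketch. It is not the post-import-backfill-attachments.php script, whose contents are not shown in this thread. The function name `lowercase_img_paths` is hypothetical, and it assumes that the fix amounts to lowercasing the path portion of each `img` `src`.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches the src attribute of an <img> tag; group 3 is the URL itself.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])(.*?)\2', re.IGNORECASE)


def lowercase_img_paths(html):
    """Lowercase only the path of each <img> src, leaving host, query, and other attributes alone."""
    def fix(match):
        prefix, quote, url = match.groups()
        parts = urlsplit(url)
        fixed = urlunsplit(parts._replace(path=parts.path.lower()))
        return f"{prefix}{quote}{fixed}{quote}"

    return IMG_SRC.sub(fix, html)


if __name__ == "__main__":
    # Placeholder markup, not taken from the ticket.
    print(lowercase_img_paths('<img class="x" src="https://a.example/2018/05/My-Photo.JPG" alt="Hi">'))
```

A regular expression is used here for brevity; an HTML parser would be more robust against unusual markup.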

kevmarsden commented 6 years ago

This issue occurred again during a downgrade from Atomic. p8yzl4-14y-p2

chad1008 commented 6 years ago

Updating from @nellofn's comment above, 825715-zen is now here: 885286-zen

Two questions:

  1. @kevmarsden are you able to use your super script powers on this one?
  2. When this import issue is fixed, do we expect it will retroactively backfill the sites that have already been affected? If not, could/should we have someone (or a list of someones) with a sandbox in place to monitor and fix these up as they come in for now?

I worry that we're collecting customer tickets here, and those customers are waiting for a fix that might not actually restore their images if it only addresses future imports.

kevmarsden commented 6 years ago

@chad1008

  1. I ran a script to update the image src that were pointing to ../wp-content/uploads. But there are still 100+ images that didn't get imported. I posted about it on p8yzl4-17p-p2

  2. We probably won't be able to retroactively backfill images, so it's best not to make any promises. In the meantime, I'm trying to figure out a temporary workflow.
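
For readers without access to the sandbox scripts, the src update described in point 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the script that was run. The function name `rewrite_upload_urls` and the host names are hypothetical, and `new_base` is a placeholder for whatever the correct WordPress.com media base URL is for the site.

```python
import re


def rewrite_upload_urls(content, old_base, new_base):
    """Repoint /wp-content/uploads/YYYY/MM/... references at new_base.

    Handles absolute URLs on old_base plus relative references that start
    with /wp-content or ../wp-content right after a quote character.
    URLs on any other host are left untouched.
    """
    old = old_base.rstrip("/")
    pattern = re.compile(
        r"(?:" + re.escape(old) + r"|(?<=[\"'])(?:\.\.)?)"
        r"/wp-content/uploads/(\d{4}/\d{2}/[^\s\"'<>\]]+)"
    )
    new = new_base.rstrip("/")
    # Keep the year/month/filename tail and swap everything before it.
    return pattern.sub(lambda m: f"{new}/{m.group(1)}", content)


if __name__ == "__main__":
    # Placeholder content and hosts.
    post = (
        '<img src="https://old.example/wp-content/uploads/2016/12/a.jpg"> '
        '<img src="../wp-content/uploads/2017/01/b.png">'
    )
    print(rewrite_upload_urls(post, "https://old.example", "https://new.example/files"))
```

This only fixes references. It does nothing for files that never made it into the media library, which is the separate problem of the 100+ missing images.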

rachelmcr commented 6 years ago

A user reported image URLs not backfilled correctly after an import when reverting from an Atomic to Simple site. It's resolved for this user; reporting here to track frequency.

Internal refs: 886658-zen, 1462823-hc, p2EDhh-jN-p2

kevmarsden commented 6 years ago

Another issue with an Atomic revert. I was able to fix it with a script from my sandbox. p8yzl4-18o-p2

gracie commented 6 years ago

Another issue here 885591-zen

Each media file failed to import, while everything else moved over. The original site is still live. @kevmarsden is this something you might be able to help with?

kevmarsden commented 6 years ago

Another site had an issue after reverting from Atomic. I was able to fix it with import scripts. Ticket: 903221-zen

gamebits commented 6 years ago

User reports that they imported three weeks of content from one WP.com site into another, existing WP.com site. The posts came over, but the media did not, and the images embedded in the imported posts are loading from the original site, which is still live. The user can manually upload what they say are 3,800 media items but would still need the links updated. @kevmarsden, could you take a look?

928324-zen & 1662819-hc

Stanjo84 commented 6 years ago

The user's media seem to have been imported but are showing empty on blog posts.

931967-zen

katinthehatsite commented 6 years ago

Another user came in chat and here is a follow up ticket: 950411-zen

The images are still pointing to a previous host and some images are not coming through as featured images

dcoleonline commented 6 years ago

950276-zen

Images in posts were still referencing the self-hosted site rather than being pulled into WordPress.com and attached to the posts.

SiobhyB commented 6 years ago

Again in 952255-zen and p2EDhh-l1-p2.

The images were imported but were referencing the older site. (This has been fixed now but adding to make sure it's tracked.)

benchilcote commented 6 years ago

1084856-zen

User is trying to migrate content from hoochiekoochie.skynetblogs.be (not a WordPress website) to hoochiekoochie.blog. They successfully imported the content but images did not import. I attempted a few other imports into a test website with no luck. The user's original website account will be shut down at the end of the year and they want to save the images.

@kevmarsden, would you be willing to take a look?

kevmarsden commented 6 years ago

@benchilcote I was able to import the images and update the references using a script from my sandbox (image-import.php). I updated the user in 1084856-zen

gemmagarner commented 6 years ago

Another issue in: https://en.forums.wordpress.com/topic/images-are-broken-lost/

rezzap commented 6 years ago

Another issue was reported here: 1126731-zen. Replacing the URL manually works, as the images do seem to have imported, but that might be quite a bit of work for the user. Is there a way we could update this for them using the script @kevmarsden ran for the case Ben reported above?

KokkieH commented 6 years ago

Another report in https://en.forums.wordpress.com/topic/some-photos-missing/ - images imported to library, but src URLs not rewritten.

SiobhyB commented 6 years ago

For posterity: I've helped with the last two issues, but wasn't able to help with this one due to the way the image paths are formatted. ^

wynwin commented 6 years ago

Another one reported in chat here: 3989459-hc Can we please try to run the script to clean this up for the user?

Sent follow-up here: 1192901-zen Site: ischerzo.com

benchilcote commented 6 years ago

1192637-zen, 3987868-hc Mp3 filepaths did not update on an import from Hostgator to WordPress.com for https://wealthisnotmoney.wordpress.com. The filepaths were being used in the audio shortcode. @kevmarsden was able to repair them.

drwpcom commented 6 years ago

Issue reported in 1213806-t

The user has imported a site to WordPress.com. Gallery images are missing from blog posts:

Example Post on WordPress.org: https://www.whatevernevermind.com/shrinkwrapped-whatkatiedoes-neverending/

Example Post on WordPress.com: https://whatevernevermind339609687.wordpress.com/2009/01/27/shrinkwrapped-whatkatiedoes-neverending/

The image exists in the image library but has not been added to the gallery in the post.

@kevmarsden could you please take a look?

Update July 9, 2018: The user has deleted old posts on their page and is interested in moving back to Dreamhost, so the issue no longer exists.

kevmarsden commented 6 years ago

While doing an Atomic revert, I needed to use the image-import.php script to import the images. 1263011-zen

SiobhyB commented 6 years ago

This came up in p2EDhh-se-p2 too. The user moved from .org to .com and, although the images were imported to the Media Library, the images still used the URL of the old site. I was able to run a script to update the URLs.

SiobhyB commented 6 years ago

Another case in p2EDhh-si-p2.

SiobhyB commented 6 years ago

Another issue in p2EDhh-ss-p2. This one was for an export/import between WordPress.com sites. The image URLs still referenced the old WordPress.com site.

drwpcom commented 6 years ago

Another issue with an import between WordPress sites - 1276607-zen.

jessestu commented 6 years ago

Another: 1297249-zen

pmciano commented 6 years ago

Another 294817-h

User reports missing photos after import (which she was able to re-import from Google Photos), and none of the photos are appropriately attached to her imported posts.

sashastone commented 6 years ago

Missing some photos after import, and image URLs in post reflect the old site: 1316685-zen

sophiegyo commented 6 years ago

Another report: 1336432-zen Some photos are missing from the import, but that's likely due to them not being in the XML file that was originally exported for some reason.

chad1008 commented 6 years ago

Another report: 1354776-zen

The media library is missing 11 of the original site's 79 images (possible they were just unattached images).

In addition, the src URLs are referencing the original site instead of the wp.com file locations.

Note: the domain name has been moved over, but there is a temporary domain active on the original self-hosted site (please see the linked ticket for those URLs).

scosgro commented 6 years ago

Another report: 5959271-hc

Their images are showing up in their media library, but the image links are all pointed to their old (no longer available) .org file format - /wp-content/uploads/

As a note, do we have an easy resolution for this on simple sites? @kevmarsden, seeing you as a go-to ping here, so... Anything that can be done here?

Followup: 1371530-zen

sophiegyo commented 6 years ago

Another report: 1380258-zen

Image paths in blog posts are appearing as /wp-content/uploads instead of the WPcom path format. The images themselves appear to have been imported.