WordPress / wordpress-importer

The WordPress Importer
https://wordpress.org/plugins/wordpress-importer/
GNU General Public License v2.0

Remapped image URLs in block attributes will not be correctly replaced #101

Open lgersman opened 3 years ago

lgersman commented 3 years ago

Hi guys,

If you import a page/post containing, let's say, a wp:cover block, the block's image URL attribute will not be replaced, because block attributes are JSON-escaped (the slashes in the URL are stored as \/).

An example:

```html
<!-- wp:cover {"url":"http:\/\/example.org\/wp-content\/uploads\/2021\/06\/zdf-hitparade.jpg","id":5} -->
<div class="wp-block-cover has-background-dim">
  <img class="wp-block-cover__image-background wp-image-5" alt="" src="http://example.org/wp-content/uploads/2021/06/zdf-hitparade.jpg" data-object-fit="cover"/>
  <div class="wp-block-cover__inner-container">
    <!-- wp:paragraph {"align":"center","placeholder":"Write title\u2026","textColor":"white","className":"","fontSize":"large"} -->
      <p class="has-text-align-center has-white-color has-text-color has-large-font-size">WOW</p>
    <!-- /wp:paragraph -->
  </div>
</div>
<!-- /wp:cover -->
```
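The Python sketch below (an illustration, not importer code) shows why a plain replacement misses the attribute: it runs a plain substring replace over a trimmed copy of the block above, then decodes the attribute JSON. The regex used to find the attributes is a simplification for this one comment, not WordPress's block parser.

```python
import json
import re

old_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade.jpg"
new_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade-1.jpg"

# Trimmed copy of the cover block: inside the attribute JSON, "/" is escaped as "\/".
block = (
    r'<!-- wp:cover {"url":"http:\/\/example.org\/wp-content\/uploads\/2021\/06\/zdf-hitparade.jpg","id":5} -->'
    "\n"
    f'<img class="wp-image-5" alt="" src="{old_url}"/>\n'
    "<!-- /wp:cover -->"
)

# A plain substring replacement, like the importer's current REPLACE() query.
updated = block.replace(old_url, new_url)

# Decode the opening comment's attribute JSON (simplified regex, not the real block parser).
attrs = json.loads(re.search(r"<!-- wp:cover (\{.*?\}) -->", updated).group(1))

print(f'src="{new_url}"' in updated)  # True: the <img> was updated
print(attrs["url"] == old_url)        # True: the block attribute was not
```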

If attachment import is enabled AND an image named zdf-hitparade.jpg already exists locally, the importer creates a new attachment for the image and stores the file as zdf-hitparade-1.jpg.
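The renaming can be pictured with the simplified sketch below. `unique_filename` is a hypothetical helper that only mimics the -1, -2 (and so on) suffixing seen here; as far as I know, WordPress core does this in wp_unique_filename(), which handles more cases than this.

```python
import os

def unique_filename(name: str, existing: set[str]) -> str:
    """Hypothetical helper: append -1, -2, ... before the extension until the name is free."""
    if name not in existing:
        return name
    stem, ext = os.path.splitext(name)
    n = 1
    while f"{stem}-{n}{ext}" in existing:
        n += 1
    return f"{stem}-{n}{ext}"

print(unique_filename("zdf-hitparade.jpg", {"zdf-hitparade.jpg"}))  # zdf-hitparade-1.jpg
```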

At the end of the attachment import, the posts are processed to replace the old image URL with the new one (http://example.org/wp-content/uploads/2021/06/zdf-hitparade.jpg => http://example.org/wp-content/uploads/2021/06/zdf-hitparade-1.jpg).

This works fine for the <img> element, but not for the wp:cover url attribute, since it is JSON-escaped. To fix this, simply change the code at https://github.com/WordPress/wordpress-importer/blob/e05f678835c60030ca23c9a186f50999e198a360/src/class-wp-import.php#L1271 from

```php
$wpdb->query( $wpdb->prepare( "UPDATE {$wpdb->posts} SET post_content = REPLACE(post_content, %s, %s)", $from_url, $to_url ) );
```

to

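```php
// Replace the plain URL, then its JSON-encoded form (json_encode() quotes it and escapes "/" as "\/").
$wpdb->query( $wpdb->prepare( "UPDATE {$wpdb->posts} SET post_content = REPLACE( REPLACE( post_content, %s, %s ), %s, %s )", $from_url, $to_url, json_encode( $from_url ), json_encode( $to_url ) ) );
```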

and everything works like a charm.
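The proposed query can be tried outside WordPress with the Python sketch below, which runs the same nested REPLACE against an in-memory SQLite table (SQLite also has a three-argument REPLACE() function; the real importer runs on MySQL). `php_json_encode` is a hypothetical stand-in for PHP's json_encode(): Python's json.dumps() does not escape "/" by default, so the sketch does that explicitly. The `posts` table only mimics wp_posts.

```python
import json
import sqlite3

def php_json_encode(s: str) -> str:
    # Rough stand-in for PHP's json_encode() on a plain ASCII URL:
    # wrapped in double quotes, with "/" escaped as "\/" (PHP's default).
    return json.dumps(s).replace("/", "\\/")

old_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade.jpg"
new_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade-1.jpg"

# Trimmed cover block: escaped URL in the attributes, plain URL in <img src>.
content = (
    '<!-- wp:cover {"url":' + php_json_encode(old_url) + ',"id":5} -->\n'
    + f'<img class="wp-image-5" alt="" src="{old_url}"/>\n'
    + "<!-- /wp:cover -->"
)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
db.execute("INSERT INTO posts (post_content) VALUES (?)", (content,))

# Same shape as the proposed query: replace the plain URL, then the JSON-encoded one.
db.execute(
    "UPDATE posts SET post_content = REPLACE(REPLACE(post_content, ?, ?), ?, ?)",
    (old_url, new_url, php_json_encode(old_url), php_json_encode(new_url)),
)
fixed = db.execute("SELECT post_content FROM posts").fetchone()[0]

print(php_json_encode(new_url) in fixed)  # True: block attribute updated
print(old_url in fixed)                   # False: no plain old URL left
```

Because the plain URL contains "://" and the escaped one contains ":\/\/", the two replacements never match the same text, so the nesting order does not matter here. Note that the JSON-encoded needle includes the surrounding double quotes, so it only matches attribute values that consist of exactly that URL.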

Kind regards and have a nice weekend,

Lars

joyously commented 3 years ago

With that change, wouldn't any existing plain URLs be unchanged?

lgersman commented 3 years ago

Both the plain and the escaped variants will be changed by the suggested fix.

As you might have seen in the suggested change, the old $from_url and its escaped variant are both replaced (by $to_url and its escaped variant, respectively).

Without that change, EVERY imported page/post containing a Gutenberg block with an image URL attribute pointing to a renamed image will display wrong data in Gutenberg, since the current replacement code does not update the block attributes.
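To check the concern about plain URLs directly, the sketch below applies the same two replacements in Python (str.replace standing in for SQL REPLACE()) to classic HTML with no block markup. `php_json_encode` and `remap` are hypothetical helpers written for this illustration; the former only approximates PHP's json_encode() for a plain URL.

```python
import json

def php_json_encode(s: str) -> str:
    # Approximates PHP json_encode() for a plain ASCII URL: quoted, "/" escaped as "\/".
    return json.dumps(s).replace("/", "\\/")

def remap(content: str, from_url: str, to_url: str) -> str:
    # Mirrors REPLACE(REPLACE(post_content, from, to), json(from), json(to)).
    step1 = content.replace(from_url, to_url)
    return step1.replace(php_json_encode(from_url), php_json_encode(to_url))

old_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade.jpg"
new_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade-1.jpg"

# Legacy content: plain URLs only, no block comments.
legacy = f'<p><a href="{old_url}"><img src="{old_url}" alt="" /></a></p>'
print(remap(legacy, old_url, new_url) == legacy.replace(old_url, new_url))  # True
```

For legacy HTML the second replacement simply finds nothing, so the result is the same as with the current single REPLACE.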

joyously commented 3 years ago

So classic (legacy) HTML, of which there are at least 15 years' worth, would not be replaced correctly?

lgersman commented 3 years ago

That's not what I said.

The current implementation is kinda brute force anyway: it replaces ALL old image references with the new ones, not only in the freshly imported pages/posts. So it can already break existing content.
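The brute-force effect can be reproduced with the Python/SQLite sketch below (SQLite standing in for MySQL, `posts` only mimicking wp_posts). The query has no WHERE clause, like the current one, so a pre-existing post that legitimately uses the local zdf-hitparade.jpg gets rewritten to the imported copy too.

```python
import sqlite3

old_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade.jpg"
new_url = "http://example.org/wp-content/uploads/2021/06/zdf-hitparade-1.jpg"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
# Post 1 already existed locally and uses the local zdf-hitparade.jpg.
db.execute("INSERT INTO posts (ID, post_content) VALUES (1, ?)", (f'<img src="{old_url}"/>',))
# Post 2 was just imported; its image was stored as zdf-hitparade-1.jpg.
db.execute("INSERT INTO posts (ID, post_content) VALUES (2, ?)", (f'<img src="{old_url}"/>',))

# No WHERE clause, like the current importer query: every row is rewritten.
db.execute("UPDATE posts SET post_content = REPLACE(post_content, ?, ?)", (old_url, new_url))

rows = dict(db.execute("SELECT ID, post_content FROM posts"))
print(new_url in rows[1])  # True: the pre-existing post now points at the imported copy
```

Restricting the UPDATE to the IDs of the freshly imported posts (for example with a hypothetical `WHERE ID IN (...)` clause) would avoid this; the importer does not do that today.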

What I reported is that the current implementation does not produce clean, consistent pages/posts for content containing wp:cover and friends: the <img> markup is updated, but the block attributes are not.

It is possible that my proposed fix could break 15-year-old content under some circumstances. On the other hand, the fix would make the import compatible with Gutenberg. Decide for yourself :-)