WordPress / wordpress-importer

The WordPress Importer
https://wordpress.org/plugins/wordpress-importer/
GNU General Public License v2.0

Post content didn't import properly due to parsing issue in get_tag #82

Open vabc3 opened 3 years ago

vabc3 commented 3 years ago

XML exported from WordPress 5.1 cannot be imported properly.

The imported content always ends with a stray ]> at the end of each post.

The exported XML looks like this:

<content:encoded>
\t\t<![CDATA[some stuff]]>
\t\t</content:encoded>

This is valid XML, but it causes a problem in the current parsing logic: https://github.com/WordPress/wordpress-importer/blob/64e575ac5e0c91225dc2a4661c8208197f7dc5c1/src/parsers/class-wxr-parser-regex.php#L115-L118

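Because of the whitespace after the opening tag, '<![CDATA[' is not at the very start of the captured content, so the parser does not treat it as CDATA. The content inside the tag needs to be trimmed first.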

A quick fix is to add \s* on both sides of (.*?) in the preg_match pattern.
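
To make the failure mode concrete, here is a minimal Python sketch of the idea. It is not the plugin's PHP code: the get_tag function below, its trim parameter, and the simplified pattern are assumptions made for illustration, and Python's re module stands in for preg_match. It only shows that the CDATA wrapper survives when the captured text starts with whitespace, and that padding the capture group with \s* avoids this. It does not try to reproduce the exact ]> artifact.

```python
import re

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def get_tag(xml, tag, trim=False):
    """Hypothetical stand-in for the importer's tag extraction.

    With trim=True the capture group is padded with \\s*, which is the
    quick fix proposed in this issue.
    """
    pad = r"\s*" if trim else ""
    name = re.escape(tag)
    pattern = r"<%s.*?>%s(.*?)%s</%s>" % (name, pad, pad, name)
    match = re.search(pattern, xml, re.IGNORECASE | re.DOTALL)
    if match is None:
        return ""
    inner = match.group(1)
    # The wrapper is only stripped when CDATA starts at the very first
    # character of the captured text.
    if inner.startswith(CDATA_OPEN) and inner.endswith(CDATA_CLOSE):
        return inner[len(CDATA_OPEN):-len(CDATA_CLOSE)]
    return inner


# Same shape as the exported XML above: a newline and two tabs
# surround the CDATA section.
xml = "<content:encoded>\n\t\t<![CDATA[some stuff]]>\n\t\t</content:encoded>"

untrimmed = get_tag(xml, "content:encoded")           # wrapper is left in place
trimmed = get_tag(xml, "content:encoded", trim=True)  # wrapper is removed

print(repr(untrimmed))
print(repr(trimmed))
```

In this sketch the untrimmed call returns the raw CDATA markup together with the surrounding whitespace, while the trimmed call returns only the text inside the CDATA section. An alternative with the same effect would be to trim the captured string before checking for the CDATA prefix, rather than changing the pattern.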