WordPress / wordpress-importer

The WordPress Importer
https://wordpress.org/plugins/wordpress-importer/
GNU General Public License v2.0

bug in WXR_Parser_Regex when parsing authors #144

Open pbiron opened 1 year ago

pbiron commented 1 year ago

The regex parser assumes that each author's info is contained on a single line, but in practice the WP exporter writes each author across multiple lines in the WXR.

For example, the exporter outputs:

 <wp:author>
   <wp:author_id>7</wp:author_id>
   <wp:author_login>username</wp:author_login>
   <wp:author_email>user@example.com</wp:author_email>
   <wp:author_display_name><![CDATA[First Last]]></wp:author_display_name>
   <wp:author_first_name><![CDATA[First]]></wp:author_first_name>
   <wp:author_last_name><![CDATA[Last]]></wp:author_last_name>
 </wp:author>

whereas the regex parser expects:

<wp:author><wp:author_id>7</wp:author_id><wp:author_login>username</wp:author_login><wp:author_email>user@example.com</wp:author_email><wp:author_display_name><![CDATA[First Last]]></wp:author_display_name><wp:author_first_name><![CDATA[First]]></wp:author_first_name><wp:author_last_name><![CDATA[Last]]></wp:author_last_name></wp:author>
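To make the mismatch concrete, here is a minimal sketch in Python. It is not the plugin's PHP code and not the tentative fix mentioned in this issue. The function names and both regexes are hypothetical, written only to illustrate the general technique: a pattern applied one line at a time finds no author in the multi-line form, while the same pattern applied to the whole document with "dot matches newline" enabled finds it in either form.

```python
import re

# Sample taken from the exporter output shown in this issue.
MULTI_LINE_WXR = """\
<wp:author>
  <wp:author_id>7</wp:author_id>
  <wp:author_login>username</wp:author_login>
  <wp:author_email>user@example.com</wp:author_email>
  <wp:author_display_name><![CDATA[First Last]]></wp:author_display_name>
  <wp:author_first_name><![CDATA[First]]></wp:author_first_name>
  <wp:author_last_name><![CDATA[Last]]></wp:author_last_name>
</wp:author>
"""

# Hypothetical patterns for illustration only.
# re.DOTALL lets "." match newlines, so an element may span several lines.
AUTHOR_RE = re.compile(r"<wp:author>(.*?)</wp:author>", re.DOTALL)
FIELD_RE = re.compile(
    r"<wp:(author_\w+)>(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?</wp:\1>", re.DOTALL
)


def authors_line_by_line(wxr):
    """Mimic the single-line assumption: match each line in isolation."""
    found = []
    for line in wxr.splitlines():
        found.extend(AUTHOR_RE.findall(line))
    return found


def authors_whole_document(wxr):
    """Match against the full text so line breaks inside an author are fine."""
    return [dict(FIELD_RE.findall(block)) for block in AUTHOR_RE.findall(wxr)]


if __name__ == "__main__":
    print(authors_line_by_line(MULTI_LINE_WXR))    # no author found
    print(authors_whole_document(MULTI_LINE_WXR))  # one author with six fields
```

Matching against the whole document is only one possible approach; buffering lines until the closing `</wp:author>` tag is seen would be another, and may fit a parser that reads the file line by line better.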

I've got a tentative fix, but I need to test it some more before submitting a PR (which probably won't be until the weekend).