pbiron opened this issue 1 year ago
The regex parser assumes that each author's info is contained on a single line, but in practice the WP exporter outputs each author across multiple lines in the WXR.
For example, the exporter outputs:
<wp:author>
	<wp:author_id>7</wp:author_id>
	<wp:author_login>username</wp:author_login>
	<wp:author_email>user@example.com</wp:author_email>
	<wp:author_display_name><![CDATA[First Last]]></wp:author_display_name>
	<wp:author_first_name><![CDATA[First]]></wp:author_first_name>
	<wp:author_last_name><![CDATA[Last]]></wp:author_last_name>
</wp:author>
whereas the regex parser expects:
<wp:author><wp:author_id>7</wp:author_id><wp:author_login>username</wp:author_login><wp:author_email>user@example.com</wp:author_email><wp:author_display_name><![CDATA[First Last]]></wp:author_display_name><wp:author_first_name><![CDATA[First]]></wp:author_first_name><wp:author_last_name><![CDATA[Last]]></wp:author_last_name></wp:author>
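To make the failure mode concrete, here is a small Python sketch of a line-oriented matcher. It is hypothetical and only illustrates the single-line assumption described above; the function and variable names are not taken from any real importer. Because each line is searched on its own, the DOTALL flag (`re.S`) does not help: the pattern never sees more than one line at a time.

```python
import re

# Hypothetical line-oriented matcher, for illustration only: each line is
# searched on its own, so an author is found only when the opening and
# closing <wp:author> tags sit on the same line. re.S (DOTALL) cannot help,
# because the pattern never sees more than one line at a time.
AUTHOR_RE = re.compile(r"<wp:author>(.*?)</wp:author>", re.S)
LOGIN_RE = re.compile(r"<wp:author_login>(.*?)</wp:author_login>", re.S)


def find_logins_line_by_line(wxr: str) -> list[str]:
    logins = []
    for line in wxr.splitlines():
        author = AUTHOR_RE.search(line)
        if author:
            login = LOGIN_RE.search(author.group(1))
            if login:
                logins.append(login.group(1))
    return logins


# Trimmed-down versions of the two layouts shown above.
single_line = (
    "<wp:author><wp:author_id>7</wp:author_id>"
    "<wp:author_login>username</wp:author_login></wp:author>"
)
multi_line = (
    "<wp:author>\n"
    "\t<wp:author_id>7</wp:author_id>\n"
    "\t<wp:author_login>username</wp:author_login>\n"
    "</wp:author>\n"
)

print(find_logins_line_by_line(single_line))  # ['username']
print(find_logins_line_by_line(multi_line))   # []
```

The same author is found in the single-line layout and silently dropped in the multi-line one, which matches the behaviour described in this issue.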
I've got a tentative fix, but I need to test it some more before submitting a PR (which probably won't be until the weekend).
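One general way to lift the single-line assumption is to keep buffering lines after an opening `<wp:author>` until the matching `</wp:author>` arrives, and only then run the field regexes. The Python sketch below shows that idea; it is not necessarily the tentative fix mentioned above, and `parse_authors`, `_fields` and the returned dict layout are hypothetical. It assumes `<wp:author>` only ever appears as the opener of an author block (for example, not inside post content).

```python
import re
from collections.abc import Iterable

# Hypothetical buffered parser: lines are accumulated from an opening
# <wp:author> until a closing </wp:author> is seen, so a block may span
# any number of lines (or several blocks may share one line).
AUTHOR_RE = re.compile(r"<wp:author>(.*?)</wp:author>", re.S)
FIELD_RE = re.compile(r"<wp:(author_\w+)>(.*?)</wp:\1>", re.S)
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.S)
CLOSE = "</wp:author>"


def _fields(block: str) -> dict[str, str]:
    """Map each <wp:author_*> child tag to its text, unwrapping CDATA."""
    return {
        name: CDATA_RE.sub(r"\1", value).strip()
        for name, value in FIELD_RE.findall(block)
    }


def parse_authors(lines: Iterable[str]) -> list[dict[str, str]]:
    authors = []
    pending = ""  # text of an author block whose closing tag is not yet seen
    for line in lines:
        if not pending and "<wp:author>" not in line:
            continue  # outside any author block
        pending += line
        last_close = pending.rfind(CLOSE)
        if last_close == -1:
            continue  # block still open: keep buffering
        end = last_close + len(CLOSE)
        for match in AUTHOR_RE.finditer(pending[:end]):
            authors.append(_fields(match.group(1)))
        # Carry over only a block that has opened but not yet closed.
        rest = pending[end:]
        start = rest.find("<wp:author>")
        pending = rest[start:] if start != -1 else ""
    return authors


multi_line = """<wp:author>
\t<wp:author_id>7</wp:author_id>
\t<wp:author_login>username</wp:author_login>
\t<wp:author_email>user@example.com</wp:author_email>
\t<wp:author_display_name><![CDATA[First Last]]></wp:author_display_name>
\t<wp:author_first_name><![CDATA[First]]></wp:author_first_name>
\t<wp:author_last_name><![CDATA[Last]]></wp:author_last_name>
</wp:author>
"""
single_line = multi_line.replace("\n", "").replace("\t", "")

authors = parse_authors(multi_line.splitlines(keepends=True))
print(authors[0]["author_display_name"])  # First Last
```

Because the buffer is only flushed at a closing tag, the single-line layout still parses the same way. The lines can come straight from iterating over an open file.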