The WordPress Importer
https://wordpress.org/plugins/wordpress-importer/
GNU General Public License v2.0

Blank lines skipped in posts #94

Status: Closed (dsnyder0pc closed this issue 3 years ago)

dsnyder0pc commented 3 years ago

I've observed that when importing posts with (intentional) embedded blank lines, the posts show up in the editor with the blank lines removed. I'm wondering if this code change is what is causing that:

[Image: import xml file]

Here's how this looks in the WordPress editor for "Free" customers on WordPress.com:

[Image: import xml file]

As you can see, the blank lines are preserved. However, when I import the same content to a "Business" plan on WordPress.com, the blank lines appear to have been filtered out:

[Image: import xml file]

My hypothesis is that, for blank lines, PHP evaluates $importline as false (an empty string is falsy), so the condition on line 87 fails and the importer skips the append on line 88. I have not tested this yet but may do so later this week. If I do, I'll report back.

I'm in the process of importing 2,100 posts into a Business plan on WordPress.com, and the posts are basically unreadable with the missing whitespace. Thanks for having a look.

dsnyder0pc commented 3 years ago

In case the code sample is difficult to find from my hyperlink, it's these lines:

    87          if ( $in_multiline && $importline ) {
    88              $multiline_content .= $importline . "\n";
    89          }

The file is src/parsers/class-wxr-parser-regex.php
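
To illustrate the hypothesis, here is a minimal sketch of how PHP evaluates that condition for different line contents. The helper name `would_append` and the `'item'` value are hypothetical and not taken from the plugin. The sketch also assumes a blank line reaches the check as an empty string, which I have not confirmed against the parser.

```php
<?php
// Hypothetical helper that mirrors the condition on line 87.
// $in_multiline holds a tag name (truthy) while inside a multi-line tag.
function would_append( $in_multiline, $importline ) {
    return (bool) ( $in_multiline && $importline );
}

// A normal line of text is truthy, so it would be appended.
var_dump( would_append( 'item', 'some text' ) ); // bool(true)

// An empty string is falsy in PHP, so a blank line would be skipped.
var_dump( would_append( 'item', '' ) );          // bool(false)

// The string "0" is also falsy in PHP, so a line containing only "0"
// would be skipped as well.
var_dump( would_append( 'item', '0' ) );         // bool(false)
```

If the assumption holds, the check cannot tell "this line was deliberately blanked out" apart from "this line was blank in the source".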

dsnyder0pc commented 3 years ago

After making this code change, the import works while preserving blank lines:

--- ./wp-content/plugins/wordpress-importer/parsers/class-wxr-parser-regex.php.dist 2021-01-31 21:29:35.318240822 +0000
+++ ./wp-content/plugins/wordpress-importer/parsers/class-wxr-parser-regex.php  2021-02-03 17:52:22.412001641 +0000
@@ -75,16 +75,19 @@
            // We don't want to have this line added to `$is_multiline` below.
            $importline        = '';
            $in_multiline      = $tag;
+           $is_tag_line       = true;

          } elseif ( false !== ( $pos = strpos( $importline, "</$tag>" ) ) ) {
            $in_multiline          = false;
            $multiline_content    .= trim( substr( $importline, 0, $pos ) );

            $this->{$handler[0]}[] = call_user_func( $handler[1], $multiline_content );
+         } else {
+           $is_tag_line       = false;
          }
        }

-       if ( $in_multiline && $importline ) {
+       if ( $in_multiline && ! $is_tag_line ) {
          $multiline_content .= $importline . "\n";
        }
      }

dsnyder0pc commented 3 years ago

@jrfnl - What do you think of this change? I could submit a PR if that would be helpful.

jrfnl commented 3 years ago

@dsnyder0pc As this seems to concern wordpress.com, your first point of contact should be Automattic. I have no clue whether this plugin is the importer they use or if they use a different or customized version. Let alone what the difference is between the importers on different versions of their platform, so I suggest you contact them about this.

dsnyder0pc commented 3 years ago

@jrfnl - Thanks for your reply. Based on my experience so far, it seems that WordPress.com does use this importer. The screenshots above were taken after my import to a Business plan on WordPress.com.

My workaround is to just sftp the modified file to WordPress.com, but I thought you might be interested in fixing this logic bug.

In PHP, the $importline variable can evaluate to false in two cases: intentionally, when the code explicitly sets it to an empty string, and unintentionally, when the input line is blank.
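
The difference between the two checks can be sketched with a simplified model of the loop. This is not the plugin's code: the function `collect_multiline`, its `$preserve_blank` switch, and the sample input are hypothetical, and the model ignores any content that shares a line with the opening or closing tag.

```php
<?php
// Simplified, hypothetical model of the multi-line loop; not the plugin's code.
function collect_multiline( array $lines, $tag, $preserve_blank ) {
    $in_multiline = false;
    $content      = '';

    foreach ( $lines as $importline ) {
        $is_tag_line = false;

        if ( false !== strpos( $importline, "<$tag>" ) ) {
            // Opening tag: start collecting, but do not append this line.
            $in_multiline = $tag;
            $is_tag_line  = true;
            $importline   = '';
        } elseif ( false !== strpos( $importline, "</$tag>" ) ) {
            // Closing tag: stop collecting.
            $in_multiline = false;
        }

        // Either rely on the truthiness of the line (original check)
        // or on an explicit flag (the approach in the proposed change).
        $append = $preserve_blank
            ? ( $in_multiline && ! $is_tag_line )
            : ( $in_multiline && $importline );

        if ( $append ) {
            $content .= $importline . "\n";
        }
    }

    return $content;
}

$lines = array( '<item>', 'first', '', 'second', '</item>' );

// Truthiness check: the blank line between the two paragraphs is lost.
echo json_encode( collect_multiline( $lines, 'item', false ) ), "\n"; // "first\nsecond\n"

// Explicit flag: the blank line is kept.
echo json_encode( collect_multiline( $lines, 'item', true ) ), "\n";  // "first\n\nsecond\n"
```

With an explicit flag, the "skip this line" decision no longer depends on what the line happens to contain.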

dsnyder0pc commented 3 years ago

I should add that I have been speaking to WordPress support, and I made them aware of the issue that I've opened up here and the workaround/fix I've proposed.

jrfnl commented 3 years ago

As I said before, without even knowing whether this plugin is used by wordpress.com, let alone which version(s) and what the difference is between their install on the Free versus the Business platform, there is nothing to base any changes on.

Basically without a lot more information, there is nothing to be done.

Oh and just FYI: I'm not the maintainer of this plugin, so pinging me by nickname or presuming anything about my interest in this plugin is kind of grating.

dsnyder0pc commented 3 years ago

My apologies. Who is the maintainer?

dsnyder0pc commented 3 years ago

@ocean90 - Can you help me find the owner of this plugin? I reached out to the most recent committer, but apparently that was not a great idea. Thanks.

yoavf commented 3 years ago

@dsnyder0pc thanks for opening this issue and the PR!

Regarding the WordPress.com mention - I'll try to clarify as an employee of Automattic: on WordPress.com we run the latest released version of this core WP Importer for sites that have the ability to install plugins (=our business plan). For other sites, we use a different importer that has been optimized for our multisite install (and where the bug mentioned here does not appear to exist).

@dsnyder0pc I'll try to confirm this issue against core WordPress, and review the PR.