WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/

Spaces removed or changed to &nbsp; when pasting HTML with links #32509

Open johngodley opened 3 years ago

johngodley commented 3 years ago

If you copy some HTML as rich text from a web page and paste it into Gutenberg, it sometimes removes spaces or changes them to &nbsp;.

Edit - see follow-on comment for a better example

For example, if the web page contains this HTML:

<p><a href="https://apple.com">+something</a> <a href="https://pear.com">+somethingelse</a></p>

When pasted the console shows:

Received HTML:

 <meta charset='utf-8'><a href="https://apple.com/" style="box-sizing: border-box; border: 0px; font-family: inter, -apple-system, system-ui, blinkmacsystemfont, &quot;Segoe UI&quot;, Roboto, Oxygen-Sans, Ubuntu, Cantarell, &quot;Helvetica Neue&quot;, sans-serif; font-style: normal; font-weight: 400; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline; text-decoration: underline;">+something</a><span style="color: rgb(0, 16, 28); font-family: inter, -apple-system, system-ui, blinkmacsystemfont, &quot;Segoe UI&quot;, Roboto, Oxygen-Sans, Ubuntu, Cantarell, &quot;Helvetica Neue&quot;, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span> </span></span><a href="https://pear.com/" style="box-sizing: border-box; border: 0px; font-family: inter, -apple-system, system-ui, blinkmacsystemfont, &quot;Segoe UI&quot;, Roboto, Oxygen-Sans, Ubuntu, Cantarell, &quot;Helvetica Neue&quot;, sans-serif; font-style: normal; font-weight: 400; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline; text-decoration: underline; font-size: 16px; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);">+somethingelse</a>

Received plain text:

 +something +somethingelse

Processed inline HTML:

 <a href="https://apple.com/">+something</a>&nbsp;<a href="https://pear.com/">+somethingelse</a>

Note that the space between the two links is now a &nbsp;.

If the HTML copied is:

<p><a href="https://apple.com">#abcdef</a> <a href="https://pear.com">#ghijk</a></p>

Then, when pasted, it becomes:

Received HTML:

 <meta charset='utf-8'><a href="https://apple.com/" style="box-sizing: border-box; border: 0px; font-family: inter, -apple-system, system-ui, blinkmacsystemfont, &quot;Segoe UI&quot;, Roboto, Oxygen-Sans, Ubuntu, Cantarell, &quot;Helvetica Neue&quot;, sans-serif; font-style: normal; font-weight: 400; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline; text-decoration: underline; font-size: 16px; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);">#abcdef</a><span style="color: rgb(0, 16, 28); font-family: inter, -apple-system, system-ui, blinkmacsystemfont, &quot;Segoe UI&quot;, Roboto, Oxygen-Sans, Ubuntu, Cantarell, &quot;Helvetica Neue&quot;, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span> </span></span><a href="https://pear.com/" style="box-sizing: border-box; border: 0px; font-family: inter, -apple-system, system-ui, blinkmacsystemfont, &quot;Segoe UI&quot;, Roboto, Oxygen-Sans, Ubuntu, Cantarell, &quot;Helvetica Neue&quot;, sans-serif; font-style: normal; font-weight: 400; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline; text-decoration: underline; font-size: 16px; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; 
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);">#ghijk</a>

Received plain text:

 #abcdef #ghijk

Processed HTML piece:

 <p><a href="https://apple.com/">#abcdef</a><a href="https://pear.com/">#ghijk</a></p>

You will note that there is no space between the two links.

In my testing it seems that this problem occurs whenever the pasted text contains links, and any surrounding space is converted to &nbsp; or removed.

To clarify, this only happens when copying rich text (i.e. select and copy from a web page). Plain text is fine (select and copy the HTML code).
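
The rich text versus plain text distinction comes down to which clipboard flavors the paste carries. Here is a minimal sketch for checking this, assuming a hypothetical helper named describePaste (not part of Gutenberg) that accepts any object with a DataTransfer-style getData method:

```javascript
// Hypothetical helper, for illustration only: summarizes which flavors a
// paste carries. `dataTransfer` is anything exposing getData( type ), such
// as event.clipboardData in a browser paste handler.
function describePaste( dataTransfer ) {
	const html = dataTransfer.getData( 'text/html' );
	const plainText = dataTransfer.getData( 'text/plain' );
	return {
		// A rich text copy carries a non-empty text/html flavor.
		isRichText: html.length > 0,
		html,
		plainText,
	};
}

// In a browser this could be wired up as:
// document.addEventListener( 'paste', ( event ) => {
// 	console.log( describePaste( event.clipboardData ) );
// } );
```

If isRichText is false, the paste only carried plain text and the problem described here should not apply.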

Tried with Gutenberg 10.8.0 RC 1 / Chrome 91.0.4472.77 / macOS Big Sur

glendaviesnz commented 3 years ago

I wasn't able to replicate this on latest trunk in Chrome or Firefox

[Two screenshots, taken 2021-06-09 at 11:51:43 AM and 11:52:57 AM]

johngodley commented 3 years ago

Thanks for looking @glendaviesnz. Did you copy rich text or plain text? Based on your console output I think you copied plain text. To be clear, this problem only exists when copying rich text. That is, you select and copy from a web page, not just the plain text HTML. I've edited the original issue to make this clearer.

After trying to create a better example I think it involves copying more than one paragraph containing inline elements - it's possible in my original tests above that I was accidentally doing that.

For example, if you copy the entire 'developing for gutenberg' section from the Gutenberg readme:

image

Then paste it into the editor (in visual mode):

Received HTML:

 <meta charset='utf-8'><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(36, 41, 46); font-family: -apple-system, system-ui, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">Extending and customizing is at the heart of the WordPress platform, this is no different for the Gutenberg project. The editor and future products can be extended by third-party developers using plugins.</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(36, 41, 46); font-family: -apple-system, system-ui, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">Review the<span> </span><a href="https://github.com/WordPress/gutenberg/blob/trunk/docs/getting-started/tutorials/create-block/README.md" style="box-sizing: border-box; background-color: transparent; color: var(--color-text-link); text-decoration: none;">Create a Block tutorial</a><span> </span>for the fastest way to get started extending the block editor. 
See the<span> </span><a href="https://developer.wordpress.org/block-editor/#develop-for-the-block-editor" rel="nofollow" style="box-sizing: border-box; background-color: transparent; color: var(--color-text-link); text-decoration: none;">Developer Documentation</a><span> </span>for extensive tutorials, documentation, and API references.</p>

Received plain text:

 Extending and customizing is at the heart of the WordPress platform, this is no different for the Gutenberg project. The editor and future products can be extended by third-party developers using plugins.

Review the Create a Block tutorial for the fastest way to get started extending the block editor. See the Developer Documentation for extensive tutorials, documentation, and API references.

Processed HTML piece:

 <p>Extending and customizing is at the heart of the WordPress platform, this is no different for the Gutenberg project. The editor and future products can be extended by third-party developers using plugins.</p><p>Review the&nbsp;<a href="https://github.com/WordPress/gutenberg/blob/trunk/docs/getting-started/tutorials/create-block/README.md">Create a Block tutorial</a>&nbsp;for the fastest way to get started extending the block editor. See the&nbsp;<a href="https://developer.wordpress.org/block-editor/#develop-for-the-block-editor">Developer Documentation</a>&nbsp;for extensive tutorials, documentation, and API references.</p>

Note that the presence of 'Received HTML' in the console output is what indicates that rich text was copied.

You should see the &nbsp; in the 'processed HTML piece'. Switching the editor to code mode then shows these around the links (and other inline HTML elements):

image

I've tried this in Chrome and Safari, and it also seems to occur in Gutenberg 10.5.4.

Interestingly, if you just copy a single paragraph the &nbsp; do appear in the console, but are removed from the actual content in the editor.

Processed inline HTML:

 Review the&nbsp;<a href="https://github.com/WordPress/gutenberg/blob/trunk/docs/getting-started/tutorials/create-block/README.md">Create a Block tutorial</a>&nbsp;for the fastest way to get started extending the block editor. See the&nbsp;<a href="https://developer.wordpress.org/block-editor/#develop-for-the-block-editor">Developer Documentation</a>&nbsp;for extensive tutorials, documentation, and API references.

image

I wonder if whatever cleanup routine runs is only running on the first block?
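
To illustrate what a cleanup applied to every block (rather than only the first) might look like, here is a rough sketch. The names replaceIsolatedNbsp and cleanAllBlocks are hypothetical and are not Gutenberg functions; the rule that only a lone &nbsp; with no neighbouring whitespace is safe to convert is also an assumption.

```javascript
// Matches a single non-breaking space, written either as the &nbsp; entity
// or as the raw U+00A0 character, that has no whitespace or other &nbsp;
// directly before or after it. Runs of &nbsp; are left alone because they
// may be intentional spacing.
const ISOLATED_NBSP = /(?<!\s|&nbsp;)(?:&nbsp;|\u00a0)(?!\s|&nbsp;)/g;

// Hypothetical cleanup for one block's HTML string.
function replaceIsolatedNbsp( html ) {
	return html.replace( ISOLATED_NBSP, ' ' );
}

// Hypothetical wrapper: applies the same cleanup to every block,
// not just the first one.
function cleanAllBlocks( blocksHtml ) {
	return blocksHtml.map( replaceIsolatedNbsp );
}
```

Called on the two paragraphs from the 'Processed HTML piece' above, cleanAllBlocks would turn each &nbsp; around the links back into a regular space in both paragraphs.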

glendaviesnz commented 3 years ago

> Did you copy rich text or plain text?

🤦 Doh, plain text, sorry - misread your initial description, will take another look with richtext html today.

glendaviesnz commented 3 years ago

I was able to replicate this with rich text html.

It appears that the spaces are converted to &nbsp; by the browser at https://github.com/WordPress/gutenberg/blob/trunk/packages/blocks/src/api/raw-handling/utils.js#L135, where a temporary HTML document is created and its innerHTML is replaced with the pasted HTML. I haven't worked out yet why in some instances they end up back as normal spaces in saved content.
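
One possible explanation, offered as an assumption the thread does not confirm: if the `<span> </span>` in the clipboard HTML already holds a U+00A0 character rather than a regular space, then reading innerHTML back would emit it as &nbsp;, because HTML serialization escapes U+00A0 in text. The sketch below mimics that escaping step for text content in plain JavaScript; escapeTextForSerialization is a hypothetical name, not a browser or Gutenberg API.

```javascript
// Hypothetical stand-in for the text-escaping step of HTML serialization
// (the step a browser performs when innerHTML is read back).
function escapeTextForSerialization( text ) {
	return text
		.replace( /&/g, '&amp;' ) // must run first so later entities are not double-escaped
		.replace( /\u00a0/g, '&nbsp;' ) // U+00A0 NO-BREAK SPACE becomes an entity
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' );
}

// A regular space passes through untouched; a U+00A0 does not.
console.log( escapeTextForSerialization( ' ' ) );
console.log( escapeTextForSerialization( '\u00a0' ) );
```

Under this assumption the conversion would be a side effect of serialization rather than a change to the character itself.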

Where the links are not wrapped in a parent paragraph, the spaces are removed at https://github.com/WordPress/gutenberg/blob/trunk/packages/blocks/src/api/raw-handling/normalise-blocks.js#L57. This is because at this point the links are handled as separate nodes and appended individually to a parent p element, so any surrounding space is dropped. It may be possible to modify this to better preserve the surrounding structure, but it would probably be a sizeable undertaking to make sure all possible cases were covered and tested effectively.
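
The dropped-space behaviour can be modelled without a DOM. The sketch below is a simplified illustration, not the code in normalise-blocks.js: nodes are plain objects, and both function names are hypothetical. The first function discards whitespace-only text nodes while wrapping inline nodes in a paragraph. The second shows one way the space between inline elements might be kept.

```javascript
// Simplified node model (an assumption for illustration):
//   { type: 'text', value: '...' } or { type: 'element', html: '...' }
const isWhitespaceOnly = ( node ) =>
	node.type === 'text' && node.value.trim() === ''; // trim() also strips U+00A0

// Text values are emitted as-is; escaping is skipped to keep the model small.
const serialize = ( node ) => ( node.type === 'text' ? node.value : node.html );

// Models the reported behaviour: whitespace-only text nodes are discarded,
// so adjacent inline elements end up touching.
function wrapDroppingWhitespace( nodes ) {
	const kept = nodes.filter( ( node ) => ! isWhitespaceOnly( node ) );
	return '<p>' + kept.map( serialize ).join( '' ) + '</p>';
}

// One possible alternative: collapse whitespace between kept nodes to a
// single regular space, and drop it only at the start or end.
function wrapKeepingInnerWhitespace( nodes ) {
	const parts = [];
	for ( const node of nodes ) {
		if ( ! isWhitespaceOnly( node ) ) {
			parts.push( serialize( node ) );
		} else if ( parts.length > 0 && parts[ parts.length - 1 ] !== ' ' ) {
			parts.push( ' ' );
		}
	}
	if ( parts[ parts.length - 1 ] === ' ' ) {
		parts.pop(); // no trailing space inside the paragraph
	}
	return '<p>' + parts.join( '' ) + '</p>';
}
```

Fed the two #abcdef and #ghijk links with a whitespace-only text node between them, the first function reproduces the output with no space between the links, while the second keeps a single space.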

skorasaurus commented 1 year ago

> After trying to create a better example I think it involves copying more than one paragraph containing inline elements - it's possible in my original tests above that I was accidentally doing that.

I was encountering some inconsistencies in when NBSPs were being converted to spaces and when they weren't, and I too determined that the NBSPs only persisted when there was more than one line of text.

As shown in the video below, the <0xa0> characters (the hexadecimal representation of the NBSP, U+00A0) were transformed into spaces in the code editor when the text was only one line, but when the text spanned multiple lines, the NBSPs remained in the code editor.

https://user-images.githubusercontent.com/955351/215526959-5f7e443d-38b2-4ccf-9cf6-f7a03ad0692e.mp4