WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/

Issues when pasting into post title field #38637

Closed claudiulodro closed 2 years ago

claudiulodro commented 2 years ago

Description

While doing some in-depth testing around copy + pasting into the post title field of the editor, I noticed a few issues:

  1. When pasting headings with a lot of formatting from Google Docs into the post title field, a bunch of the original formatting is initially displayed in the editor. This is really just a visual issue, as it doesn't cause issues on the frontend and goes away when a post is re-visited in the editor.
  2. When pasting strings with certain characters into the post title field, it's possible to generate permalinks that can cause redirection issues or crash browsers.
  3. When pasting a full line heading from Google Docs into the post title field in Safari, certain headings crash the editor and the whole browser.

Step-by-step reproduction instructions

To reproduce issue 1 around pasting formatted content from Google Docs into the title field:

  1. Create a Google Doc. Add some headings with a bunch of formatting to the doc.
  2. Copy one of the headings. Paste it into the title field of the editor. Observe much of the styling carries over (color, sometimes size, sometimes font weight, etc.)
  3. Save post. Re-open editor. Observe post title doesn't have all that extra formatting applied and looks normal. This screen recording demonstrates this issue well: gutenbergpastetitle1

To reproduce issue 2 around permalink generation when certain characters are in the title:

  1. Copy this string, including the character at the end which displays as [obj] or an empty space, depending on your browser:

Orange County Cities Continue Grappling With State Mandated Housing Goals This Week

  2. Paste it into the title field. Publish the post. The generated permalink and title will look like this in different browsers:

Chrome:

Screen Shot 2022-02-08 at 11 36 58 AM

Safari:

Screen Shot 2022-02-08 at 11 38 50 AM Screen Shot 2022-02-08 at 11 38 57 AM

The presence of this character in URLs seems to cause difficulties for some users' browsers, and can crash the tab when a user visits a link with the character in it. I am unsure how to get that character "naturally" in the clipboard, but it seems to happen when copy + pasting from something (I've encountered it on a number of sites I manage following the WP 5.9 release).

To reproduce issue 3 around the Safari crashing when a heading is pasted into the title field:

  1. This doesn't seem to happen with every Google Doc, but it does happen with many in my testing. In Safari, copy the full heading line to the clipboard and paste it into the post title field. Observe the browser crashes. In the below screen recording, I copy the heading line, paste it, Safari crashes and reloads, I paste it again, Safari crashes and doesn't reload: gutenbergtitlepaste2

Screenshots, screen recording, code snippet

See steps to reproduce

Environment info

I tested on a clean site running only WP 5.9 (using the version of Gutenberg that ships with WP 5.9). Theme was Twenty Twenty Two, but I can reproduce with other themes. I tested with Safari and Chrome. Device was a MacBook Pro.

Please confirm that you have searched existing issues in the repo.

Yes

Please confirm that you have tested with all plugins deactivated except Gutenberg.

Yes

gwwar commented 2 years ago

@claudiulodro thanks for the report! Do you happen to have any extra plugins running? I'm having trouble reproducing issues 2/3 on 5.9

We use the object replacement character (U+FFFC) internally in the rich text package, so some havoc can occur if it slips in via paste. Most commonly, folks were copying and pasting from MS Word.

We previously worked around this in https://github.com/WordPress/gutenberg/pull/34851
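
As a rough illustration of the idea (not a description of what the linked PR does), pasted text could be scrubbed of U+FFFC before it reaches the title. The helper name `stripObjectReplacement` is hypothetical and exists only for this sketch.

```typescript
// U+FFFC OBJECT REPLACEMENT CHARACTER, rendered as [OBJ] in some browsers.
const OBJECT_REPLACEMENT_CHARACTER = "\uFFFC";

// Hypothetical helper: remove every U+FFFC from a pasted string.
function stripObjectReplacement(text: string): string {
  return text.split(OBJECT_REPLACEMENT_CHARACTER).join("");
}

// The trailing character mimics the pasted title from the report.
const pastedTitle = "State Mandated Housing Goals This Week\uFFFC";
console.log(JSON.stringify(stripObjectReplacement(pastedTitle)));
```

Printing through `JSON.stringify` makes a leftover invisible character easier to spot than a plain `console.log`.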

claudiulodro commented 2 years ago

Thanks for the reply! That's good info. I reproduced these on a clean site with no extra plugins, set up specifically for testing these issues, but it probably varies depending on browser version, OS, when a Google Doc was written (I bet the internal markup changes occasionally), etc.

For item 2 specifically, I've been receiving a number of reports and noticing it across many of the sites we host following WP 5.9. In talking to the customers, it does appear that the one thing they have in common is that they write the posts in an external editor and then paste them into WP, so MS Word seems like a good theory. I'll continue collecting data as I encounter the issue and follow up if I get more concrete facts. :)

gwwar commented 2 years ago

After chatting with @claudiulodro, it seems this might require Safari 14 to reproduce. I do see some slightly different behavior on Safari 14 vs 15, but I'm still having trouble reproducing the wrong permalink URL / Safari crash.

@claudiulodro if you can consistently reproduce this, would you be interested in proposing a patch? I'm happy to help review.

Robertght commented 2 years ago

I experienced this recently as well, but in my case the text was copied from a page created with Gutenberg and pasted into a post later.

In this case, it only appears in Firefox.

gwwar commented 2 years ago

Hmm, I still didn't have luck reproducing on Firefox. I'll maybe try to bulletproof the slug creation logic to remove the object replacement character, and ask y'all to test in a bit.

annezazu commented 2 years ago

I can replicate this repeatedly in Chrome when pasting into the Post Title. Most recently could replicate on the Make Network when writing a post for Make Test.

gwwar commented 2 years ago

@annezazu do you see the same behavior in this branch? https://github.com/WordPress/gutenberg/pull/39033

fringillas commented 2 years ago

I have a similar situation on WP 5.9.1. If you paste from MS Word into the post title on a Mac in Chrome, you will get a "?" at the end of the title when the draft is saved. The styling from MS Word is visible in WP until the draft is saved; after saving, the normal post title styling is visible, plus the added "?".

The permalink is also affected, and gets an "[OBJ]" character at the end. I can't reliably replicate this permalink behavior; it only seems to happen sometimes.

claudiulodro commented 2 years ago

Adding a few more data points: I had 3 more reports of the [OBJ] issue yesterday. In all cases, the users were pasting the title from Google Docs or MS Word into the post title field.

eduardogoncalves commented 2 years ago

Hello, I'm having the same issue here. I noticed this is happening when I'm blogging on macOS. When I copy some text, like the title of an article (https://g1.globo.com/mt/mato-grosso/noticia/2022/03/08/mulher-que-matou-amiga-com-facada-no-peito-em-mt-e-condenada-a-10-anos-de-prisao.ghtml), and paste it into my post title, on macOS it doesn't show any char/space at the end of the string. But when I open it on a Windows machine, it displays an [obj] char at the end.

In the original site title it doesn't show any special char, so it looks like Gutenberg is adding it.

image

gwwar commented 2 years ago

@dmsnell would you be available to help keep an eye on this one, since I'll be less available to contribute as often? From the reports, it's highly likely that there's still an issue here, but the tricky part is being able to reproduce the issue. We're likely missing environment/browser details or additional steps.

dmsnell commented 2 years ago

  1. When pasting headings with a lot of formatting from Google Docs into the post title field, a bunch of the original formatting is initially displayed in the editor

I was able to reproduce this and believe this probably relates to the interaction with the paste handler on the title. The title block I think is stripping away the formatting by design (because the title can't have any formatting), but we still paste the HTML contents of the clipboard in instead of the plaintext contents.

If I can find some time I will confirm this and see if it's an easy fix.
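
A minimal sketch of the plain-text preference described above, assuming the paste handler can read both clipboard flavors through a `DataTransfer`-style `getData` call. `ClipboardLike` and `titleTextFromClipboard` are hypothetical names for this illustration, not Gutenberg APIs, and this is not necessarily how the eventual fix works.

```typescript
// Minimal stand-in for the part of DataTransfer this sketch needs.
interface ClipboardLike {
  getData(format: string): string;
}

// Hypothetical helper: choose what a title field should insert on paste.
// Prefers the plain-text flavor and only falls back to the HTML flavor,
// with tags crudely removed, when no plain text is available.
function titleTextFromClipboard(clipboard: ClipboardLike): string {
  const plain = clipboard.getData("text/plain");
  if (plain !== "") {
    return plain;
  }
  // Naive tag removal, acceptable only as a last resort in a sketch.
  return clipboard.getData("text/html").replace(/<[^>]*>/g, "");
}

// Fake clipboard carrying both flavors, as a Google Docs copy would.
const fakeClipboard: ClipboardLike = {
  getData: (format) =>
    format === "text/plain"
      ? "The title"
      : '<h1 style="color:red">The title</h1>',
};
console.log(titleTextFromClipboard(fakeClipboard)); // "The title"
```

Taking the plain-text flavor first means the styling never enters the title field, so nothing has to be stripped afterwards.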

  2. When pasting strings with certain characters into the post title field, it's possible to generate permalinks that can cause redirection issues or crash browsers.

I was unable to reproduce this but I'm running macOS. I tried in Firefox, Chromium, and Safari.

  3. When pasting a full line heading from Google Docs into the post title field in Safari, certain headings crash the editor and the whole browser.

I was able to reproduce this but had some trouble figuring out where the error is, because React error boundaries are swallowing the errors. It seems like an image might trigger it. The copied contents of the offending HTML follow:

Offending HTML

The title

And a link!

Surprisingly it doesn't appear to crash if I'm pasting over an existing title. It only crashes if the title is empty before pasting.


I can try and keep an eye on this but I'm not sure how readily I'll be able to start tackling it. If I don't report back in a week it probably means I'm too occupied with other work to look. Feel free to re-ping me, especially if anyone has ideas for a fix.

michaelmuniz commented 2 years ago

We've observed this as well, and typically find it is the result of copy/pasting text from Word into the post title field. Thankfully it is not crashing the editor (likely the result of the previous fixes), but is there a way to suppress a trailing space or other non-displayable characters in the post title field?

Example Post: https://cpj.org/thetorch/2022/03/the-russian-media-is-dead/

We find the object replacement character shows up on Firefox and Android, but not Chrome, Safari, or iOS.

Interestingly enough, we also see this displaying on the Twitter card when sharing out to social media (any browser), i.e.: https://cpj.org/thetorch/2022/03/the-russian-media-is-dead/?share=twitter&nb=1

eduardogoncalves commented 2 years ago

@michaelmuniz on Windows it shows up on Firefox and Chrome.

Firefox

image

Chrome 99.0.4844.51

image

JuanDBB commented 2 years ago

Hello. This is a big problem for a newspaper, where journalists copy and paste titles from Word and similar tools. We have had this problem since WordPress 5.9. You can see the OBJ even in the permalink.

ckeeney commented 2 years ago

Adding this filter will replace the [obj] in the post slug with some alphanumeric characters.

In my test cases, post-slug[obj] became post-slugefbfbc, and in my test case the url encoding of [obj] was %ef%bf%bc. This is obviously less than ideal, but far better than having the non-printable characters in the url.

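// Stopgap: strip every character that is not a word character or a hyphen from the generated slug.
add_filter("wp_unique_post_slug", function($slug, $post_ID, $post_status, $post_type, $post_parent, $original_slug) {
    return preg_replace('/[^\w-]/', '', $slug);
}, 10, 6);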
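
To make the `efbfbc` result less mysterious, here is a small TypeScript sketch of the same transformation. It assumes the slug holds the lowercase percent-encoded form reported above; `stripNonWordCharacters` is a hypothetical name that only mirrors the filter's regular expression.

```typescript
// U+FFFC OBJECT REPLACEMENT CHARACTER.
const OBJ = "\uFFFC";

// UTF-8 percent-encoding of U+FFFC, lowercased to match the slug form.
const encodedObj = encodeURIComponent(OBJ).toLowerCase();

// Same pattern as the filter. The g flag mirrors preg_replace, which
// replaces every match by default.
function stripNonWordCharacters(slug: string): string {
  return slug.replace(/[^\w-]/g, "");
}

console.log(encodedObj); // "%ef%bf%bc"
console.log(stripNonWordCharacters("post-slug" + encodedObj)); // "post-slugefbfbc"
```

Only the `%` signs fall outside `[\w-]`, so the three hex byte pairs survive as ordinary letters and digits at the end of the slug.
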

digitalsutton commented 2 years ago

Any update on this? Copy/pasting from Word or Google Docs is incredibly common. Yes, we can try to "train" our content creators, but I think it's fair to expect the title text to be properly filtered so that malformed characters don't appear to users.

Does anybody have a workaround they can share? Thank you!

staceypee commented 2 years ago

We're experiencing this as well since updating to 5.9, and it's driving us a little crazy as it wreaks havoc on our permalink structure. Thanks @ckeeney for the filter which I will try as a stopgap.

edemir206 commented 2 years ago

Hello,

We are facing the same issue at www.ufsm.br

Examples of our affected URLs: https://www.ufsm.br/midias/arco/pesquisa-de-dida-larruscain-aborda-saberes-profissionais-de-musicodocentes-%ef%bf%bc/

https://www.ufsm.br/unidades-universitarias/ct/2022/05/06/depg-divulga-resultado-da-selecao-de-monitores%ef%bf%bc/

image

Our users have been complaining since last week, but I don't actually know how many URLs could be affected by now.

They reported that they are copying and pasting titles from external editors into Gutenberg.

Our WP version is 5.9.3.

Hoping for a fix soon.

rfischmann commented 2 years ago

I don't know if this is related to the "OBJ" issue, nor if the "OBJ" issue has been fixed on WordPress 6.0, but I'm running 6.0 and it still maintains another bug when copy and pasting using Firefox.

Even if I copy from the title itself, if I paste normally (and not using the special keyboard shortcut to paste in plain text), it breaks the whole editor layout:

title

dmsnell commented 2 years ago

Can someone who is able to reproduce this please copy the offending content from Word, paste it into my clipboard viewer, and then paste the full contents of that page here into this issue? You can inspect the source code to verify that nothing nefarious is going on in that code.

https://user-images.githubusercontent.com/5431237/172669977-a8fc25a7-b4da-4a3f-8336-0c7eedec261e.mov

dmsnell commented 2 years ago

Finally I've been able to figure out some reproduction steps, and I have to say I'm no longer surprised that this was so hard to figure out. I suspect at this point we only get into this situation after deleting an existing title and then pasting; if you can confirm that you have seen this bug without hitting backspace, delete, or pasting over existing content, I'd like to know.

Some interesting bits:

Reproduction steps

https://user-images.githubusercontent.com/5431237/176912997-865339ab-c7aa-42b6-82e7-3dfae93e3cea.mov

The extra space in the video is <br class="Apple-interchange-newline"> inserted after the title. We can see them in the contentEditable inside the title's RichText

Screen Shot 2022-07-01 at 4 24 30 PM
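
For illustration only, the stray element could be dropped from pasted HTML before it is inserted. This sketch assumes the markup looks exactly like the element quoted above; `stripInterchangeNewline` is a hypothetical helper, not an editor API, and a regular expression is used just to keep the example dependency-free.

```typescript
// Hypothetical helper: drop <br class="Apple-interchange-newline"> elements
// from a pasted HTML string. This is not a general HTML parser.
function stripInterchangeNewline(html: string): string {
  return html.replace(
    /<br\b[^>]*\bclass="Apple-interchange-newline"[^>]*>/gi,
    ""
  );
}

const pastedHtml =
  '<h1>The title</h1><br class="Apple-interchange-newline">';
console.log(stripInterchangeNewline(pastedHtml)); // "<h1>The title</h1>"
```

Ordinary `<br>` elements are left untouched, since the pattern requires the specific class attribute.
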

ironprogrammer commented 2 years ago

Thanks for the repro notes and video, @dmsnell! To add to this, I've found that the browser used has a big impact on the reproducibility. First, my environment details:

Note to Readers: Viewing this issue in Firefox or Chrome may more clearly show the character (aka "[OBJ]") being referenced when pasted inline.

Reproduction Test Results

Observations

  1. In both Safari and Chrome it appears that the \n from the paste content correlates to the object replacement character being saved to the title. On the resultant post, this character is visible as a blank space in Chrome, and collapsed visually in Safari (hidden). When post titles including this character are viewed in Firefox, the character appears visually similar to [OBJ].
  2. Firefox and Safari reflect the pasted content's HTML styling on the title field when the \n is present (as documented by @claudiulodro), but Chrome does not.
  3. In each test case, if the \n was omitted from the copied text, then the issue was not reproducible.
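
Given observation 3, one conceivable mitigation is to flatten line breaks in pasted plain text before it is placed in the single-line title. Whether this alone would avoid the stored character is an assumption drawn from the observations above, not a confirmed fix, and `toSingleLine` is a hypothetical name.

```typescript
// Hypothetical helper: make pasted plain text safe for a single-line field
// by turning each run of line breaks into one space and trimming the ends.
function toSingleLine(text: string): string {
  return text.replace(/[\r\n]+/g, " ").trim();
}

// A full-line copy from a document usually ends with a line break.
console.log(JSON.stringify(toSingleLine("A heading from a document\n")));
```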

As an aside, I was able to "naturally" reproduce the related slug issue discussed here, as well as over on Trac 55117. This was possible using Chrome and performing a "double paste" of the pasteboard with the sample text containing \n. The additional paste has to occur prior to saving/publishing the post, otherwise the previous good slug is retained. I'll update these findings over in Trac.

Props @dmsnell for the Clipboard Viewer, which proved immensely helpful in understanding what is actually in the clipboard from various sources.

ironprogrammer commented 2 years ago

I wanted to share additional repro steps from the related Core ticket, Trac 55117#comment:29 "Additional Information".

Key Takeaways: The browser used and how the cursor is placed/moved into the title field matters in how this issue is reproduced.

danielcostadev commented 2 years ago

Solution: Windows (Chrome) CTRL + Shift + V (Paste as plain text)

Rafiozoo commented 2 years ago

Adding this filter will replace the [obj] in the post slug with some alphanumeric characters.

In my test cases, post-slug[obj] became post-slugefbfbc, and in my test case the url encoding of [obj] was %ef%bf%bc. This is obviously less than ideal, but far better than having the non-printable characters in the url.

add_filter("wp_unique_post_slug", function($slug, $post_ID, $post_status, $post_type, $post_parent, $original_slug) {
    return preg_replace('/[^\w-]/', '', $slug);
}, 10, 6);

@ckeeney Thanks! It helps for the slug. I just checked in WP 6.0.1 that after a normal post edit / save, the filter cleans the slug well. But in Quick Edit it replaces [OBJ] with the string "efbfbc".

The code below works for me:

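// Also remove the percent-encoded form (%ef%bf%bc) and the leftover "efbfbc" string before stripping other characters.
add_filter("wp_unique_post_slug", function($slug, $post_ID, $post_status, $post_type, $post_parent, $original_slug) {
    return preg_replace('/(%ef%bf%bc)|(efbfbc)|[^\w-]/', '', $slug);
}, 10, 6);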

dmsnell commented 2 years ago

@Rafiozoo this should be fixed since the merge of #42321 - are you still seeing it in new pastes or is this in an old post?

Note too that changing the URL-encoding into efbfbc is probably neither a helpful change nor the easiest. Removing it altogether I think would lead to a better result wouldn't it?
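
A sketch of removing the character altogether, written in TypeScript for illustration rather than as a drop-in WordPress filter. It assumes the slug may contain either the raw character or its percent-encoded form; `removeObjectReplacementFromSlug` is a hypothetical name.

```typescript
// Hypothetical helper: remove U+FFFC from a slug in both its raw and
// percent-encoded forms, then trim any hyphen left dangling at the end.
function removeObjectReplacementFromSlug(slug: string): string {
  return slug
    .replace(/%ef%bf%bc|\uFFFC/gi, "")
    .replace(/-+$/, "");
}

console.log(removeObjectReplacementFromSlug("post-slug%ef%bf%bc")); // "post-slug"
console.log(removeObjectReplacementFromSlug("some-title-%ef%bf%bc")); // "some-title"
```

Unlike stripping every non-word character, this leaves other percent-encoded sequences in the slug intact.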

swinggraphics commented 2 years ago

Why is this issue closed? The problem still exists in 6.0.2 whether you use the block editor or Quick Edit.

Solution: Windows (Chrome) CTRL + Shift + V (Paste as plain text)

This is a good workaround but not a solution.

ironprogrammer commented 2 years ago

In response to @swinggraphics:

Why is this issue closed?

It was addressed in #42321, which shipped with Gutenberg 13.8. In the timeline, the closure coincidentally appeared just below the workaround suggestion, but they're unrelated 😂

Today's beta release of WordPress 6.1 includes this fix. Alternatively, the fix is also included in the Gutenberg plugin since 13.8. If you still encounter the issue after updating with either of these options, please share your experience and environment information here.

swinggraphics commented 2 years ago

Today's beta release of WordPress 6.1 includes this fix. Alternatively, the fix is also included in the Gutenberg plugin since 13.8. If you still encounter the issue after updating with either of these options, please share your experience and environment information here.

Gotcha, thank you! The line drawn between replies can be misleading…at least to me. :) I am helping out on a site where the authors run into this constantly. After fixing another half dozen for them today, we'll all be very glad when the fix ships.

coreyworrell commented 1 year ago

I'm still noticing <strong> tags when pasting text into the title field, in 6.1.1. These get saved to the database.

bozzmedia commented 1 year ago

I'm running into this issue on 6.1.1 when pasting linked text which is copied from the frontend of the website itself. The only way to see the source in the title is to look at All Posts to see the inserted HTML. I have had the issue with <strong> but now also <A href="/">

IMO these html tags should be stripped out automatically for post titles, or we at least need a code view to easily clean them up. I run into this issue regularly in the block editor.

ironprogrammer commented 1 year ago

This issue relates to the title field being displayed with styling from the original copy source text. The fix to this issue hides the styling that may be present in the post title field while in the editor. But it doesn't prevent post titles from having markup.

For historical reasons, titles can contain markup, as odd as this may seem -- but this is a feature, and not a bug.

Please note that there is an enhancement underway that would hide this added markup from the posts list table, which may improve things where the markup can be visually distracting: https://core.trac.wordpress.org/ticket/57265.

coreyworrell commented 1 year ago

@ironprogrammer hiding the markup just seems crazy to me. I understand having markup in the title should be allowed, but no markup should come across when pasting. If you type it out, sure, allow it, show it everywhere. Certainly hiding it from the list table and editor would cause more problems because then you would not know your title has markup until down the road seeing it elsewhere (RSS feed, etc).

bozzmedia commented 1 year ago

@ironprogrammer good to know this is a feature and not a bug, thanks for clarifying.

Hiding the markup from the post title is an interesting approach but since the title is still output with the markup it just obscures the issue further. If the Post Title needs to support markup there should be a way to toggle the styling on and off so you can actually edit the markup.

bozzmedia commented 1 year ago

Today https://core.trac.wordpress.org/ticket/57265 was marked as wontfix due to concerns.

Please consider re-opening this ticket as this issue (markup pasted or written into post titles is not editable in the block editor) persists. Thank you.

Himshekhar07 commented 1 year ago

When I copy and paste bold words from a Google Docs file or any other site, it shows a <strong> tag in the admin backend for the Twenty Twenty-Three, Twenty Twenty-One, and Twenty Twenty-Two themes.

Today I created a core ticket for this issue as well: https://core.trac.wordpress.org/ticket/57682#ticket

For better understanding I am posting a video: https://share.cleanshot.com/GTcXSfSJyBdM6rwDRTY4

ironprogrammer commented 1 year ago

Hi, @Himshekhar07 -- as noted in https://github.com/WordPress/gutenberg/issues/38637#issuecomment-1402348424, this behavior is intentional.

There is a separate issue you might check on, #46823, that requests markup in the title field be made visible/editable. This might help identify unintended titles before they are saved.

bozzmedia commented 1 year ago

Related: https://github.com/WordPress/gutenberg/issues/38668