WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/
Other
10.41k stars 4.16k forks source link

pasteHandler() incorrectly treats some strings as links #46287

Open kadamwhite opened 1 year ago

kadamwhite commented 1 year ago

Description

When selecting some content and trying to paste other content into its place, certain strings are interpreted as links which should not be. This causes the selected text to be converted to an invalid link, instead of replacing the selected text with the clipboard contents.

This occurs because the check for whether a text isURL is to run it through the browser's URL constructor, and any non-error result is treated as proof the string is a URL. Passing a string formatted Word: Other Words into this constructor results in a valid URL object, with not-really-URL contents. For example the string Movie: The Revenge of Sequels II in the example below gets parsed as such:

URL {
    href: "movie: The Revenge of Sequels II"
    origin: "null"
    pathname: " The Revenge of Sequels II"
    protocol: "movie:"
}

movie: is interpreted as a protocol, and The Revenge of Sequels II is treated as the pathname. (Note the leading space in that string.) This is valid on the part of URL, but the likelihood of a user consciously pasting a valid URL with a leading space in the path name seems excruciatingly low.

I propose that we should also check the first character of the pathname on the generated URL object, and assume the content is not a URL if that character is whitespace:

diff --git a/packages/url/src/is-url.js b/packages/url/src/is-url.js
index 1fda847947..34fcc30a2d 100644
--- a/packages/url/src/is-url.js
+++ b/packages/url/src/is-url.js
@@ -17,8 +17,8 @@ export function isURL( url ) {
    // A URL can be considered value if the `URL` constructor is able to parse
    // it. The constructor throws an error for an invalid URL.
    try {
-       new URL( url );
-       return true;
+       const parsed = new URL( url );
+       return parsed.pathname[ 0 ] !== ' ';
    } catch {
        return false;
    }

It's possible I am overlooking some category of URLs where a leading space in the pathname would still be valid, but this heuristic works with every link type I can think of.

Step-by-step reproduction instructions

  1. Insert a text block (paragraph, heading, etcetera)
  2. Fill that block with the text "Replace me"
  3. Copy the string Movie: The Revenge of Sequels II to the clipboard
  4. Select the text "Replace me" in the editor
  5. Paste the copied text (Movie: The Revenge of Sequels II) over the text "Replace me"
  6. Observe that, instead of being replaced, "Replace me" is converted into a hyperlink with the destination Movie: The Revenge of Sequels II

Screenshots, screen recording, code snippet

issue reproduction

Environment info

Please confirm that you have searched existing issues in the repo.

Yes

Please confirm that you have tested with all plugins deactivated except Gutenberg.

Yes

Thelmachido commented 1 year ago

I can replicate this issue

WordPress 6.1.1 Gutenberg 14.6.1 TT3 Theme

t-hamano commented 1 year ago

I think protocols are not limited to http or https. Therefore, I think it is correct behavior that isURL() determines movie: as a URL with a custom protocol.

Also, the code you suggested determines that URLs with protocols such as mailto: and tel: are not URLs. My sense is that many users expect these URLs to be pasted as links.

runkit

However, since I think few users will paste URLs with irregular custom protocols, it might be a good idea to determine major protocols as correct URLs.

t-hamano commented 1 year ago

I noticed that the same issue was discussed in #24895. And PR #28534 is the solution to this problem.

Also, as noted in this comment, an approach using isValidHref function seems to have been attempted.

kadamwhite commented 1 year ago

@t-hamano Thank you for finding the older issue PR, I had not been able to locate that in my own searching!

I do believe that it's unusual for an href to have a space in it. Apple's documentation around tel: links, for example, gives all examples in the format tel:1-.... where there is no space after the protocol. The linked PR by @mrclay checks /^\s/.test( parsed.pathname ) and assumes any pathname starting with whitespace is not valid, which matches my own assumptions.

This issue definitely describes the same problem as #24895.

Mamaduka commented 1 month ago

I believe this was fixed via #53000. The link pasting feature only allows HTTP (s) protocols.

t-hamano commented 1 month ago

The focus of whether this issue can be closed might be whether protocols other than http(s) should also be considered URLs.

According to this comment, protocols such as tel: and mailto: seem to be considered rare. What do you think?

Mamaduka commented 1 month ago

We could update Regex and add those two, but the sites usually use contact forms to avoid spam, and I've not seen the tel: protocol in a while.

@kadamwhite, what do you think?