WICG / scroll-to-text-fragment

Proposal to allow specifying a text snippet in a URL fragment

What characters explicitly are percent-encoded? #194

Closed zamicol closed 2 years ago

zamicol commented 2 years ago

I wanted to clarify: exactly which characters are percent-encoded for text fragments? The README suggests that only dash (-), ampersand (&), and comma (,) are escaped. But the link in the README references a different set of characters: SPACE, ("), (<), (>), and (`).

Even combined, the two sets leave out characters that are commonly percent-encoded.

The standard percent-encoding character set would be a great option here.

 ':', '/', '?', '#', '[', ']', '@', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=', as well as '%' itself. 

https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding

bokand commented 2 years ago

The README was written long ago and doesn't go into a great level of detail. I think this is explicit in the spec: the set of URL code points, minus ampersand (&), dash (-), and comma (,), can be used without percent-encoding. All other code points must be percent-encoded.
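As a rough illustration of that rule, here is a minimal Python sketch. The helper name `encode_text_directive_term` is hypothetical, and the sketch deliberately over-encodes: it escapes everything except ASCII letters, digits, and `_`, `.`, `~`, which is stricter than the spec requires but should always decode back to the original text.

```python
from urllib.parse import quote, unquote


def encode_text_directive_term(text: str) -> str:
    """Percent-encode a snippet for use inside a text directive.

    Hypothetical helper, not part of any spec or library.
    """
    # With safe="", quote() escapes every byte except ASCII letters,
    # digits and the characters "_", ".", "-", "~".
    encoded = quote(text, safe="")
    # quote() never escapes "-", but "-" is part of the text directive
    # syntax, so it is escaped by hand here.
    return encoded.replace("-", "%2D")


term = encode_text_directive_term("up-to-date, R&D")
print(term)           # up%2Dto%2Ddate%2C%20R%26D
print(unquote(term))  # up-to-date, R&D
```

The round trip through `unquote` mirrors the percent-decoding step that happens before matching.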

zamicol commented 2 years ago

Thank you, give me a bit and I'll update the README with a pull request.

zamicol commented 2 years ago

I think this is a simple adjustment.

Instead of saying this:

The text keyword will be used to identify a block of text that should be indicated. The provided text is percent-decoded before matching. Dash (-), ampersand (&), and comma (,) characters in text snippets must be percent-encoded to avoid being interpreted as part of the text fragment syntax.

It could read:

The text keyword will be used to identify a block of text that should be indicated. The provided text must be percent encoded using the standard [percent encode set](https://url.spec.whatwg.org/#percent-encoded-bytes) and is then percent-decoded before matching. The percent encode set is inclusive of dash (-), ampersand (&), and comma (,) as to avoid being interpreted as part of the text fragment syntax.

Also, a note about this rule should be included for the fragment directive itself:

Users can specify multiple snippets by providing additional text directives in the fragment directive, separated by the ampersand (&) character. **Fragment directives must be percent-encoded.**
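To show why a literal ampersand inside a snippet has to be escaped, here is a small Python sketch that joins several snippets into one fragment directive. The function names and the `example.com` URL are made up for illustration; only the `:~:` delimiter, the `text=` keyword, and the `&` separator come from the proposal.

```python
from urllib.parse import quote


def encode_term(text: str) -> str:
    # Hypothetical helper: escape everything quote() considers unsafe,
    # then escape "-" by hand because quote() always leaves it alone.
    return quote(text, safe="").replace("-", "%2D")


def build_fragment_directive(snippets: list[str]) -> str:
    # Each snippet becomes its own text directive; directives are
    # separated by a literal "&", so any "&" inside a snippet must
    # already be percent-encoded by encode_term().
    return ":~:" + "&".join("text=" + encode_term(s) for s in snippets)


directive = build_fragment_directive(["first snippet", "A&B"])
print("https://example.com/page#" + directive)
# https://example.com/page#:~:text=first%20snippet&text=A%26B
```

If the second snippet were left unencoded, the `&` in `A&B` would be read as the start of another directive.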


The statement that was throwing me off is informational and simply states that a fragment must, at a minimum, be percent-encoded using the fragment percent encode set, which is obvious/expected. I would perhaps update that as well, to make it more obvious that the paragraph is informational and to contrast the two sets:

The [URL standard](https://url.spec.whatwg.org/) specifies that a fragment can contain [URL code points](https://url.spec.whatwg.org/#url-code-points), as well as [UTF-8 percent encoded characters](https://url.spec.whatwg.org/#utf-8-percent-encode). Characters in the [fragment percent encode set](https://url.spec.whatwg.org/#fragment-percent-encode-set) must be percent encoded. **The fragment directive requires the more restrictive [percent encode set](https://url.spec.whatwg.org/#percent-encoded-bytes).**
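The difference between the two sets discussed in this thread can be sketched in Python. This assumes the URL standard's definition of the fragment percent-encode set (C0 controls, code points above U+007E, and SPACE, ", <, >, `) and treats dash, ampersand, and comma as the extra text directive syntax characters; the `classify` function is hypothetical and does not check every code point that falls outside the URL code points.

```python
# Characters the fragment percent-encode set adds on top of the
# C0 control percent-encode set (assumption: per the URL standard).
FRAGMENT_EXTRAS = {" ", '"', "<", ">", "`"}
# Characters that are meaningful in the text directive syntax.
DIRECTIVE_SYNTAX = {"&", "-", ","}


def in_fragment_percent_encode_set(ch: str) -> bool:
    cp = ord(ch)
    # C0 controls (U+0000 to U+001F) and everything above "~" (U+007E).
    return cp <= 0x1F or cp > 0x7E or ch in FRAGMENT_EXTRAS


def classify(snippet: str) -> dict[str, list[str]]:
    # Report which characters each rule flags in the snippet.
    return {
        "fragment_set": sorted({c for c in snippet if in_fragment_percent_encode_set(c)}),
        "directive_syntax": sorted({c for c in snippet if c in DIRECTIVE_SYNTAX}),
    }


print(classify("a-b, c&d <e>"))
# {'fragment_set': [' ', '<', '>'], 'directive_syntax': ['&', ',', '-']}
```

The two lists do not overlap, which is why reading only one of the two README statements gives an incomplete picture.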

zamicol commented 2 years ago

"Fixed" by #197.