Syntax for PercentEncodedChar looks wrong

hsivonen commented 1 year ago

https://wicg.github.io/scroll-to-text-fragment/#percentencodedchar says:

PercentEncodedChar ::= "%" [a-zA-Z0-9]+

That is, one or more hex digits.

https://url.spec.whatwg.org/#percent-encoded-bytes says:

A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits.

It seems to me that the production should be:

PercentEncodedChar ::= "%" [a-zA-Z0-9][a-zA-Z0-9]

Aside: The name PercentEncodedChar is misleading, since what's being percent-encoded is a byte rather than a character.

hsivonen commented 1 year ago

It appears that Chrome allows one hex digit (per the spec) in the validation step. ☹️

bokand commented 1 year ago

Thanks for the catch! Will fix the production.

It appears that Chrome allows one hex digit (per the spec) in the validation step. ☹️

Chrome treats the two text= directives as independent of each other so if the first one were to fail as invalid, the second is still applied. This doesn't match the spec text which implies that the entire list should be discarded. Personally, I prefer Chrome's behavior of keeping directives independent so would like to change the spec text but am curious what you think.

Interestingly, Chrome does fail to decode the %F but this causes it to leave a % char in the matching string (i.e. this would match page text "U+FFFD (%F)"). This is a Chrome bug, I've filed https://crbug.com/1482847 (note: Safari behaves the same way)

bokand commented 1 year ago

I've updated the spec to fix the percent-encoding production. I'll add some WPTs for the cases mentioned above.

WICG / scroll-to-text-fragment

Syntax for PercentEncodedChar looks wrong #229