GoogleChromeLabs / text-fragments-polyfill

Apache License 2.0

Removing the diacritical on `й` breaks scrolling to a text fragment. #134

Open moetelo opened 2 years ago

moetelo commented 2 years ago

https://github.com/GoogleChromeLabs/text-fragments-polyfill/blob/0e23fabfb4fb1f4fdc1d620d87b0a9ee7357566e/src/text-fragment-utils.js#L918-L919

A case in which removing diacriticals breaks scrolling to the text fragment: the Cyrillic letter й.

Chrome-produced link: #:~:text=На%20«Галактическом%20основном»-,Йода,-разговаривает%2C%20инвертируя%20порядок

The polyfill would produce something like #:~:text=На%20«Галактическом%20основном»-,Иода,-разговаривает%2C%20инвертируя%20порядок. Notice Иода instead of Йода.

Such a link does not work in Chrome, nor in other browsers with the polyfill enabled.
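To make the failure concrete, here is a minimal sketch of how stripping combining marks turns Йода into Иода. It assumes the normalization works roughly as "decompose with NFKD, then remove code points U+0300 to U+036F"; `stripDiacritics` is a hypothetical helper written for illustration, not the polyfill's actual function.

```javascript
// Hypothetical helper (assumption): decompose, then drop combining marks
// in the range U+0300 to U+036F.
function stripDiacritics(str) {
  return str.normalize('NFKD').replace(/[\u0300-\u036f]/g, '');
}

const original = 'Йода';

// Й (U+0419) decomposes into И (U+0418) followed by the combining
// breve U+0306, so removing the breve leaves a different letter.
const stripped = stripDiacritics(original);

console.log(stripped);              // Иода
console.log(stripped === original); // false
```

In Russian, й and и are distinct letters, so the stripped string is a different word rather than an accent-free spelling of the same one.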

moetelo commented 2 years ago

In some cases it doesn't matter whether you remove diacriticals or not. Take the Spanish alphabet, with diacriticals like ñ.

For the word señora, Chrome's "Copy link to selected text" gives the following result: https://www.collinsdictionary.com/dictionary/spanish-english/se%C3%B1or#:~:text=Word%20forms%3A%20se%C3%B1or%2C-,se%C3%B1ora,-ADJECTIVE in which, as you can see, the diacriticals are preserved: se%C3%B1ora. This polyfill would (probably, untested) produce the same result, but with ñ replaced by n: https://www.collinsdictionary.com/dictionary/spanish-english/se%C3%B1or#:~:text=Word%20forms%3A%20senor%2C-,senora,-ADJECTIVE , which also works as expected and scrolls the browser to the expected place.

Thus, the issue can be resolved by always preserving diacriticals. As far as I understand, this could be achieved by removing this line: https://github.com/GoogleChromeLabs/text-fragments-polyfill/blob/0e23fabfb4fb1f4fdc1d620d87b0a9ee7357566e/src/text-fragment-utils.js#L929

The second option is to preserve only U+0306 (the combining breve), so that й is supported.
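The second option could look something like the sketch below. It assumes the same "NFKD, then strip U+0300 to U+036F" normalization as above and simply leaves U+0306 out of the stripped range; `stripDiacriticsKeepBreve` is a hypothetical name, and this is untested against the polyfill itself.

```javascript
// Hypothetical variant (assumption): strip combining marks in
// U+0300 to U+036F except U+0306 (combining breve), then recompose
// so that И + U+0306 becomes the single code point Й again.
function stripDiacriticsKeepBreve(str) {
  return str
    .normalize('NFKD')
    .replace(/[\u0300-\u0305\u0307-\u036f]/g, '')
    .normalize('NFC');
}

console.log(stripDiacriticsKeepBreve('Йода'));   // Йода
console.log(stripDiacriticsKeepBreve('señora')); // senora
```

One trade-off worth noting: U+0306 is not specific to Cyrillic, so this would also preserve the breve in letters such as ă or ğ.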

tfmar commented 2 years ago

Hi Mikhail, thanks for the report.

According to the spec, highlighting should ignore case and diacriticals. (If there's some reason this should be applied differently in Russian, please let me know, but I think we're doing the right thing here.)

I'm hesitant to change our normalization behavior, because when we generate URLs, we want to make sure we aren't relying on diacriticals to uniquely identify one part of a page (versus identical text elsewhere without the diacriticals).
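As a rough illustration of that concern, the sketch below counts how many places a candidate string matches once both the page text and the candidate are normalized. The helpers `normalizeText` and `countMatches` and the sample sentence are made up for this example, and the normalization is an assumption (NFKD, strip U+0300 to U+036F, lower-case), not a copy of the polyfill's code.

```javascript
// Hypothetical normalization (assumption): decompose, strip combining
// marks, and lower-case.
function normalizeText(str) {
  return str.normalize('NFKD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// Count non-overlapping matches of candidate in pageText after
// normalizing both sides.
function countMatches(pageText, candidate) {
  const haystack = normalizeText(pageText);
  const needle = normalizeText(candidate);
  if (needle.length === 0) return 0; // avoid looping forever on empty input

  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

// Made-up sample text containing both spellings.
const page = 'El senor llegó tarde. La señora llegó antes.';

console.log(countMatches(page, 'señor')); // 2
```

Because a matcher that ignores diacriticals sees two hits here, a generated URL that used "señor" alone would be ambiguous and would need extra context to identify one location.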

I'm going to reach out to some members of the Chrome for Desktop and Android teams and see if there's a reason they're not highlighting the polyfill-generated URL you provided above.