WICG / scroll-to-text-fragment

Proposal to allow specifying a text snippet in a URL fragment

Expected lifetime of links #174

Closed jakearchibald closed 10 months ago

jakearchibald commented 3 years ago

My assumption was that scroll-to-* links are expected to have a short lifetime, and are expected to be somewhat unreliable.

Text links won't work if text is altered as part of a minor correction, or if the text is different for a given user (eg a different language).

CSS selector links won't work if the selectors change in a subsequent deploy (particularly likely if the developer uses a tool that minifies selectors), or the user is given a different DOM structure (eg a mobile site vs desktop site).

I figured this was fine, as the failures are somewhat graceful.

However, on a recent CSSWG call @eugenegirard suggested this isn't the position of the group, and it's expected that these links will work for some extended period.

tomayac commented 3 years ago

The spec has the following to say:

Prefer Exact Matching To Range-based

The match text can be provided either as an exact string "text=foo%20bar%20baz" or as a range "text=foo,bar".

UAs should prefer to specify the entire string where practical. This ensures that if the destination page is removed or changed, the intended destination can still be derived from the URL itself.

Suppose we wish to craft a URL to https://en.wikipedia.org/wiki/History_of_computing quoting the sentence: The first recorded idea of using digital electronics for computing was the 1931 paper "The Use of Thyratrons for High Speed Automatic Counting of Physical Phenomena" by C. E. Wynn-Williams. We could create a range-based match like so:

https://en.wikipedia.org/wiki/History_of_computing#:~:text=The%20first%20recorded,Williams

Or we could encode the entire sentence using an exact match term:

https://en.wikipedia.org/wiki/History_of_computing#:~:text=The%20first%20recorded%20idea%20of%20using%20digital%20electronics%20for%20computing%20was%20the%201931%20paper%20%22The%20Use%20of%20Thyratrons%20for%20High%20Speed%20Automatic%20Counting%20of%20Physical%20Phenomena%22%20by%20C.%20E.%20Wynn-Williams

The range-based match is less stable, meaning that if the page is changed to include another instance of "The first recorded" somewhere earlier in the page, the link will now target an unintended text snippet.
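The instability described above can be illustrated with a small sketch. It assumes a deliberately naive matcher (plain substring search on a page's text); real user agents match on word boundaries and DOM ranges, so `find_exact`, `find_range`, and the "revised" page text are hypothetical stand-ins, not the spec's algorithm.

```python
from typing import Optional


def find_exact(page: str, text: str) -> Optional[str]:
    """Exact match: the whole quoted string must appear on the page."""
    return text if text in page else None


def find_range(page: str, start: str, end: str) -> Optional[str]:
    """Naive range match: first `start`, then the first `end` after it."""
    i = page.find(start)
    if i == -1:
        return None
    j = page.find(end, i + len(start))
    if j == -1:
        return None
    return page[i : j + len(end)]


sentence = (
    "The first recorded idea of using digital electronics for computing "
    'was the 1931 paper "The Use of Thyratrons for High Speed Automatic '
    'Counting of Physical Phenomena" by C. E. Wynn-Williams'
)
original = "Intro paragraph. " + sentence + ". More text."
# Hypothetical later revision that adds an earlier "The first recorded".
revised = "The first recorded use of the abacus is much older. " + original

range_before = find_range(original, "The first recorded", "Williams")
range_after = find_range(revised, "The first recorded", "Williams")
exact_after = find_exact(revised, sentence)

print(range_before == sentence)  # range match hits the intended sentence
print(range_after == sentence)   # after the revision it starts too early
print(exact_after == sentence)   # exact match still finds the sentence
```

In this sketch the range match silently widens to include the new opening sentence, while the exact match either finds the intended text or fails outright.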

The range-based match is also less useful semantically. If the page is changed to remove the sentence, the user won’t know what the intended target was. In the exact match case, the user can read, or the UA can surface, the text that was being searched for but not found.
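As a rough illustration of how the target text could be recovered from the URL alone, the sketch below decodes a text directive into its parts. The function name `parse_text_directive` is hypothetical, and the parsing is simplified: it assumes a single `text=` directive and the `prefix-,start,end,-suffix` layout.

```python
from urllib.parse import unquote


def parse_text_directive(url: str) -> dict:
    """Decode the first text directive in `url` into readable parts."""
    _, sep, directive = url.partition("#:~:text=")
    if not sep:
        return {}
    # Ignore any further directives after "&" (simplification).
    directive = directive.split("&", 1)[0]
    parts = directive.split(",")
    result = {}
    # A leading term ending in "-" is a prefix; a trailing term
    # starting with "-" is a suffix.
    if len(parts) > 1 and parts[0].endswith("-"):
        result["prefix"] = unquote(parts.pop(0)[:-1])
    if len(parts) > 1 and parts[-1].startswith("-"):
        result["suffix"] = unquote(parts.pop()[1:])
    if parts:
        result["start"] = unquote(parts[0])
    if len(parts) > 1:
        result["end"] = unquote(parts[1])
    return result


range_url = (
    "https://en.wikipedia.org/wiki/History_of_computing"
    "#:~:text=The%20first%20recorded,Williams"
)
print(parse_text_directive(range_url))
```

For the range-based URL this yields only the two end points ("The first recorded" and "Williams"), whereas an exact-match URL decodes to the full quoted sentence, which is what a UA could show when the text is no longer found.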

Range-based matches can be helpful when the quoted text is excessively long and encoding the entire string would produce an unwieldy URL.

It is recommended that text snippets shorter than 300 characters always be encoded using an exact match. Above this limit, the UA should encode the string as a range-based match.
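A link generator following this recommendation might look like the sketch below. The 300-character threshold comes from the text above; the function names, the `edge_words` parameter, and the choice of three words at each end of a range are assumptions made for illustration. The `-` character is escaped manually as a precaution, since `-` marks prefixes and suffixes in the directive syntax and `urllib.parse.quote` leaves it unescaped.

```python
from urllib.parse import quote

EXACT_MATCH_LIMIT = 300  # threshold taken from the recommendation above


def encode_term(term: str) -> str:
    """Percent-encode one directive term, including ",", "&" and "-"."""
    return quote(term, safe="").replace("-", "%2D")


def build_text_fragment_url(page_url: str, snippet: str, edge_words: int = 3) -> str:
    """Use an exact match for short snippets and a range match otherwise."""
    base = page_url.split("#", 1)[0]  # drop any existing fragment
    if len(snippet) < EXACT_MATCH_LIMIT:
        directive = encode_term(snippet)
    else:
        # Assumption: the first and last few words serve as range end points.
        words = snippet.split()
        start = " ".join(words[:edge_words])
        end = " ".join(words[-edge_words:])
        directive = f"{encode_term(start)},{encode_term(end)}"
    return f"{base}#:~:text={directive}"


page = "https://en.wikipedia.org/wiki/History_of_computing"
short_url = build_text_fragment_url(page, "foo bar baz")
long_snippet = " ".join(f"w{i}" for i in range(100))  # 389 characters
long_url = build_text_fragment_url(page, long_snippet)
print(short_url)
print(long_url)
```

A production generator would also need to check that the chosen end points are unambiguous on the page, which this sketch does not attempt.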

jakearchibald commented 3 years ago

Seems like there's no equivalent to 'exact' when it comes to matching a CSS selector, it's always like 'range'.

bokand commented 3 years ago

My personal view is that we should do what we can to make text-fragments as stable as possible, but we have to be realistic about the limits: yes, in general they will go stale as page content is updated.

That said, as long as the quoted text continues to appear on the page we should be able to have a long-lived link. As Thomas points out, a large part of this is on client software generating robust links. A few pointers to keep in mind:

We still haven't settled on a CSS selector format but I suspect it'll be less reliable (for the reasons you mention). This was part of the motivation for using text rather than CSS in the first place, as we expect text to be more stable.

tomayac commented 3 years ago

CSS Selector links for now are targeting primarily image and video link use cases, but text linking would be possible. An exact link would be to use the id attribute (assuming unique ids, which in practice is not always given), and a fuzzy link would be, for example, to use something like the alt attribute of an image, which could be duplicated on the page. (See the explainer for the current selector restrictions.) Side remark: we did a lot of real-world tests with responsive sites and found that in general src works reliably enough for images.
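To make the exact-versus-fuzzy distinction concrete, here is a minimal sketch that counts how many elements on a page carry a given attribute value. It does not implement any proposed selector syntax (which is not settled); `AttributeCounter`, `count_matches`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class AttributeCounter(HTMLParser):
    """Count start tags that carry a given attribute/value pair."""

    def __init__(self, attr: str, value: str):
        super().__init__()
        self.attr = attr
        self.value = value
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples with lowercased names.
        if (self.attr, self.value) in attrs:
            self.count += 1


def count_matches(html: str, attr: str, value: str) -> int:
    counter = AttributeCounter(attr, value)
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical page with two images sharing the same alt text.
sample = """
<img id="hero" src="/img/a.jpg" alt="Team photo">
<img src="/img/b.jpg" alt="Team photo">
"""

print(count_matches(sample, "id", "hero"))         # one element: unambiguous
print(count_matches(sample, "alt", "Team photo"))  # two elements: ambiguous
print(count_matches(sample, "src", "/img/a.jpg"))  # one element here
```

A link generator could use a check like this to prefer an attribute that matches exactly one element, falling back to a fuzzier one only when nothing unique is available.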

jakearchibald commented 3 years ago

we did a lot of real-world tests with responsive sites and found that in general src works reliably enough for images.

That's good, but it's still best-effort. Developers shouldn't worry about, say, improving the compression of an image (which would change the URL) because it might break links like this.

jakearchibald commented 3 years ago

To be clear, I'm not saying "therefore this proposal is bad", I like this proposal! I think it's fine for these links to be less reliable than regular URLs, or ID fragments. They fail gracefully.

bokand commented 10 months ago

Doing some issue hygiene, I'll leave this one open for now as I think it'd make sense to document the above in some non-normative text in the spec.