WICG / scroll-to-text-fragment

Proposal to allow specifying a text snippet in a URL fragment

Opt-in to hide `text` fragment directive from scripts #234

Open zcorpan opened 11 months ago

zcorpan commented 11 months ago

See https://github.com/eligrey/fragment-directives/issues/1

Currently, the fragment directive is hidden from most APIs, but in Chromium it is (unintentionally) exposed in performance.getEntries().
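To make the exposure concrete, here is a minimal sketch of how a page script could recover text directives from a URL string such as the navigation timing entry's `name`. `extractTextDirectives` is a hypothetical helper, not a platform API, and it treats each `text=` value as one opaque string rather than parsing the prefix/start/end/suffix syntax.

```javascript
// Hypothetical helper: return the decoded values of all `text=` directives in a URL.
// Assumes the fragment directive starts at the first ":~:" inside the fragment
// and that individual directives are separated by "&".
function extractTextDirectives(url) {
  const hashIndex = url.indexOf('#');
  if (hashIndex === -1) return [];
  const fragment = url.slice(hashIndex + 1);
  const delimiterIndex = fragment.indexOf(':~:');
  if (delimiterIndex === -1) return [];
  const directive = fragment.slice(delimiterIndex + ':~:'.length);
  return directive
    .split('&')
    .filter((part) => part.startsWith('text='))
    // Note: decodeURIComponent throws on malformed percent-encoding.
    .map((part) => decodeURIComponent(part.slice('text='.length)));
}

// In a browser, the navigation entry's `name` holds the document URL.
// Whether the directive is still present there depends on the browser and version.
if (typeof performance !== 'undefined' &&
    typeof performance.getEntriesByType === 'function') {
  const [navigationEntry] = performance.getEntriesByType('navigation');
  if (navigationEntry) {
    console.log(extractTextDirectives(navigationEntry.name));
  }
}
```

Outside a browser there is no navigation entry, so only the helper itself does any work.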

Most search engines don't expose the search query to the linked page, but if they use text fragments to highlight matching text, and if the text fragment directive is directly available via an API, the page has a rough idea what the search query was.

I assume that search engines generally have an interest in using text fragments (for a better user experience) and also want to hide the search query from the pages they link to in their search results (for privacy).

But there's evidently also interest in accessing the text directive in other situations, e.g. custom scrolling to text fragment.

I suggest adding a fragment directive to hide the text directive from the page's script:

#:~:text=Something&hide-text-fragment-from-script

(naming TBD)

A search engine (or other) can then feature-detect both text fragment support and support for the hiding opt-in:

```javascript
if (document.fragmentDirective?.supportsHideTextFragmentFromScript) {
  // ok to use text fragments in links
}
```
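
As a rough sketch of how a linking site might act on that check, the function below appends a text directive plus the opt-in only when support is detected. `buildResultLink` is a hypothetical name, the `doc` parameter stands in for `document`, and both `supportsHideTextFragmentFromScript` and `hide-text-fragment-from-script` are the placeholder names from this proposal, not shipped features.

```javascript
// Hypothetical sketch: build a search-result link that uses a text fragment
// only when the (proposed, unshipped) hiding opt-in is detected.
function buildResultLink(targetUrl, matchedText, doc) {
  const supported = Boolean(
    doc &&
    doc.fragmentDirective &&
    doc.fragmentDirective.supportsHideTextFragmentFromScript
  );
  if (!supported) return targetUrl; // fall back to a plain link

  // "-" is syntax inside a text directive, and encodeURIComponent
  // leaves it alone, so encode it by hand.
  const encoded = encodeURIComponent(matchedText).replace(/-/g, '%2D');

  // Assumes targetUrl does not already carry a fragment directive.
  const separator = targetUrl.includes('#') ? '' : '#';
  return `${targetUrl}${separator}:~:text=${encoded}&hide-text-fragment-from-script`;
}
```

Passing the document in as an argument keeps the function usable outside a browser, for example with a stub object.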
annevk commented 11 months ago

I don't understand what this is proposing. Are you suggesting text fragments would be exposed by default unless folks opt-out?

zcorpan commented 11 months ago

Yes.

annevk commented 11 months ago

That does not seem good from a privacy perspective.

zcorpan commented 11 months ago

Do you think the text fragment should never be exposed to script?

annevk commented 11 months ago

I'm not sure I'd go that far. When the navigation was made by script, exposing it to script seems reasonable.

bokand commented 11 months ago

An opt-in seems more appropriate in this case. A referrer that decides there's no privacy risk in sharing the text fragment with the destination can opt in to allow the destination to see it. By default, it's hidden. Seems similar to referrer-policy to me.

annevk commented 11 months ago

Note that most browsers clamped down on Referer beyond referrer policy. E.g., unsafe-url in practice only works same-site in a number of cases. So we want to tread with care.

simon-friedberger commented 11 months ago

This is a bit apropos but afaict the spec also doesn't include any justification for hiding the fragments from the site. @annevk could you elaborate a bit on what the privacy concern is?

And, as mentioned in the issue linked above, since the site can already detect its scroll position it will probably have a good idea what the search text may have been.

bokand commented 11 months ago

It's true that the initial motivation was to prevent the page from breaking due to seeing unexpected content in the fragment portion of the URL. However, there are privacy considerations as well. They're not discussed (as we assumed they were obviated by the URL stripping mechanism) though I probably should have at least mentioned it somewhere. For example, if a user is sent to a page via a search engine, users don't expect the page to be able to infer their search query.

While the page author could probably guess roughly, even without scroll-to-text, based on their page's overall content, and now via the scroll position, having the exact text directive used does increase the granularity of that signal.
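
The coarser scroll-position signal can be illustrated with a small sketch. It assumes the browser scrolls the matched text to roughly the vertical center of the viewport, and that the page has already measured its own content blocks. `guessTargetBlock` and the `{ text, top, height }` shape are hypothetical names for illustration.

```javascript
// Hypothetical sketch: guess which content block a navigation scrolled to,
// using only the scroll position. `blocks` holds { text, top, height } in
// document coordinates (in a browser these could be derived from
// getBoundingClientRect() plus window.scrollY).
function guessTargetBlock(blocks, scrollY, viewportHeight) {
  const viewportCenter = scrollY + viewportHeight / 2;
  let best = null;
  let bestDistance = Infinity;
  for (const block of blocks) {
    const blockCenter = block.top + block.height / 2;
    const distance = Math.abs(blockCenter - viewportCenter);
    if (distance < bestDistance) {
      best = block;
      bestDistance = distance;
    }
  }
  return best; // null when there are no blocks
}
```

This identifies at most a block of content, whereas the directive itself would give the exact matched string, which is the difference in granularity described above.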

simon-friedberger commented 11 months ago

Thanks David! I certainly agree that the leakage is slightly worse.

On the other hand…

  1. There is some desire to have the data accessible from scripts for custom searches (not sure how that is supposed to interact with the built-in browser search!) or marginalia.
  2. I am not sure the additional complexity of distinguishing hidden and non-hidden directives as discussed in https://github.com/eligrey/fragment-directives is a good trade-off.
  3. It was common practice to include search terms in the referrer in the past. And this is still possible. And the new fragments don't require including search terms.
  4. It's not clear that users would expect a site to not be aware of what is being searched for either.

At this point, I am not convinced that the fragment directives should be hidden from scripts. I am even less convinced that it should be configurable.

To provide a different point of view: URL has always been available to the server, and URL including fragment has always been available to scripts. Do we really want to introduce a new concept by having parts of the URL hidden from server and scripts and only available to the UA? (Not a rhetorical question.)

annevk commented 11 months ago

> It was common practice to include search terms in the referrer in the past. And this is still possible.

It's not really, as I pointed out above. Unless you somehow make them part of the target URL, potentially modifying the target response as a result.

> Do we really want to introduce a new concept by having parts of the URL hidden from server and scripts and only available to the UA?

I think that is a decision we made when we added text fragments, i.e., yes.

simon-friedberger commented 11 months ago

Thanks for clarifying!

I agree with @bokand that it's worth spelling out in the spec what is hidden and why. It's important for the follow-up discussion on making that hiding optional as proposed in https://github.com/eligrey/fragment-directives.

noamr commented 2 months ago

Coming back to this... given that URLs from search engines can be shared across browsers, I don't think we can assume that the browser where the user consumes the URL is going to hide a text fragment produced by a browser that does support fragment hiding. If it's in the URL, it's going to potentially be accessible at the receiving end, and also create breakage if the website expects a particular fragment format.

Given that, due to the navigation timing quirk, the fragment directive was never really hidden, I suggest unhiding it completely from script and exposing it in document.URL etc., while encouraging browsers to perform some soft hiding, such as omitting it when sharing links via browser UI.
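
One way to picture the "soft hiding" idea is a helper that drops the fragment directive before a URL is copied or shared. This is only a sketch under the assumption that the directive begins at the first `:~:` in the fragment. `stripFragmentDirective` is a hypothetical name, not an existing browser or platform API.

```javascript
// Hypothetical sketch: remove the fragment directive from a URL string,
// keeping any ordinary fragment that precedes the ":~:" delimiter.
function stripFragmentDirective(url) {
  const hashIndex = url.indexOf('#');
  if (hashIndex === -1) return url;
  const fragment = url.slice(hashIndex + 1);
  const delimiterIndex = fragment.indexOf(':~:');
  if (delimiterIndex === -1) return url;
  const base = url.slice(0, hashIndex);
  const remaining = fragment.slice(0, delimiterIndex);
  // Drop the "#" entirely when nothing but the directive was in the fragment.
  return remaining ? `${base}#${remaining}` : base;
}
```

A share or copy-link code path could call this on the current URL before handing it off. It does nothing for URLs that have already been shared with the directive intact.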

annevk commented 2 months ago

Wouldn't that rather badly regress on the privacy properties of this feature? Presumably Chromium is planning on addressing the bug that inadvertently exposed this information?

noamr commented 2 months ago

> Wouldn't that rather badly regress on the privacy properties of this feature? Presumably Chromium is planning on addressing the bug that inadvertently exposed this information?

My point is that the privacy properties of this feature are not a web platform concern, but rather a concern for the linking site (e.g. search engine) or browser UI (e.g. hiding this info when copying/pasting the link). Anything else would give a false sense of privacy as hiding the text fragment doesn't work like a progressive enhancement.