bryanmcquade / scroll-to-css-selector

Explainer for supporting CSS selectors when navigating to a URL fragment

Consider reusing (and extending) W3C Selectors and States #2

Open csarven opened 6 years ago

csarven commented 6 years ago

The proposal in this repo is close enough to an existing W3C Note that I'd like to suggest simply reusing it and extending it wherever necessary:

W3C Selectors and States

and in particular "CSS Selector":

https://www.w3.org/TR/selectors-states/#CssSelector_def

See also "Example for a 3.2 CSS Selector":

https://www.w3.org/TR/selectors-states/#CssSelector_frag

It is not restricted to CSS-based fragments, so the general mechanism is extensible and plays well with the rest of the Web Annotation stack, e.g. selecting an arbitrary span of text in a document.
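To make the comparison with this proposal concrete, here is a minimal sketch of building and reading back a fragment in the `selector(type=CssSelector,value=...)` form, based on my reading of the Note's IRI fragment syntax. The helper names are hypothetical, and percent-encoding every reserved character (`safe=""`) is an assumption that is stricter than the Note's own examples. Nested selectors such as `refinedBy` are not handled.

```python
import re
from urllib.parse import quote, unquote


def css_selector_fragment(css: str) -> str:
    """Build a selector(type=CssSelector,value=...) fragment (hypothetical helper).

    Assumption: every character outside the unreserved set is percent-encoded,
    so commas, parentheses and '=' inside the CSS can never clash with the syntax.
    """
    return f"selector(type=CssSelector,value={quote(css, safe='')})"


def parse_selector_fragment(fragment: str) -> dict:
    """Parse a flat selector(...) fragment into a dict of decoded parameters.

    Nested selectors (e.g. refinedBy) are out of scope for this sketch.
    """
    m = re.fullmatch(r"selector\((.*)\)", fragment)
    if m is None:
        raise ValueError(f"not a selector() fragment: {fragment!r}")
    params = {}
    for pair in m.group(1).split(","):
        # Values were fully percent-encoded, so the first '=' separates key and value.
        key, _, value = pair.partition("=")
        params[key] = unquote(value)
    return params


frag = css_selector_fragment("#intro > p")
print("https://example.org/page.html#" + frag)
print(parse_selector_fragment(frag))
```

Because the `type` parameter comes first, the same fragment grammar can carry other selector types (text quote, text position, etc.), which is the extensibility point made above.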

robertknight commented 6 years ago

Firstly, thank you for your work on this proposal @bryanmcquade. I think it would be a very useful addition to the web platform!

I work on the Hypothesis project which, as one of its features, provides a way to create highlights in a web document and share links that take the user directly to the highlighted area. We have some experience with the issues around finding robust ways to "anchor" a highlight so that it will often still work when the page is revisited in the future after changes, or fail gracefully as a last resort.

The current "anchoring" implementation is based on the W3C selectors and states specs but, TBH, the only selector that really matters is the text quote selector, which captures the exact text of the user's selection plus the context before and after it. The quote has the advantage that it can always be found provided the content the user was originally looking at is still logically there on the page, even if the website has undergone a complete redesign. Our quote matching is "fuzzy", so it can cope with changes to spacing or capitalization, or with corrections of e.g. typos.
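As a rough illustration of quote anchoring with context, here is a minimal sketch, not Hypothesis's actual algorithm. It assumes a crude form of "fuzziness": whitespace runs are collapsed and case is ignored, every occurrence of the quote is found, and occurrences are ranked by how closely their surroundings match the stored prefix and suffix (scored with `difflib.SequenceMatcher`, a swapped-in technique). The function names are hypothetical, the returned offset is in the normalized text, and a real implementation would also tolerate edits inside the quote itself.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def _normalize(s: str) -> str:
    # Collapse whitespace runs and ignore case: a crude stand-in for fuzzy matching.
    return re.sub(r"\s+", " ", s).casefold()


def _similarity(a: str, b: str) -> float:
    # 1.0 for identical strings, lower as they diverge.
    return SequenceMatcher(None, a, b).ratio()


def anchor_quote(text: str, exact: str, prefix: str = "", suffix: str = "") -> Optional[int]:
    """Return the offset (in the normalized text) of the best match for `exact`, or None."""
    doc, q = _normalize(text), _normalize(exact)
    pre, suf = _normalize(prefix), _normalize(suffix)
    if not q:
        return None
    best, best_score = None, -1.0
    i = doc.find(q)
    while i != -1:
        # Compare the stored context against what actually surrounds this occurrence.
        before = doc[max(0, i - len(pre)):i]
        after = doc[i + len(q):i + len(q) + len(suf)]
        score = _similarity(pre, before) + _similarity(suf, after)
        if score > best_score:  # ties keep the earliest occurrence
            best, best_score = i, score
        i = doc.find(q, i + 1)
    return best


page = "The  cat sat.\nThe cat   ran."
print(anchor_quote(page, "cat", prefix="The ", suffix=" ran"))
```

The prefix and suffix are what disambiguate the two occurrences of "cat" here; the quote alone would match the first one.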

It might be the case that matching text is out of scope for this proposal, but basing the proposal around the W3C selectors and states work may provide a way to add it as an extension in future.

csarven commented 6 years ago

Any change to a document can cause issues for a selector, regardless of which type of selector is used. Conversely, any selector will do just fine against persistent content/structure. Needless to say, which one to use, and in what combination, depends on context and on what one wants to achieve. There is no one-size-fits-all here. So a CSS Selector is perfectly fine, depending on the effect you are after. Hence this issue.

Having said that, "text quote selector" and "text position selector" can be sufficiently unique with respect to the content. For example, dokieli (in the example above) uses a "text quote selector" with 32 characters before and after the selected text. That is, if any part of the document changes, whether content or structure, outside that window of up to 64 characters plus the length of the selection, the selection can still be identified sufficiently uniquely, barring multiple matches. In contrast, if a single character were added or removed anywhere before the preceding 32 characters, a "text position selector" would no longer work.
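The contrast between the two selectors can be shown with a small sketch. It assumes exact (non-fuzzy) matching and a plain-string document, and uses dict shapes loosely modelled on the Web Annotation `TextQuoteSelector`/`TextPositionSelector`; the helper names are hypothetical and this is not dokieli's code. An insertion far outside the 32-character window leaves the quote resolvable but shifts the position.

```python
CONTEXT = 32  # characters of context on each side, as described for dokieli


def make_selectors(text: str, start: int, end: int, context: int = CONTEXT):
    """Describe text[start:end] with both a quote selector and a position selector."""
    quote_sel = {
        "type": "TextQuoteSelector",
        "exact": text[start:end],
        "prefix": text[max(0, start - context):start],
        "suffix": text[end:end + context],
    }
    pos_sel = {"type": "TextPositionSelector", "start": start, "end": end}
    return quote_sel, pos_sel


def resolve_position(text: str, sel: dict) -> str:
    # Offsets are absolute, so any insertion or deletion before `start` shifts the target.
    return text[sel["start"]:sel["end"]]


def resolve_quote(text: str, sel: dict):
    # Look for prefix + exact + suffix; give up if it is missing or ambiguous.
    needle = sel["prefix"] + sel["exact"] + sel["suffix"]
    if text.count(needle) != 1:
        return None
    start = text.find(needle) + len(sel["prefix"])
    return text[start:start + len(sel["exact"])]


original = "Intro. " + "x" * 50 + " The quick brown fox jumps over the lazy dog. " + "y" * 50
start = original.find("brown fox")
quote_sel, pos_sel = make_selectors(original, start, start + len("brown fox"))

edited = "NEW " + original  # a change far outside the 32-character window
print(resolve_quote(edited, quote_sel))   # brown fox
print(resolve_position(edited, pos_sel))  # ick brown (shifted by 4 characters)
```

Returning `None` when the combined string occurs more than once corresponds to the "barring multiple matches" caveat.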

Anything beyond the W3C Note that I've mentioned is implementation-centric. Any tool that reads/writes its data based on some arbitrary "out-of-band" knowledge is prone to vendor lock-in.

BigBlueHat commented 6 years ago

Glad to see someone else exploring pointing into documents! It'd be great to have you join the Open Annotation Community Group, out of which grew the Web Annotation work and, ultimately, the W3C Note mentioned here.

FWIW, the Selectors and States W3C Note @csarven mentioned has an implementation in the works as part of Apache Annotator.

Like the cssFragID, the Selectors and States Note owes its heritage to XPointer. Be careful not to let "lack of browser support" obscure XPointer's actual use or its potential value in informing further work. XPointer itself, for instance, is scoped to certain XML media types, so it still has value wherever those media types are supported and/or processed.

Ultimately, any fragment identifier change will require a change to the media type definition (in this case text/html et al.), so be ready to deal with that "overhead." It's a very tall hill to climb, but welcome to the fray regardless! 😄