Interleaving selections within the DOM

BigBlueHat commented 5 years ago

Finding selections within the DOM and even wrapping them in an element is easy enough, and most developers just "roll their own" highlighter/selector for things like that--hence, they don't "shop" for tools like Apache Annotator for that.

However, juggling interleaved selections in the DOM is tricky and not standardized.

The DOM is a tree. Selections point at regions all over that tree, often intermixed.

We should build tooling that handles this interleaving and manages the display, removal, eventing, etc. for such selections.

See also #45 and #22.

Example:

<div>
<mark id="a1">Call me <mark id="a2">Ishmael</mark></mark>. Some years ago—never mind how long precisely—<mark id="a3">having little or <mark id="a4">no money in my purse</mark></mark><mark id="a4">, and nothing particular to interest me on shore</mark>, I thought I would sail about a little and see the watery part of the world.
</div>

- a2 is within a1, and so will have eventing- and display-related trickiness
- a4 is made up of 2 marks, but is currently invalid as they share an id--which conceptually "relates" them as a unit, but the DOM doesn't work that way.
  - both sets of <mark> elements would need shared events, display, removal, etc.
- a3 also includes 1 part of a4, but not all of it, so weird eventing and display issues again
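
One way to explore the a3/a4 case above is to stop thinking in tree terms and flatten the overlapping selections into non-overlapping segments, each tagged with every annotation that covers it. The sketch below is only an illustration under stated assumptions: annotations are modelled as half-open character-offset ranges over plain text, and `segmentAnnotations` is a hypothetical helper, not part of Apache Annotator.

```javascript
// Hypothetical data model: each annotation covers the half-open range
// [start, end) of character offsets in the container's plain text.
function segmentAnnotations(annotations) {
  // Collect every offset at which some annotation starts or ends.
  const boundaries = new Set();
  for (const { start, end } of annotations) {
    boundaries.add(start);
    boundaries.add(end);
  }
  const points = [...boundaries].sort((a, b) => a - b);

  // Between two neighbouring boundaries, the set of covering annotations
  // is constant, so each such stretch could become a single <mark>.
  const segments = [];
  for (let i = 0; i < points.length - 1; i++) {
    const start = points[i];
    const end = points[i + 1];
    const ids = annotations
      .filter((a) => a.start <= start && end <= a.end)
      .map((a) => a.id);
    if (ids.length > 0) segments.push({ start, end, ids });
  }
  return segments;
}

// Shortened version of the a3/a4 part of the example above.
const text = 'having little or no money in my purse, and nothing';
const annotations = [
  { id: 'a3', start: 0, end: text.indexOf(', and') },
  { id: 'a4', start: text.indexOf('no money'), end: text.length },
];

const segments = segmentAnnotations(annotations);
for (const { start, end, ids } of segments) {
  console.log(JSON.stringify(text.slice(start, end)), ids.join('+'));
}
```

Each segment carries the full list of ids it belongs to, so events and styling for a4 could be dispatched to both of its pieces without relying on duplicate `id` attributes. Removing a3 would then amount to re-running the function on the remaining annotations, which leaves a4 as a single segment.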

Solving this (or even just exploring it) is something developers know they need, so likely it should be near the top of our list to solve. 😄

ajs6f commented 5 years ago

@BigBlueHat You're thinking here about sections that can be described by element boundaries, but not arbitrary indexing (character-by-character) into texts, right?

tilgovi commented 5 years ago

@ajs6f arbitrary indexing.

BigBlueHat commented 5 years ago

@ajs6f mainly selections that traverse multiple element boundaries--i.e. selecting part of A and part of B. Trees don't do that so good.

ajs6f commented 5 years ago

This seems pretty challenging. I did some work in a similar area years ago but it was simple and text-only, and it wasn't completely trivial. Is this really meant for text (or text-y) documents, or anything that could be addressed by the DOM (SVG, other things like that)?

BigBlueHat commented 4 years ago

Originally posted this as a separate issue in #78 (but closing and moving here to keep @Treora sane 😉):

https://www.w3.org/TR/intersection-observer/

Might help with highlighter and other anchoring implementations in the DOM.

Treora commented 4 years ago

We recently added a simple highlighter, which wraps text nodes in <mark> elements and ignores any existing <mark>s to allow for nested use. I just created some simple tests (see PR #84), including ones inspired by the examples above, that deal with situations like this: 'lorem ipsum dolor <mark2>am</mark2><mark2>et yada</mark2> yada'.

Note there is a difficulty with how a Range behaves when the DOM is modified: running highlight with our current highlighter can mess up other Range objects that point at the same text nodes. So this can cause trouble:

range1 = anchor(annotation1);
range2 = anchor(annotation2);
highlight(range1); // this may mess up range2
highlight(range2); // highlights some unintended target.

A solution is to anchor and highlight as a single action, treating the Range as an ephemeral pointer:

range1 = anchor(annotation1);
highlight(range1);
range2 = anchor(annotation2);
highlight(range2);

Our generator-based approach to anchoring should help to do this right, but still it is a pitfall that I’m not very happy with. Some ideas for avoiding this problem:

- Use a highlighter that does not modify the document content. Some highlighter approaches add an (SVG) element to the end of the <body> and display it at the right spot using absolute positioning. While this solves the issue, it creates others (e.g. the element needs to be repositioned when text reflows).
- Stop using Range for our ‘hydrated’ selector, i.e. as our way to point at a part of the DOM. We could e.g. try to implement a ‘RobustRange’ that updates its startContainer, startOffset, endContainer and endOffset as needed, in one way or another.
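
To make the "perishable pointer" problem and the second idea a bit more concrete, here is a small simulation. It is a sketch under heavy assumptions: text nodes are plain objects with a `data` string rather than real DOM nodes, `locate` and `splitNode` are hypothetical helpers, and the live-update rules that real Range objects follow (which differ between jsdom and browsers, as noted below) are not modelled at all. It only shows that a stored (node, offset) pair can go stale when a highlighter splits a text node, while a character offset into the overall text can be re-resolved afterwards.

```javascript
// Convert a character offset in the concatenated text into a
// { node, offset } pair, similar in spirit to a Range boundary point.
function locate(nodes, offset) {
  let remaining = offset;
  for (const node of nodes) {
    if (remaining <= node.data.length) return { node, offset: remaining };
    remaining -= node.data.length;
  }
  throw new RangeError('offset is beyond the end of the text');
}

// Mimic what a wrapping highlighter does to a text node: split it in two.
function splitNode(nodes, index, at) {
  const node = nodes[index];
  const tail = { data: node.data.slice(at) };
  node.data = node.data.slice(0, at);
  nodes.splice(index + 1, 0, tail);
}

const fullText = 'lorem ipsum dolor amet';
const nodes = [{ data: fullText }]; // stand-in for a single text node
const start = fullText.indexOf('dolor');

// Resolve a boundary point first, then "highlight" something else.
const stale = locate(nodes, start);
splitNode(nodes, 0, 'lorem '.length);

// The old pair now points beyond the end of its shortened node.
const staleIsValid = stale.offset <= stale.node.data.length;

// Re-resolving from the text offset finds the right place again.
const fresh = locate(nodes, start);
const found = fresh.node.data.slice(fresh.offset, fresh.offset + 'dolor'.length);

console.log({ staleIsValid, found });
```

A ‘RobustRange’ along these lines would presumably keep text offsets (or some other document-independent description) as its source of truth and derive container/offset pairs on demand, at the cost of a lookup each time it is used.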

For the time being, or if we decide not to fix this, we should probably warn users in any documentation and examples that Ranges are perishable.

Note that the behaviour of Range actually differs between the current jsdom and web browsers, so it is important to run relevant tests in a browser (use our yarn start command) to ensure the tests pass there too.

On the bright side, I added tests to check if one can remove highlights in arbitrary order, and that seems to work as intended. This should give the freedom to ignore the tree structure and treat highlights as being independent from each other. Please suggest/write other scenarios that we should include in our tests.

apache / incubator-annotator

Interleaving selections within the DOM #47