apache / incubator-annotator

Apache Annotator provides annotation enabling code for browsers, servers, and humans.
https://annotator.apache.org/
Apache License 2.0
218 stars 44 forks source link

Accumulation of mark elements because text nodes not cleaned up #120

Open reckart opened 2 years ago

reckart commented 2 years ago

It appears that when adding/removing a highlight, empty text nodes may be added to the DOM. This again may cause subsequent highlighting operations to add additional highlight elements around the empty text nodes. The more highlight there are and the more often they added/removed/re-rendered, the more highlight elements get added to the DOM every time.

Calling normalize() on the root element to which the annotators is applied after removing highlights seems to fix this problem. It would be nice maybe though if Annotator was smart enough to clean up empty text nodes itself without having user code to worry about it.

Treora commented 2 years ago

In which situation are empty text nodes added? That sounds like a potential bug. Only the start and/or end node should be splitted, and only if it is partially selected by the range. See these lines:

  // If the start or end node is a text node and only partly in the range, split it.
  if (isTextNode(range.startContainer) && range.startOffset > 0) {
    const endOffset = range.endOffset; // (this may get lost when the splitting the node)
    const createdNode = range.startContainer.splitText(range.startOffset);
    if (range.endContainer === range.startContainer) {
      // If the end was in the same container, it will now be in the newly created node.
      range.setEnd(createdNode, endOffset - range.startOffset);
    }
    range.setStart(createdNode, 0);
  }
  if (
    isTextNode(range.endContainer) &&
    range.endOffset < range.endContainer.length
  ) {
    range.endContainer.splitText(range.endOffset);
  }

As discussed in issue #80, the highlighter does not run .normalize(), simply to not touch dom nodes when it’s not necessary. However, as written there, I’d be happy to add a few lines that undo the splitting of the start and end nodes, if any splitting happened, so that removing the highlight becomes a “perfect undo”.

But again, empty text nodes should not appear due to our highlighter, so I’d be glad if you have an example case to help debugging this!

reckart commented 2 years ago

It seems to happen in particular with overlapping/nesting highlights like e.g. <mark><mark>foo</mark> bar</mark> - the more overlap/nesting, the more it seems to happen.

reckart commented 2 years ago

Also consider cases like <mark data-id="1">foo<mark data-id="2"><mark data-id="1">bar</mark></mark><mark data-id="2">blah</mark>