apache / incubator-annotator

Apache Annotator provides annotation enabling code for browsers, servers, and humans.
https://annotator.apache.org/
Apache License 2.0
218 stars 44 forks source link

Turn highlightText into a class TextHighlight #135

Open Treora opened 1 year ago

Treora commented 1 year ago

Currently, highlightText returns a function that removes the highlight again. In some cases, one may want more possibilities, e.g. attaching a hover event listener to the highlight elements, measuring its location on the page, etc.

This change introduces a TextHighlight class. Instead of highlightText(…) one would write new TextHighlight(…), and the created object has a remove() method. That object’s highlightElements property gives access to the array of <mark> (or whichever) elements that used to be an internal, inaccessible variable.

The previous API is kept available for now using a simple wrapper; we could remove it in a new release.

AdamMYoung commented 4 months ago

Any chance we'll be seeing this merged any time soon? It would be super useful to my use-case, as currently I'm having to remove and re-highlight all my selectors due to having to change the highlight colour based on annotation order. This would greatly optimize how I'm using the lib

Treora commented 4 months ago

Hi Adam, great to hear you have an interest in this. It’s been a while but I think this was ready to merge; the main issue is to have maintainer attention, testing and publishing a new release, etc. (As you may have noticed this library has been on a back-burner pace, so also if you feel like contributing yourself, extra hands are most welcome.)

Hearing that people use this software is a good motivator to put a bit more effort into it. In case you’d like to share, I am curious whether you use the highlighter or more of annotator, in what way, and what would be required to make it better for your use case.

AdamMYoung commented 4 months ago

Hey @Treora, not to worry. I've adopted this library at work for building out an annotator for HTML documents we receive, and it's been fantastic so far. As a work around I ended up abstracting this away in a similar way as your PR, and querying the mark tags with document.getElementsByClassName and a unique class identifier, so no worries there.

The only real issues I've ran into is around tables really which I believe was mentioned in an issue somewhere, but have worked around it by iterating down through table cells and highlighting text nodes manually (which isn't very performant, but works well enough for our use case).

Other than that, it's all been smooth sailing. Really appreciate all the work that's been put into it so far, especially having something based on W3C standards for future-proofing what we're building!

tilgovi commented 4 months ago

I had eventually wanted to replace this highlighter with the Highlight API. I imagined that if we did this we might totally hide the fact that there are elements at all in the fallback implementation.

Can either of you say more about what you need to access the mark elements for? Some things could be handled by the Highlight API (like getting the boundary rectangles via the ranges) but I want to understand what you need.

reckart commented 4 months ago

Other use-case, but I certainly "access" the highlight elements from CSS rules in order to style them and occasionally also query them from the DOM and post-process them. The Highlight API would not be viable for me I believe - to inflexible.

Here an example:

Screenshot 2024-04-28 at 09 05 03
AdamMYoung commented 4 months ago

In my use-case, the styles are pre-determined so the Highlight API is a viable option. I think the only downside like @reckart mentioned is the lack of run-time styling. Going to take a look into the Highlighter API though, thanks for the heads up

As an aside, the Highlight API would also fix the issue raised by @Treora for table highlighting in https://github.com/apache/incubator-annotator/issues/131