Enable toggle of search term highlighting in opinion text

brianwc commented 9 years ago

A user writes:

When you type in a subject search, the words or phrase highlight in yellow on the initial screen listing the numerous cases that popped up in the search. Can't you also have those same words or phrases highlighted in yellow when one clicks on an individual case? Then I don't have to read the entire case just to find the subject/words I was searching for.

Should also be toggle-capable for those who dislike it.

mlissner commented 9 years ago

What's the reason for a toggle? I really want to avoid toggles as much as possible.

I'm generally in favor of this, but it does mean that we have to rework the opinion page so that it gets its data from Solr not from the DB.

brianwc commented 9 years ago

The reporting user apparently likes highlighting in the opinions. I don't mind it for two seconds while I find the relevant passages, but then I always turn it off as it annoys me to read text with highlighting. Those that print opinions probably also would want to turn off highlighting. There's another service that does this on their opinions. Go try it. I always use their toggle to turn it off.

mlissner commented 9 years ago

Makes sense. I wonder if we'll need the toggle to be sticky though, so it's on the screen all the time. If we don't, we have the problem that once you've found the relevant passage, you have to go back to the top, flip the toggle, then find the passage again.

mlissner commented 1 year ago

I'm labeling this as "search", but I'm not sure if it is really part of the search system, or something that needs to be done in JavaScript.

albertisfu commented 1 year ago

Looking into this issue more closely, I think we've got a few different ways we could get this feature done.

Pass the highlighted terms from the search results page to the detail page.

With this method, we'd grab the highlighted terms for each item in the search results. Then, when a user clicks on a result, we'd send these terms over as GET parameters. On the detail page, we could pull these terms from the URL parameters and use some JavaScript to find them in HTML elements like the case_name, judge, docket_number, and so on, depending on the object type. Or, we could add the highlighting on the backend and just display the already-highlighted HTML, which might be more efficient and accurate.
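A minimal sketch of how the terms could travel from the results page to the detail page. It assumes the search results carry snippets with `<mark>` tags around the matches, and it uses a made-up `hl` GET parameter; neither the tag nor the parameter name comes from the codebase.

```python
import re
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: highlighted snippets wrap each match in <mark>...</mark>.
MARK_RE = re.compile(r"<mark>(.*?)</mark>", re.DOTALL)


def extract_highlighted_terms(snippet: str) -> List[str]:
    """Return the unique highlighted terms in first-seen order."""
    seen = {}
    for term in MARK_RE.findall(snippet):
        seen.setdefault(term.strip(), None)
    return [term for term in seen if term]


def build_detail_url(detail_path: str, terms: List[str]) -> str:
    """Append the terms as repeated `hl` parameters (hypothetical name)."""
    if not terms:
        return detail_path
    return f"{detail_path}?{urlencode([('hl', term) for term in terms])}"


def terms_from_url(url: str) -> List[str]:
    """Read the `hl` parameters back on the detail page."""
    return parse_qs(urlsplit(url).query).get("hl", [])
```

For example, `build_detail_url("/opinion/1/example/", ["19", "5735"])` produces a link whose query string the detail page can decode again with `terms_from_url`.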

However, I think there might be times when the highlighting on the detail page won't be as accurate as the ES highlighting on the results page. For instance, I can think of an example like this:

Query: 19-5735

Highlighted terms on the result page are: 19-5735

After getting highlighted terms we'll have ["19", "5735"]

Now suppose that the body content has the following text: Docket 19-5735 was published on May 19.

Following the logic of highlighting the terms we get from the results page, every occurrence of "19" and "5735" gets highlighted: both parts of the docket number in "Docket 19-5735" and also the "19" in "May 19".

This could end up looking different from the results page. That's because the results page only highlights the docket number, as the search was actually about a docket number, not a day.
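The mismatch is easy to reproduce with a naive term highlighter. This is only an illustration of the problem, not proposed code; the function name and the `<mark>` tag are assumptions.

```python
import re
from html import escape
from typing import List


def highlight_terms(text: str, terms: List[str]) -> str:
    """Wrap every whole-word occurrence of any term in <mark> tags."""
    terms = [term for term in terms if term]
    if not terms:
        return escape(text)
    # Longest terms first so a longer term wins over a shorter prefix.
    alternation = "|".join(
        re.escape(term) for term in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text and the match separately so the
        # inserted tags are the only markup in the output.
        parts.append(escape(text[last:match.start()]))
        parts.append(f"<mark>{escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


print(highlight_terms("Docket 19-5735 was published on May 19.", ["19", "5735"]))
# Docket <mark>19</mark>-<mark>5735</mark> was published on May <mark>19</mark>.
```

The trailing "19" is marked because the highlighter only sees loose terms and has no idea the query was a docket number.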

Get highlighted content directly from Elasticsearch

An alternative is to retrieve the content of the detailed object from Elasticsearch, instead of from the database. This would involve executing an additional query in Elasticsearch, which would include the object_id and the original search terms from the query. By doing so, we could obtain the highlighted fields directly from Elasticsearch, eliminating the need for any content replacement.
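As a rough sketch, the extra query could look like the request body below. The field names `id` and `text` are assumptions about the index mapping, and `build_highlight_query` is a hypothetical helper; the body itself uses standard Elasticsearch query and highlight options.

```python
def build_highlight_query(object_id: int, q: str) -> dict:
    """Build a search body that returns one document with highlights."""
    return {
        "size": 1,
        "query": {
            "bool": {
                # Restrict the search to the object shown on the detail page.
                "filter": [{"term": {"id": object_id}}],
                # Re-run the user's original query so ES decides what matches.
                "must": [{"query_string": {"query": q}}],
            }
        },
        "highlight": {
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            # number_of_fragments=0 returns the whole field, not snippets.
            "fields": {"text": {"number_of_fragments": 0}},
        },
    }
```

One caveat of this shape: if the document does not match `q` at all, the query returns no hits, so the detail page would still need a fallback to the unhighlighted content.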

For models such as Oral Arguments, we wouldn't need to add additional fields, since the fields displayed on the detail page have already been indexed.

Implementing this approach with models like Opinions, where highlighted content might be within html_columbia, html_lawbox, html, or plain_text, would require indexing each of these fields independently. This is necessary to display the correct content on the detail page, following the current logic to show the right content according to its priority. Moreover, we would need to add all these fields within text (as is currently done) so highlights continue to be shown on the search result page.
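To illustrate the priority logic, here is a small sketch that picks which field to display from an Elasticsearch hit. The priority order simply follows the order the fields are listed above, which is an assumption; `pick_display_content` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Assumption: priority follows the order the fields are listed above.
CONTENT_PRIORITY = ("html_columbia", "html_lawbox", "html", "plain_text")


def pick_display_content(
    source: Dict[str, str], highlight: Dict[str, List[str]]
) -> Tuple[str, str]:
    """Return (field, content) for the first populated field by priority.

    `source` is the hit's _source and `highlight` is the hit's highlight
    section, which maps each field name to a list of fragments.
    """
    for field in CONTENT_PRIORITY:
        if source.get(field):
            fragments = highlight.get(field)
            # Prefer the highlighted version; fall back to the raw field.
            content = "".join(fragments) if fragments else source[field]
            return field, content
    return "", ""
```

The point of indexing each field separately is visible here: the highlight has to exist for the same field the page would have displayed anyway.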

Also, this approach seems complex for RECAP, where query results are grouped. We could return the highlighted case_name directly from Elasticsearch, but returning highlighted docket entry descriptions would be complex, since a docket page needs to display all docket entries, not only the ones that matched a search. Hence, in this case, it would be more practical to use the first approach: highlighting on the detail page based on the highlighted terms from the results page.

@mlissner I'd like to hear your opinion about the options we have to add this feature.

About the toggle, I like the approach of having the sticky toggle to activate or deactivate highlighting.

mlissner commented 1 year ago

Yeah, sounds like using ES for this is the wrong approach. I also think there's not much (any?) value in highlighting things like the case name. So maybe the trick is to just grab anything in q= from the GET parameters, and highlight that, even if it's a bit different from what ES highlighted, due to parsing. I wonder if there's some sort of JS library that does ES parsing in some clever way. That seems like something that might exist, though it might be overkill.
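A rough sketch of the "grab whatever is in q=" idea, written in Python only to show the logic (in practice this would likely live in JavaScript on the detail page). The parsing rules here are a simplification and an assumption, not the real ES query syntax: quoted phrases are kept whole, boolean operators are dropped, and field prefixes like `judge:` are stripped.

```python
import re
from typing import List
from urllib.parse import parse_qs, urlsplit

# A quoted phrase, or any run of non-space characters.
TOKEN_RE = re.compile(r'"([^"]+)"|(\S+)')
OPERATORS = {"AND", "OR", "NOT"}


def terms_from_query_string(url: str) -> List[str]:
    """Extract highlightable terms from the q= parameter of a URL."""
    values = parse_qs(urlsplit(url).query).get("q", [])
    if not values:
        return []
    terms = []
    for phrase, word in TOKEN_RE.findall(values[0]):
        if phrase:
            terms.append(phrase)
            continue
        if word in OPERATORS:
            continue
        # Drop a field prefix such as "judge:" and stray query syntax.
        word = word.split(":", 1)[-1].strip('()+-~*?"')
        if word:
            terms.append(word)
    return terms
```

Because a bare `19-5735` stays one term here, matching it as a whole would avoid the stray "May 19" highlight from the earlier example, at the cost of diverging from how ES tokenizes the query. A fielded phrase such as `caseName:"foo bar"` is split into separate words by this sketch, which is one of the cases a dedicated parser would handle better.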

albertisfu commented 1 year ago

Got it, I agree with using the search terms "q=" and highlighting content based on these terms.

Yeah, it seems that there are some JavaScript libraries that can accomplish this in a smart way. For example, mark.js (markjs.io) appears to be a good option and is highly configurable. The minified version of the library has a size of around 17KB.

Let me know if this alternative seems good to you, and I can start working on it.

mlissner commented 1 year ago

That sounds good, and I appreciate the research. Let's plan on doing this later, though, so long as it's not a blocker for the rest of the search project (which it seems not to be).

freelawproject / courtlistener

Enable toggle of search term highlighting in opinion text #365
