Conal-Tuohy / swinburne

Algernon Charles Swinburne website

Unusually large snippets #16

Closed: jawalsh closed this issue 3 years ago

jawalsh commented 3 years ago

Highlighted full-text search results frequently include one or more unusually large snippets, and the oversized snippet is often the first one. Some examples:

Conal-Tuohy commented 3 years ago

I added a utility stylesheet containing a function that abbreviates these unusually large snippets. I imported that stylesheet into, and called the function from, both the stylesheet that renders search results and the stylesheet that marks up hit highlights in the texts themselves.

The leading and trailing parts of each snippet are cut down to a maximum number of words (currently 20, but configurable: https://github.com/Conal-Tuohy/swinburne/commit/96e4fd1c220e6f4d19257f7cd8439a1f66a0b2fa#diff-9e82627805270822add84f08f673abf00672d779f83a9044452a14f37ec3d261R22-R23).
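The trimming rule can be sketched in Python roughly as follows. This is a minimal sketch, not the XSLT function from the commit: it assumes hits are marked with `<em>…</em>` (Solr's default `hl.simple.pre`/`hl.simple.post` markers), treats any run of non-whitespace as a word, and uses a hypothetical function name, `abbreviate_snippet`. Text between the first and last hit is left untouched.

```python
import re

# Assumption: hits are wrapped in <em>...</em> (Solr's default hl.simple.pre /
# hl.simple.post markers) and the context around them is plain text.
HIT = re.compile(r"<em>.*?</em>", re.DOTALL)
WORD = re.compile(r"\S+")  # a "word" here is any run of non-whitespace


def abbreviate_snippet(snippet: str, max_words: int = 20, ellipsis: str = "…") -> str:
    """Cut the text before the first hit and after the last hit to at most max_words words each.

    Hypothetical helper for illustration; max_words is assumed to be >= 1.
    """
    hits = list(HIT.finditer(snippet))
    if not hits:
        return snippet  # nothing highlighted: leave the snippet alone
    prefix = snippet[:hits[0].start()]
    middle = snippet[hits[0].start():hits[-1].end()]  # text between hits is kept intact
    suffix = snippet[hits[-1].end():]

    words = list(WORD.finditer(prefix))
    if len(words) > max_words:
        # Keep only the last max_words words of the leading context.
        prefix = ellipsis + prefix[words[-max_words].start():]

    words = list(WORD.finditer(suffix))
    if len(words) > max_words:
        # Keep only the first max_words words of the trailing context.
        suffix = suffix[:words[max_words - 1].end()] + ellipsis

    return prefix + middle + suffix


text = " ".join(f"w{i}" for i in range(30)) + " <em>hit</em> " + " ".join(f"x{i}" for i in range(30))
print(abbreviate_snippet(text, max_words=3))  # prints: …w27 w28 w29 <em>hit</em> x0 x1 x2…
```

Cutting at word-match positions rather than re-joining split words keeps the original spacing, so a hit embedded inside a word (e.g. `un<em>fox</em>`) is not separated from its neighbours.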

It's a definite improvement over Solr's native segmentation, though it could probably be improved further by treating punctuation as a hint for where snippet boundaries should fall.
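One hypothetical way to use punctuation as a hint, sketched in Python under the same assumptions (`<em>` hit markers, plain-text context); this is not something the stylesheet currently does. After limiting each side to `max_words`, the leading context starts just after the earliest sentence-ending mark in the window and the trailing context ends at the latest one, falling back to a plain word cut with an ellipsis when the window has no such mark. The name `abbreviate_at_punctuation` and the `SENTENCE_END` set are illustrative choices.

```python
import re

# Assumptions: hits are wrapped in <em>...</em> and the surrounding context is plain text.
# This punctuation-snapping refinement is hypothetical, not the stylesheet's behaviour.
HIT = re.compile(r"<em>.*?</em>", re.DOTALL)
WORD = re.compile(r"\S+")
SENTENCE_END = (".", "!", "?", ";")


def abbreviate_at_punctuation(snippet: str, max_words: int = 20, ellipsis: str = "…") -> str:
    """Trim context to max_words words per side, snapping each cut to sentence punctuation if in range."""
    hits = list(HIT.finditer(snippet))
    if not hits:
        return snippet
    prefix = snippet[:hits[0].start()]
    middle = snippet[hits[0].start():hits[-1].end()]
    suffix = snippet[hits[-1].end():]

    words = list(WORD.finditer(prefix))
    if len(words) > max_words:
        kept = words[-max_words:]
        # Start after the earliest sentence end in the window (keeps the most context).
        boundary = next((m for m in kept if m.group().endswith(SENTENCE_END)), None)
        if boundary is not None:
            prefix = prefix[boundary.end():].lstrip()
        else:
            prefix = ellipsis + prefix[kept[0].start():]

    words = list(WORD.finditer(suffix))
    if len(words) > max_words:
        kept = words[:max_words]
        # End at the latest sentence end in the window.
        boundary = next((m for m in reversed(kept) if m.group().endswith(SENTENCE_END)), None)
        if boundary is not None:
            suffix = suffix[:boundary.end()]
        else:
            suffix = suffix[:kept[-1].end()] + ellipsis

    return prefix + middle + suffix


s = "Alpha beta. Gamma delta epsilon <em>hit</em> zeta eta. Theta iota kappa"
print(abbreviate_at_punctuation(s, max_words=4))  # prints: Gamma delta epsilon <em>hit</em> zeta eta.
```

Picking the earliest boundary on the left and the latest on the right keeps as much context as the word limit allows while still starting and ending on sentence edges. The simple suffix check does not recognise a closing quote after a full stop (`end."`), so that case falls back to the plain word cut.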