KorAP / Krill

A Corpus Data Retrieval Index using Lucene for Look-Ups
BSD 2-Clause "Simplified" License

getMatch expansion #143

Closed margaretha closed 1 month ago

margaretha commented 2 months ago

When calling getMatch with sentence expansion enabled (expansion=true), the annotation of the match is cut according to krill.match.max.tokens, but the primary data (surface text) is not.

The primary data should respect both krill.match.max.tokens and krill.context.max.tokens.
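
To illustrate the expected behavior, here is a minimal sketch of cutting the surface text to a token limit. It is not Krill's implementation: the class `PrimaryDataClip`, the method `clip`, and the representation of token positions as two character-offset arrays are all assumptions made for this example.

```java
/** Hypothetical helper for illustration; not part of Krill's API. */
final class PrimaryDataClip {

    /**
     * Returns the surface text for the token range [startToken, endToken),
     * cut to at most maxTokens tokens.
     *
     * starts[i] and ends[i] are the character offsets of token i in the
     * primary data (an assumed representation, not Krill's).
     */
    static String clip(String primary, int[] starts, int[] ends,
                       int startToken, int endToken, int maxTokens) {
        // Apply the same token limit that is used for the annotations.
        int cappedEnd = Math.min(endToken, startToken + maxTokens);
        if (cappedEnd <= startToken) {
            return "";
        }
        // Cut the surface text at the end offset of the last kept token.
        return primary.substring(starts[startToken], ends[cappedEnd - 1]);
    }
}
```

With the text "The quick brown fox jumps" and a limit of 3 tokens, a match covering all five tokens would yield "The quick brown", so the primary data and the annotations end at the same token.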

Some suggestions for span expansion approaches:

  1. Expansion should respect match-max-size + context-max-size.
  2. sentenceExpansion should be configurable.
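
The two suggestions could be sketched as follows. This is only one possible reading, not Krill's code: it assumes the match is first capped at the maximum match size, and that expansion may then add at most the maximum context size on each side without crossing the sentence boundaries. The class `SpanExpansionSketch`, the method `expand`, and all parameter names are hypothetical.

```java
/** Hypothetical sketch for illustration; not part of Krill's API. */
final class SpanExpansionSketch {

    /**
     * Returns the token range {start, end} (end exclusive) to retrieve.
     *
     * matchStart/matchEnd: token range of the match, end exclusive.
     * sentStart/sentEnd:   token range of the enclosing sentence span.
     * matchMax/contextMax: assumed to hold the configured values of
     *                      krill.match.max.tokens and krill.context.max.tokens.
     * expandToSentence:    assumed configuration switch for sentence expansion.
     */
    static int[] expand(int matchStart, int matchEnd,
                        int sentStart, int sentEnd,
                        int matchMax, int contextMax,
                        boolean expandToSentence) {
        // Cap the match itself at the maximum match size.
        int end = Math.min(matchEnd, matchStart + matchMax);
        if (!expandToSentence) {
            return new int[] {matchStart, end};
        }
        // Expand towards the sentence boundaries, but by no more than
        // contextMax tokens on either side.
        int left = Math.max(sentStart, matchStart - contextMax);
        int right = Math.min(sentEnd, end + contextMax);
        // Never return a range smaller than the capped match.
        left = Math.min(left, matchStart);
        right = Math.max(right, end);
        return new int[] {left, right};
    }
}
```

Under this reading the retrieved range never exceeds matchMax + 2 * contextMax tokens, and the same range could be used to cut both the annotations and the primary data.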

Akron commented 2 months ago

These are two issues: a) primary data should respect the max values, even for span expansion, and b) span expansion should probably take the context size into account when determining the maximum expansion.

margaretha commented 2 months ago

I have created another issue for handling span expansion using context size. See https://github.com/KorAP/Krill/issues/144