KorAP / Krill

A Corpus Data Retrieval Index using Lucene for Look-Ups
BSD 2-Clause "Simplified" License

Add maxSnippetSize parameter #128

Closed: margaretha closed this issue 3 months ago

margaretha commented 6 months ago

Please add a new parameter that allows Kustvakt to increase the snippet size beyond the current limit. This is necessary to support larger match contexts for a group of users (see https://github.com/KorAP/Kustvakt/issues/745).

The parameter should be exclusive to Kustvakt and not adjustable by users.

Akron commented 6 months ago

I checked, and there are actually two limits: one is character-based, one is token-based. The match has a token-based limit, which makes sense for annotation data retrieval, so we may want a maxMatchTokenSize parameter. And we have context limits, which are character-based and may make sense as well. So for this, possibly maxContextCharSize?

margaretha commented 6 months ago

Thanks for checking, Nils! Does maxMatchTokenSize not include the contexts, just the match itself?

Whereas maxContextCharSize includes the match and the context altogether?

Akron commented 6 months ago

No, maxMatchTokenSize doesn't include the context, and maxContextCharSize is the maximum for the left and right context. There is no maximum snippet size, as we allow matches to be cut while the context remains visible. I can't remember the concrete reason; it may simply have been easier to implement. But it also has some advantages.
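To make the interaction of the two limits concrete, here is a minimal Java sketch. It is not Krill's actual code: the class SnippetLimits, its Snippet type and the build method are hypothetical, the text is modeled as a plain list of tokens joined by spaces, and it assumes the right context starts after the original (uncut) match end. The match is capped in tokens, each context side is capped separately in characters, and there is no overall snippet limit.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of how a token-based match limit and a character-based
 * context limit could interact. Illustrative names only; NOT Krill's API.
 */
public class SnippetLimits {

    /** A snippet split into left context, (possibly cut) match and right context. */
    static final class Snippet {
        final String left;
        final String match;
        final boolean matchCut;
        final String right;

        Snippet(String left, String match, boolean matchCut, String right) {
            this.left = left;
            this.match = match;
            this.matchCut = matchCut;
            this.right = right;
        }
    }

    /**
     * Builds a snippet for the match covering tokens[matchStart, matchEnd).
     * The match keeps at most maxMatchTokenSize tokens; the left and right
     * contexts are each trimmed to at most maxContextCharSize characters.
     */
    static Snippet build(List<String> tokens, int matchStart, int matchEnd,
                         int maxMatchTokenSize, int maxContextCharSize) {
        // Token-based limit on the match (written this way to avoid int overflow).
        int cutEnd = matchStart + Math.min(matchEnd - matchStart, maxMatchTokenSize);
        String match = String.join(" ", tokens.subList(matchStart, cutEnd));
        boolean matchCut = cutEnd < matchEnd;

        // Character-based limit on the left context: keep the characters closest to the match.
        String left = String.join(" ", tokens.subList(0, matchStart));
        if (left.length() > maxContextCharSize) {
            left = left.substring(left.length() - maxContextCharSize);
        }

        // Assumption: the right context still starts after the original, uncut match end,
        // so the context stays visible even when the match itself was cut.
        String right = String.join(" ", tokens.subList(matchEnd, tokens.size()));
        if (right.length() > maxContextCharSize) {
            right = right.substring(0, maxContextCharSize);
        }
        return new Snippet(left, match, matchCut, right);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog");
        // Match "brown fox jumps over" (tokens 2..5), with at most 2 match tokens
        // and at most 8 characters per context side.
        Snippet s = build(tokens, 2, 6, 2, 8);
        System.out.println("[" + s.left + "] {" + s.match
            + (s.matchCut ? " ..." : "") + "} [" + s.right + "]");
    }
}
```

With the demo input, main prints `[he quick] {brown fox ...} [the lazy]`: the four-token match is cut to two tokens, while both contexts are still shown, each trimmed to eight characters (which can split a word, since the limit is character-based).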

Krawfish will have some difficulties with character-based context sizes, but I think we can keep this feature.

So, do you want only the context to be adjustable?

margaretha commented 6 months ago

Thanks for the clarification! We discussed this on Slack and agreed to make both parameters adjustable.
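One possible shape for this outcome, sketched under assumptions: both limits live in operator-side configuration that only Kustvakt controls, and any per-request value coming from a user is clamped to them, so users can ask for less but never more. SnippetLimitConfig, the property keys and the numbers in main are hypothetical illustrations, not Krill's or Kustvakt's real configuration or defaults.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: both limits come from server-side configuration that only
 * the operator (e.g. Kustvakt) controls. Per-request values from users are clamped
 * to these limits and can never raise them. Class and property names are illustrative.
 */
public class SnippetLimitConfig {

    private final int maxMatchTokenSize;
    private final int maxContextCharSize;

    public SnippetLimitConfig(int maxMatchTokenSize, int maxContextCharSize) {
        if (maxMatchTokenSize <= 0 || maxContextCharSize < 0) {
            throw new IllegalArgumentException(
                "maxMatchTokenSize must be > 0 and maxContextCharSize >= 0");
        }
        this.maxMatchTokenSize = maxMatchTokenSize;
        this.maxContextCharSize = maxContextCharSize;
    }

    /** Reads both limits from operator-controlled properties (hypothetical keys). */
    public static SnippetLimitConfig fromProperties(Properties props,
                                                    int defaultMatchTokens,
                                                    int defaultContextChars) {
        int match = Integer.parseInt(props.getProperty(
            "snippet.maxMatchTokenSize", String.valueOf(defaultMatchTokens)));
        int context = Integer.parseInt(props.getProperty(
            "snippet.maxContextCharSize", String.valueOf(defaultContextChars)));
        return new SnippetLimitConfig(match, context);
    }

    /** A user may request fewer match tokens, but never more than the configured limit. */
    public int effectiveMatchTokens(int requested) {
        return Math.max(0, Math.min(requested, maxMatchTokenSize));
    }

    /** Same clamping for one context side (left or right), in characters. */
    public int effectiveContextChars(int requested) {
        return Math.max(0, Math.min(requested, maxContextCharSize));
    }

    public static void main(String[] args) {
        // Illustrative numbers only: the operator raises the context limit to 500 characters.
        Properties serverSide = new Properties();
        serverSide.setProperty("snippet.maxContextCharSize", "500");
        SnippetLimitConfig cfg = fromProperties(serverSide, 50, 40);
        System.out.println(cfg.effectiveContextChars(1000)); // prints 500
        System.out.println(cfg.effectiveMatchTokens(1000));  // prints 50
    }
}
```

Clamping oversized requests is one design choice; rejecting them with an error would be an equally reasonable alternative.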