KorAP / Kustvakt

:speedboat: User and policy management component for KorAP, capable of rewriting queries for policy based document restrictions.
BSD 2-Clause "Simplified" License

Service: Search GET - sentence mark not included #574

Open notesjor opened 1 year ago

notesjor commented 1 year ago

If you use the search and set the context parameter to sentence, you get the sentence, but the sentence-final punctuation mark is not included as a token, even though it is contained in the snippet. Please also add the punctuation mark to the token output.

Akron commented 1 year ago

Punctuation marks are not treated as tokens in KorAP, to stay in line with word distances in Cosmas-II, so this is a wontfix. But we may be able to support a simple "preceding"-data token structure that returns all tokens of a match, each together with the data preceding it. This would possibly add an empty token at the end to account for "following" data, such as the final punctuation mark in your example.
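
A rough sketch of what such a "preceding"-data structure could look like, for illustration only. This is not KorAP's or Kustvakt's implementation: the function name `tokens_with_preceding` and the `(preceding, token)` pair layout are hypothetical, and the `\w+` regex is only a stand-in for KorAP's real tokenization.

```python
import re

def tokens_with_preceding(text):
    """Return (preceding, token) pairs for a sentence.

    Hypothetical layout: each word token carries the non-token data
    (whitespace, punctuation) that precedes it. Trailing data is
    attached to one empty token at the end.
    """
    pairs = []
    pos = 0
    # Assumption: \w+ approximates word tokens; the real tokenizer differs.
    for match in re.finditer(r"\w+", text):
        pairs.append((text[pos:match.start()], match.group()))
        pos = match.end()
    # Empty final token to account for "following" data.
    if pos < len(text):
        pairs.append((text[pos:], ""))
    return pairs

pairs = tokens_with_preceding("Hello, world!")
# pairs == [("", "Hello"), (", ", "world"), ("!", "")]
```

With this layout the word tokens, and therefore word distances, stay unchanged, while the original sentence can be rebuilt by concatenating each pair in order.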