KorAP / Kustvakt

User and policy management component for KorAP, capable of rewriting queries for policy-based document restrictions.
BSD 2-Clause "Simplified" License

COSMAS II - Queries with the #REG feature lead to an empty result set #636

Closed: notesjor closed this issue 1 year ago

notesjor commented 1 year ago

Kustvakt version: current version (2023-07-18)

Describe the bug: Queries written in COSMAS II syntax apparently are not translated correctly into KorAP queries. The problem occurs especially with queries that use the COSMAS regular-expression feature (#REG), e.g. #REG(*\*innen$) to search for gender-inclusive forms ending in *innen.

To reproduce: Send this search request, with a valid bearer token, to the KorAP API: https://korap.ids-mannheim.de/api/v1.0/search?context=sentence&cutoff=false&ql=cosmas2&q=%23REG%28%2A%5C%2Ainnen%24%29&page=1
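The request URL above can be rebuilt, and its `q` parameter decoded, with Python's standard library; decoding shows that the query actually sent is `#REG(*\*innen$)`. This is a minimal sketch: `build_search_url` and `decoded_query` are hypothetical helper names, `<token>` is a placeholder for a real bearer token, and the request is only constructed here, not sent.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit
from urllib.request import Request

# Endpoint and parameters taken from the request in this issue.
BASE = "https://korap.ids-mannheim.de/api/v1.0/search"


def build_search_url(query: str, ql: str = "cosmas2") -> str:
    """Hypothetical helper: percent-encode the query into the search URL."""
    params = {"context": "sentence", "cutoff": "false", "ql": ql,
              "q": query, "page": 1}
    # quote_via=quote with urlencode's default safe="" percent-encodes
    # '#', '(', '*', '\\', '$' and ')'.
    return BASE + "?" + urlencode(params, quote_via=quote)


def decoded_query(url: str) -> str:
    """Hypothetical helper: return the decoded value of the q parameter."""
    return parse_qs(urlsplit(url).query)["q"][0]


url = build_search_url(r"#REG(*\*innen$)")
# Placeholder token; a real bearer token is needed to actually run the search.
req = Request(url, headers={"Authorization": "Bearer <token>"})

print(url == "https://korap.ids-mannheim.de/api/v1.0/search?context=sentence"
      "&cutoff=false&ql=cosmas2&q=%23REG%28%2A%5C%2Ainnen%24%29&page=1")  # True
print(decoded_query(url))  # #REG(*\*innen$)
```

Passing `req` to `urllib.request.urlopen` would send the search; the response is not reproduced here.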

Expected behavior: The same results as in COSMAS II.

Akron commented 1 year ago

This has already been reported here. Closed as duplicate.

Akron commented 1 year ago

Maybe I don't understand the regex, but in my opinion a regex starting with a quantifier should definitely fail. What does work is ".*\*innen$", although that is unfortunately a known "killer query" in Lucene. Because of the tokenization in DeReKo, however, this query won't match any tokens.
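The two points in the comment above can be illustrated with Python's `re` module: a pattern starting with a quantifier is rejected, while `.*\*innen$` is valid but matches nothing once the star is split off as a separate token. This is only a stand-in under stated assumptions: Lucene's regular-expression syntax is not identical to Python's, `toy_tokenize` is a hypothetical tokenizer assumed to split off `*` (it is not the actual DeReKo tokenizer), and matching each token in full is assumed to model term-level matching.

```python
from __future__ import annotations

import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:  # e.g. "nothing to repeat" for a leading '*'
        return False
    return True


def toy_tokenize(text: str) -> list[str]:
    """HYPOTHETICAL tokenizer that puts the gender star in its own token.

    It only illustrates the effect; it is not DeReKo's tokenizer.
    """
    return [t for t in re.split(r"(\*)", text) if t]


def matching_tokens(pattern: str, tokens: list[str]) -> list[str]:
    """Tokens matched in full by the pattern (assumed term-level matching)."""
    rx = re.compile(pattern)
    return [t for t in tokens if rx.fullmatch(t)]


print(compiles(r"*\*innen$"))                 # False: starts with a quantifier
print(compiles(r".*\*innen$"))                # True
print(toy_tokenize("Lehrer*innen"))           # ['Lehrer', '*', 'innen']
print(matching_tokens(r".*\*innen$", toy_tokenize("Lehrer*innen")))  # []
print(matching_tokens(r".*\*innen$", "Lehrer*innen".split()))  # ['Lehrer*innen']
```

The last line shows that the same pattern would match if the word were kept as a single token. A leading `.*` gives the pattern no fixed prefix, which is presumably why it is expensive in Lucene: the set of candidate terms cannot be narrowed before matching.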