KorAP / Koral

Translation of query languages to serialized KoralQuery protocol
BSD 2-Clause "Simplified" License

Fix Regular expression parsing in C2 #66

Closed Akron closed 1 year ago

Akron commented 5 years ago

The C2 regex parser misinterprets some symbols inside the regular expression operation. I strongly suspect that #REG(...) is more or less ignored as an operator.

For example #REG(Ba.m) searches for Ba.m verbatim.
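To make the difference concrete, here is a minimal sketch using Python's `re` module as a stand-in; it is not Koral code, and the word list is made up for illustration. It contrasts the intended reading (`.` as a one-character wildcard) with the reported reading (the pattern taken as a literal string).

```python
import re

pattern = "Ba.m"
# Hypothetical sample tokens, chosen only to show the difference.
words = ["Baum", "Ba.m", "Balm", "Bam"]

# Intended behaviour: "." matches exactly one arbitrary character.
as_regex = [w for w in words if re.fullmatch(pattern, w)]

# Reported behaviour: the pattern is compared verbatim,
# which is what escaping every metacharacter amounts to.
as_literal = [w for w in words if re.fullmatch(re.escape(pattern), w)]

print(as_regex)    # ['Baum', 'Ba.m', 'Balm']
print(as_literal)  # ['Ba.m']
```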

In #REG( Redakteur(s|e|en|in|innen)? ), the second opening parenthesis isn't interpreted as part of the regex. Instead it starts a new query term, and the terms are interpreted as a sequence: "Redakteur", followed by /s|e|en|in|innen/, followed by /./. See https://github.com/KorAP/Koral/issues/63 for why the placeholder symbols in regular expressions are misinterpreted.
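One way to keep nested parentheses inside the regex is to find the closing parenthesis that balances the one opened by `#REG(`, rather than stopping at the first parenthesis encountered. The following is only a sketch of that idea in Python, not the approach taken in Koral: the function name `extract_reg_argument` is hypothetical, trimming the surrounding whitespace is an assumption about C2 semantics, and parentheses inside character classes such as `[(]` are not handled.

```python
def extract_reg_argument(query: str) -> str:
    """Return the text between '#REG(' and its balancing ')'.

    Hypothetical helper for illustration; not part of Koral.
    """
    marker = "#REG("
    start = query.find(marker)
    if start == -1:
        raise ValueError("no #REG( operator found")

    body_start = start + len(marker)
    pos = body_start
    depth = 1  # the parenthesis opened by '#REG(' itself

    while pos < len(query):
        ch = query[pos]
        if ch == "\\":
            pos += 2  # skip the escaped character, e.g. '\)'
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # Assumption: surrounding whitespace is not part of the regex.
                return query[body_start:pos].strip()
        pos += 1

    raise ValueError("unbalanced parentheses in #REG(...)")


print(extract_reg_argument("#REG( Redakteur(s|e|en|in|innen)? )"))
# Redakteur(s|e|en|in|innen)?
```

With this kind of extraction the whole argument reaches the regex engine as one term, so the optional group stays attached to "Redakteur" instead of being split into a sequence.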

See http://www.ids-mannheim.de/cosmas2/web-app/hilfe/suchanfrage/eingabe-zeile/syntax/thema-druck.html?template=/cosmas2/template/print.tpl#Bsp for a list of examples that should become part of the test suite.

Akron commented 1 year ago

#REG() is also ignored when the string consists only of character classes.

Bodmo commented 1 year ago

That's right, #REG is not implemented yet. I'll do that.

Akron commented 1 year ago

Fixed in https://github.com/KorAP/Koral/commit/6145d7a2f142df8e451840495bdf2a03a820cc2d