KorAP / Koral

:pencil: Translation of query languages to serialized KoralQuery protocol
BSD 2-Clause "Simplified" License
Rewrite placeholders in Cosmas-II QL #13

Closed Akron closed 5 years ago

Akron commented 9 years ago

The Cosmas-II Query "Schiff+ahrt" is correctly interpreted as a placeholder in a wildcard, as described here. However, the placeholder + is not rewritten to ?, which is a failure.

Akron commented 7 years ago

The correct rewrite would force the wildcard to be a regex, with:

- ? being .
- + being .?
- * being .*
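To make the proposed mapping concrete, here is a minimal Python sketch of the rewrite. It is not Koral's implementation: the function name `wildcard_to_regex` is hypothetical, and escaping every other character with `re.escape` is an assumption about how literals should be handled.

```python
import re

# Mapping taken from the comment above; everything else is an assumption.
WILDCARD_TO_REGEX = {"?": ".", "+": ".?", "*": ".*"}


def wildcard_to_regex(term: str) -> str:
    """Rewrite a Cosmas-II wildcard term as a regex string (hypothetical helper)."""
    parts = []
    for ch in term:
        if ch in WILDCARD_TO_REGEX:
            # Placeholder: replace it with its regex equivalent.
            parts.append(WILDCARD_TO_REGEX[ch])
        else:
            # Literal character: escape it so it cannot act as a regex operator.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = wildcard_to_regex("Schiff+ahrt")
# "+" stands for zero or one character, so both spellings match.
print(pattern, bool(re.fullmatch(pattern, "Schifffahrt")), bool(re.fullmatch(pattern, "Schiffahrt")))
```

With this mapping the query from the issue description accepts both "Schiffahrt" and "Schifffahrt", but not a form with two extra characters in place of the `+`.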

Akron commented 7 years ago

I am referring here to a discussion in https://github.com/KorAP/Krill/issues/33 :

@margaretha wrote:

Ic, so Wildcards is default (priority higher) and regex is supplementary.

I wouldn't say it that way - but Wildcards are simpler to implement than any complex regex syntax like PCRE.

Nevertheless, the KQ implementation would not be straightforward in the case of mixed Wildcards.

There may be wildcard syntax beyond the KQ specification - these would need to be translated to Regex.

By default (Wildcards), + would be treated as a normal character, but we would rather treat it as a regex and rewrite it to m.?n.*, so it seems that Wildcards are less useful than regex. Which QL would need only wildcards without regex? Even for C2 we need both.

That's true. There is no benefit for query language support, I guess, only for potential implementers. I agree that we should probably simplify KoralQuery and deprecate support for Wildcards in favor of Regex.

Akron commented 7 years ago

I have deprecated type:wildcard in the spec and removed it from the text.

Bodmo commented 7 years ago

The Wildcard Search in COSMAS II depends on Case options like

If we rewrite wildcard queries by regex queries, we have to be sure that we have a regex implementation that controls these cases too.

Akron commented 7 years ago

Thank you @Bodmo . Can you point us to the relevant documentation?

Bodmo commented 7 years ago

see http://www.ids-mannheim.de/cosmas2/web-app/hilfe/suchanfrage/eingabe-zeile/syntax/platzhalter.html under Optionen.

Akron commented 7 years ago

Ah - thanks. Is this also part of the C2 Query language or do we have to add these flags to the query language in Koral? I think, regarding the first letter restriction this should indeed be resolved in the regex building (not only for Wildcard queries, if I understand this restriction correctly). Flags for case insensitivity and diacritic insensitivity are already part of KoralQuery and should be adopted.
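As an illustration of resolving the first-letter restriction during regex building, here is a minimal Python sketch. It assumes the restriction means "ignore case for the first letter only"; the exact COSMAS II option semantics are not reproduced in this thread. The helper name `relax_first_letter` is hypothetical and not part of Koral.

```python
import re


def relax_first_letter(term_regex: str) -> str:
    """Make only the first character case-insensitive (hypothetical helper).

    Assumes term_regex starts with a plain letter, not a regex operator.
    """
    if not term_regex:
        return term_regex
    first, rest = term_regex[0], term_regex[1:]
    lower, upper = first.lower(), first.upper()
    if lower == upper or len(lower) != 1 or len(upper) != 1:
        # Not a cased letter (or a special case such as 'ß'): leave unchanged.
        return term_regex
    # A character class keeps the rest of the pattern case-sensitive.
    return f"[{upper}{lower}]{rest}"


first_only = re.compile(relax_first_letter("schiff.?ahrt"))
whole_term = re.compile("schiff.?ahrt", re.IGNORECASE)

print(bool(first_only.fullmatch("Schifffahrt")))  # first letter may differ in case
print(bool(first_only.fullmatch("SCHIFFFAHRT")))  # the remaining letters may not
print(bool(whole_term.fullmatch("SCHIFFFAHRT")))  # whole-term flag accepts any case
```

The character class handles the first-letter case inside the pattern itself, whereas a flag such as `re.IGNORECASE` applies to the whole term. That is why the two options would need to be distinguished when the regex is built.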

margaretha commented 7 years ago

Case insensitivity has been implemented. However, Koral does not differentiate between the first character only and other characters. The flag is added when the given string starts with $.

I haven't found diacritic in the code.

Akron commented 7 years ago

So, in case we need to introduce such operators, they would also need to be prefixes.

margaretha commented 5 years ago

@Akron which operators do you mean?

I suppose case / modality should be handled separately, see #58.

margaretha commented 5 years ago

Fixed in fa4e739270ec1c5fd62bf87a79961d683d31e497. Other flags may be added as parts of the Glemm expansion plugin.