Closed. Akron closed this issue 5 years ago
The correct rewrite would force the wildcard to be a regex, with:
`?` being `.`
`+` being `.?`
`*` being `.*`
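The mapping above can be sketched as a small translation function. This is only an illustration, not Krill's or Koral's actual code: `wildcard_to_regex` and `PLACEHOLDERS` are hypothetical names, and the sketch uses Python's `re` syntax, which may differ in detail from the regex dialect of the search backend.

```python
import re

# Mapping taken from the comment above: wildcard placeholder -> regex fragment.
PLACEHOLDERS = {"?": ".", "+": ".?", "*": ".*"}


def wildcard_to_regex(term: str) -> str:
    """Translate a wildcard term into a regex pattern string (hypothetical helper)."""
    parts = []
    for ch in term:
        if ch in PLACEHOLDERS:
            parts.append(PLACEHOLDERS[ch])
        else:
            # Escape every other character so that it is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = wildcard_to_regex("Schiff+ahrt")
print(pattern)
print(bool(re.fullmatch(pattern, "Schifffahrt")))
print(bool(re.fullmatch(pattern, "Schiffahrt")))
```

Escaping the non-placeholder characters matters once the term is handed to a regex engine, because characters such as `.` or `(` would otherwise gain a meaning they do not have in a wildcard query.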
I am referring here to a discussion in https://github.com/KorAP/Krill/issues/33 :
@margaretha wrote:
Ic, so Wildcards is default (priority higher) and regex is supplementary.
I wouldn't say it that way - but Wildcards are simpler to implement than any complex regex syntax like PCRE.
Nevertheless, a KQ implementation would not be straightforward in the case of mixed Wildcards.
There may be wildcard syntax beyond the KQ specification - these would need to be translated to Regex.
By default (Wildcards), + would be treated as a normal character, but we would rather treat it as a regex and rewrite it to m.?n.*, so it seems that Wildcards are less useful than regex. Which QL would need only wildcards without regex? Even for C2 we need both.
That's true. There is no benefit for query language support, I guess, only for potential implementers. I agree that we should probably simplify KoralQuery and deprecate support for Wildcards in favor of Regex.
I have deprecated `type:wildcard` in the spec and removed it from the text.
The Wildcard Search in COSMAS II depends on Case options like
If we rewrite wildcard queries by regex queries, we have to be sure that we have a regex implementation that controls these cases too.
Thank you @Bodmo. Can you point us to the relevant documentation?
Ah - thanks. Is this also part of the C2 Query language or do we have to add these flags to the query language in Koral? I think, regarding the first letter restriction this should indeed be resolved in the regex building (not only for Wildcard queries, if I understand this restriction correctly). Flags for case insensitivity and diacritic insensitivity are already part of KoralQuery and should be adopted.
Case insensitivity has been implemented, however Koral does not differentiate between the first character only and other characters. It is added when the given string starts with $.
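To make the distinction concrete, here is a minimal sketch of how a `$` prefix could be turned into either a fully case-insensitive regex or one that ignores case on the first character only. It is not Koral's implementation: `parse_case_prefix`, `build_regex`, and the `first_char_only` parameter are hypothetical, and the exact COSMAS II option semantics are assumed rather than taken from its documentation.

```python
import re


def parse_case_prefix(term: str):
    """Strip a leading '$' and report whether it was present."""
    if term.startswith("$"):
        return term[1:], True
    return term, False


def build_regex(term: str, first_char_only: bool = False) -> re.Pattern:
    """Compile a term into a regex, honoring the '$' case-insensitivity prefix."""
    literal, insensitive = parse_case_prefix(term)
    if not insensitive or not literal:
        # No flag requested (or nothing left to match): plain literal match.
        return re.compile(re.escape(literal))
    if not first_char_only:
        # Current behaviour described above: the flag covers the whole term.
        return re.compile(re.escape(literal), re.IGNORECASE)
    first, rest = literal[0], literal[1:]
    # Scoped inline flag: only the first character ignores case.
    return re.compile("(?i:" + re.escape(first) + ")" + re.escape(rest))


print(bool(build_regex("$haus").fullmatch("HAUS")))
print(bool(build_regex("$haus", first_char_only=True).fullmatch("Haus")))
print(bool(build_regex("$haus", first_char_only=True).fullmatch("HAUS")))
```

Resolving the restriction while building the regex, as suggested earlier in the thread, keeps the query language itself unchanged; only the translation step needs to know which scope the flag has.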
I haven't found any handling of diacritics in the code.
So, in case we need to introduce such operators, they would also need to be prefixes.
@Akron which operators do you mean?
I suppose case / modality should be handled separately, see #58.
Fixed in fa4e739270ec1c5fd62bf87a79961d683d31e497. Other flags may be added as parts of the Glemm expansion plugin.
The Cosmas-II Query "Schiff+ahrt" is correctly interpreted as a placeholder in a wildcard, as described here. However, the placeholder `+` is not rewritten to `?`, which is a failure.