ad-freiburg / qlever-ui

A user interface for QLever
Apache License 2.0

Support script property in regexes #62

Open nikkiwd opened 8 months ago

nikkiwd commented 8 months ago

It would be useful to support script properties in regexes, as described at http://www.unicode.org/reports/tr18/tr18-19.html#Script_Property

According to google/re2#234, these should work if RE2 is built against ICU.

Example:

Perl supports:

Of these, the only one which works in QLever is \p{Hiragana} (https://qlever.cs.uni-freiburg.de/wikidata/5Zj8W6). Any of the others gives an error like:

Invalid SPARQL query: The regex "\p{sc=Hiragana}" is not supported by QLever (which uses Google's RE2 library). Error from RE2 is: invalid character class range: \p{sc=Hiragana}

In Wikidata's query service, the only ones which are supported are \p{sc=Hira} and \p{sc=Hiragana} (https://w.wiki/7xjr), so supporting those two in particular would make it easier to write queries which work in both places.
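
As a rough illustration of what supporting those two forms could look like, here is a minimal sketch that rewrites \p{sc=...} into the bare \p{...} form that already works, before the pattern reaches the regex engine. This is not how QLever handles regexes; the function name normalize_script_properties and the alias table are hypothetical, and the table only contains the one short code mentioned in this issue. It also does not try to detect an escaped backslash in front of p, so a pattern matching a literal backslash followed by "p{sc=...}" would be rewritten incorrectly.

```python
import re

# Hypothetical alias table: four-letter script code -> long script name.
# Only the code discussed in this issue is listed; a real table would be larger.
SCRIPT_ALIASES = {"Hira": "Hiragana"}

# Matches \p{sc=Name} and the negated form \P{sc=Name}.
_SCRIPT_PROPERTY = re.compile(r"\\([pP])\{sc=([A-Za-z_]+)\}")


def normalize_script_properties(pattern: str) -> str:
    """Rewrite \\p{sc=Name} to \\p{Name}, expanding known short codes."""

    def rewrite(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        # Unknown names are passed through unchanged, so the regex engine
        # can still report its own error for them.
        return "\\" + kind + "{" + SCRIPT_ALIASES.get(name, name) + "}"

    return _SCRIPT_PROPERTY.sub(rewrite, pattern)


if __name__ == "__main__":
    for example in (r"\p{sc=Hiragana}", r"\p{sc=Hira}", r"\p{Hiragana}"):
        print(example, "->", normalize_script_properties(example))
```

With this rewrite, both \p{sc=Hira} and \p{sc=Hiragana} end up as \p{Hiragana}, which is the form reported above as working in QLever. Whether such a rewrite belongs in the query engine or is better solved by building RE2 against ICU is an open question here.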

hannahbast commented 4 months ago

@nikkiwd Interesting. This is an issue for https://github.com/ad-freiburg/qlever and not for the QLever UI. Any idea how to fix this? We don't do anything special to inhibit this and we weren't aware of this feature so far.