ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

Is REGEX case insensitive now? #1314

Open belett opened 3 months ago

belett commented 3 months ago

Hi,

It seems to me that REGEX was case sensitive (as in the SPARQL standard) and is now case insensitive. Is my impression right?

And if so, as a workaround, how can I make it case sensitive again? (I know that REGEX has a flag "i" to make it case insensitive, but I can't find a flag to make it case sensitive, since that is supposed to be the default.)

Cheers, Nicolas

hannahbast commented 3 months ago

@belett The REGEX function itself is case-insensitive if and only if the "i" flag is provided. For example, compare https://qlever.cs.uni-freiburg.de/wikidata/huiLRw (astronauts with name matching "neil", zero results) and https://qlever.cs.uni-freiburg.de/wikidata/U7OafN (astronauts with name matching "Neil", two results).

However, QLever has an optimized implementation for prefix search, that is, for a REGEX consisting of a ^ followed by a fixed string. For example, see https://qlever.cs.uni-freiburg.de/wikidata/RU9ISc (astronauts with name matching "^neil", two results). Whether this optimized implementation is case-sensitive is configurable when building the index; it is currently configured to be case-insensitive. What's missing (for historical reasons) is that, in that case, the optimized implementation should only be used when the "i" flag is provided. We should fix that for the sake of standard conformity.
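To make the distinction concrete, here is a minimal Python sketch, not QLever's actual code. The function `is_prefix_pattern` and the set of metacharacters are hypothetical and only illustrate what "^ followed by a fixed string" means; `standard_regex_match` mimics the standard behavior, where matching is case-insensitive only when the "i" flag is given.

```python
import re

# Characters treated as regex metacharacters in this sketch (an assumption,
# not taken from QLever's source).
_META = set(r".^$*+?{}[]\|()")


def is_prefix_pattern(pattern: str) -> bool:
    """Return True if pattern is '^' followed by a non-empty fixed string."""
    if not pattern.startswith("^"):
        return False
    rest = pattern[1:]
    return rest != "" and not any(ch in _META for ch in rest)


def standard_regex_match(text: str, pattern: str, flags: str = "") -> bool:
    """Match as the standard prescribes: case-insensitive only with the 'i' flag."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None
```

Under these assumptions, "^neil" counts as a prefix pattern, while "neil" and "^[n]eil" do not. A standard-conforming match of "^neil" against "Neil Armstrong" fails without the "i" flag and succeeds with it.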

Does this answer your question?

belett commented 3 months ago

Thanks, it does answer most of my question.

Indeed, I was looking for a prefix match (with a query like https://qlever.cs.uni-freiburg.de/wikidata/AT4L5i, where I look for French labels starting with "Église" rather than "église").

Do you have an idea when it will be fixed? And is there a way around until then?

hannahbast commented 3 months ago

@belett Yes, there is an easy workaround. As I said, the optimization is only triggered for REGEXes of the form ^ + fixed string. So just turn the fixed string into an equivalent REGEX that is not a fixed string, for example: https://qlever.cs.uni-freiburg.de/wikidata/QKyWtw
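One possible way to do this rewrite, which is not necessarily what the linked query uses, is to wrap the first character of the prefix in a one-character class, for example "^[É]glise" instead of "^Église". The Python sketch below builds such a pattern; the helper name `non_fixed_prefix_pattern` is hypothetical, and it is an assumption that QLever no longer treats the result as a fixed-string prefix.

```python
import re

# Characters this sketch refuses to handle, to avoid escaping questions
# (regex metacharacters plus the double quote used in SPARQL string literals).
_UNSAFE = set(r'.^$*+?{}[]\|()"')


def non_fixed_prefix_pattern(prefix: str) -> str:
    """Turn a plain prefix into an equivalent regex that is not '^' + fixed string.

    The first character is wrapped in a character class, which matches
    exactly that character, so the set of matching strings is unchanged.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    if any(ch in _UNSAFE for ch in prefix):
        raise ValueError("prefix contains characters this sketch does not handle")
    return "^[" + prefix[0] + "]" + prefix[1:]


pattern = non_fixed_prefix_pattern("Église")
# With ordinary case-sensitive matching, only the capitalized form matches.
matches_upper = re.search(pattern, "Église Saint-Pierre") is not None
matches_lower = re.search(pattern, "église Saint-Pierre") is not None
```

The resulting pattern can then be used in a filter such as `FILTER(REGEX(?label, "^[É]glise"))`, where `?label` is a placeholder variable name.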