eclipse-lyo / lyo

Eclipse Lyo, a Java SDK for OSLC-based tool integration
https://oslc.github.io/developing-oslc-applications/eclipse_lyo/eclipse-lyo.html
Eclipse Public License 2.0
15 stars 16 forks source link

oslc.where: IRI_REF vs. comparison operators less/less-or-equal #43

Open mfriedemann opened 5 years ago

mfriedemann commented 5 years ago

Hey,

I have picked up on earlier work with Lyo, to run an LDP endpoint with oslc query support. While doing this, I noticed an issue with the parser (or rather the lexer) for oslc.where:

https://github.com/eclipse/lyo.core/blob/2b5a849b0f0d0b9fe3a373b814da30b94f69c39b/oslc-query/src/main/antlr3/org/eclipse/lyo/core/query/OslcWhere.g#L162

The issue is that its IRI_REF lexer rule consumes '<', unless followed by a few reserved chars, which makes it impossible to compare using '<' with anything but strings and IRIs, and to compare using '<=' at all. That is because it matches '<', followed by something that matches the allowed chars, and then runs up to EOF while expecting the closing '>'.

org.eclipse.lyo.core.query.ParseException: line 1:12 no viable alternative at input '<EOF>'
    at org.eclipse.lyo.core.query.QueryUtils.checkErrors(QueryUtils.java:676)
    at org.eclipse.lyo.core.query.QueryUtils.parseWhere(QueryUtils.java:132)
[...]

Your test cases do not cover this, unfortunately. Adding the following expressions to BasicWhereTest shows the issue:

  "qm:answer<42", // fails
  "qm:answer<=42", // fails
  "qm:question<\"The ultimate question of Life, the Universe, and Everything\"", // works:
  "qm:question<=\"The ultimate question of Life, the Universe, and Everything\"", // fails
  "qm:question<<urn:qm:ultimate>", // works
  "qm:question<=<urn:qm:ultimate>", // fails

I messed with the grammar for a bit (not used to ANTL3), and it appears quite tricky due to the broad range of allowed chars in IRIs. My current workaround uses ALPHA_CHARS after LESS at line 162, which makes it skip IRI_REF for all '<=' comparisons and '<' comparisons against decimal literals, thereby making those cases work.

RFC 3986 and 3987 (URI, IRI) suggest that IRIs should start with the scheme, anyway, so forcing it to start with a letter should not introduce a limitation.

--- a/OslcWhere.g
--- b/OslcWhere.g
@@ -159,7 +160,7 @@ PNAME_LN

 IRI_REF
-    : LESS ( options {greedy=false;} : ~(LESS | GREATER | '"' | OPEN_CURLY_BRACE | CLOSE_CURLY_BRACE | '|' | '^' | '\\' | '`' | ('\u0000'..'\u0020')) )* GREATER 
+    : LESS ALPHA_CHARS ( options {greedy=false;} : ~(LESS | GREATER | '"' | OPEN_CURLY_BRACE | CLOSE_CURLY_BRACE | '|' | '^' | '\\' | '`' | ('\u0000'..'\u0020')) )* GREATER
     ;

Best regards, M.

berezovskyi commented 5 years ago

Thanks Marko for both the bug report and a suggested fix. From the first look, this seems reasonable. Me or @jadelkhoury will test it and get back to you.

Jad, do you want to take a look at it given you recently worked with Query in Lyo Store? Otherwise I can do it.

jadelkhoury commented 5 years ago

Hi

I can try to investigate this week. In the meantime, I wonder if you can investiagte the following options.

  1. Will it work if you send "qm:answer < 42" instead of "qm:answer<42"?

  2. The issue is that its IRI_REF lexer rule consumes '<', unless followed by a few reserved chars, which makes it impossible to ... compare using '<=' at all.

Will it help to add "=" as such a special character?

mfriedemann commented 5 years ago
  1. No, that will not work. The grammar does not support whitespace there at all. The only places it expects whitespace is around the "and" in compound terms and for inclusion tests with " in [...]". Apparently, it was written to be super compact and to offer only a very reduced subset of functionality to keep it simple (hah!).

  2. The thing is that '=' is a valid character to be used within IRIs. It cannot, however, (as far as I see) be used right at the beginning. Hence my, probably crude, workaround. Note that the OSLC spec doesn't actually give a list of allowed characters itself, the implementation appears to use IRI_REF as specified for SPARQL.

HTH, M.

jadelkhoury commented 5 years ago

Hi

Indeed, once the lexer starts consuming characters after LESS , there is no return, even when there is no corresponding GREATER in the string. (You'd think antlr will judge that the string does not match the IRI_REF pattern, and backtrack from there. But that does not seem to be the way it works.)

I am not familar with antlr at all, so I can't really tell if the suggested workaround might cause other problems in other scenarios. It seems robust when I tested, and I tried a few other alternatives that were not as good.

But I'm now thinking we can post at the mailing list, to see if we have anyone familiar with antlr, that might be able to help.

Otherwise, we can go for the suggested fix.

paulslauenwhite commented 4 years ago

I'm see a similar issue with boolean values.

org.eclipse.lyo.core.query.QueryUtils.parseWhere(String, Map<String, String>) failues when the olsc.where query predicate contains an inequality operator (<, <=, >, and >=) on boolean data types.

https://localhost:9443/jazz/oslc_config/resources/com.ibm.team.vvc.Configuration?oslc.where**=oslc_config:mutable<false**

According to the OSLC Core V2 specification:

Semantics of datatypes and operations on these datatypes MUST work as defined in the SPARQL Query Language (reference: SPARQL).

Source: https://archive.open-services.net/bin/view/Main/OSLCCoreSpecQuery.html#comparison_op

However, the SPARQL Query Language (see https://www.w3.org/TR/sparql11-query/#OperatorMapping) does support inequality operators (<, <=, >, and >=) on boolean data types.

jadelkhoury commented 4 years ago

I'm see a similar issue with boolean values.

What's your scenario with inequality operators on boolean? does it need to be supported really?

paulslauenwhite commented 4 years ago

Probably not in practice except for completeness with the specification.

jadelkhoury commented 4 years ago

It’s worth investigating if the SPARQL antlr grammar is more robust, and something to copy from.

See IRI_REF on https://github.com/antlr/grammars-v4/blob/adcc1b22952161286e5e7208d830d9d7a0a5ac22/sparql/Sparql.g4#L354

The similarities are striking, but they are not the same.

berezovskyi commented 3 years ago

I think the fix is valid and as long as we add some unit tests, it should be fine (as long as we agree that <#test> and </test> are invalid URIs).

@jadelkhoury I though we merged a fix for this long time ago. Do you really not want to fix this for Lyo 4.0?

@jamsden do you have any problems with this from Jazz standpoint?