apache / incubator-kie-drools

Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java.
http://www.drools.org
5.85k stars 2.49k forks source link

[incubator-kie-drools-5879] [new-parser] build failure on kie-dmn-val… #5899

Closed tkobayas closed 4 months ago

tkobayas commented 4 months ago

…idation

Issue:

tkobayas commented 4 months ago

This issue is pretty puzzling. In this PR, I add 2 test cases orWithMethodCall and orWithMethodCallWithArg to some test classes. Probably it's easy to debug with DRLExprParserTest because this issue focuses on a constraint.

orWithMethodCall is correctly parsed to value == 10 || someMethod() == 4. || matches OR in conditionalOrExpression

orWithMethodCallWithArg raises an error no viable alternative at input 'someMethod', because the parser look into orRestriction and || matches OR in the orRestriction. Then, it evaluates the semantic predicate {(helper.isPluggableEvaluator(false))}? in operator_key and returns false. I expected that the parser recovers from the path and goes back to conditionalOrExpression and results in the same structure as orWithMethodCall case.

At first, I thought the error recovery was wrong, so I raise a SO thread (https://stackoverflow.com/questions/78417977/semantic-predicate-false-throws-noviablealtexception-in-adaptivepredict-then), but probably it was not a good question. Error recovery is to proceed the parser as possible just to help developers to understand the error. We should avoid "error" in the first place.

In Antlr3, we use "syntactic predicate" : https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/resources/org/drools/drl/parser/DRL6Expressions.g#L412-L424 (DOUBLE_PIPE fullAnnotation[null]? andRestriction)=> triggers "backtracking" which explores the possible paths and doesn't raise an error even if it hits the semantic predicate false. It just thinks, "okay, we cannot match this path".

Antlr4 says it no longer has "syntactic predicate" because Antlr4 can explore paths better, so no need of "syntactic predicate". However, in our case, it results in an error while it has other alternative match (= same as orWithMethodCall case)

tkobayas commented 4 months ago

Indeed we use || in conditionalOrExpression (for simple OR connection) and orRestriction (for a half constraint like a == 10 || == 20).

https://github.com/antlr/antlr4/blob/master/doc/predicates.md#finding-visible-predicates

The prediction process also can't see through token references. Token references have the side effect of advancing the input one symbol. A predicate that tested the current input symbol would find itself out of sync if the parser shifted it over the token reference.

So probably, we should not expect the semantic predicate {(helper.isPluggableEvaluator(false))}? to resolve the ambiguity in Antlr4.

tkobayas commented 4 months ago

FYI, Intellij IDEA Antlr4 plugin view of value == 10 || someMethod() == 4 (good case)

good-parse

tkobayas commented 4 months ago

view of value == 10 || someMethod(value) == 4 (bad case). Note that it's okay for the plugin to parse the input like this, because the plugin doesn't consider a semantic predicate when rendering.

bad-parse

tkobayas commented 4 months ago

@yurloc @mariofusco Feel free to share your thoughts on this. There are more things to investigate. I might be overlooking something.

yurloc commented 4 months ago

Quick ideas (feel free to ignore if it's too cryptic but I need to put it down) - I can see 3 options:

  1. Eat the || token temporarily to make the operator predicate visible and kill the unwanted alternative. Then puke it (go 1 step backward in the token stream), and exit relationalExpression as expected.
  2. Do nothing. Keep the predicate invisible during alternative exploration, then let it fail with no viable alternative (that's what currently happens) and then use custom error strategy to somehow recover and exit relationalExpression as expected.
  3. Somehow redesign the rules. Find another way to kill the alternative without using a sempred (disallow parExpression after operator if we're inside relationExpression???) resulting in a compromise that keeps general scenario working, makes this specific usecase work but disables some edge-case expressions. Use a nongreedy subrule? Guard the || orRestriction alternative with a new sempred (helper.isPluggableEvaluator()) that looks at the next-next token?
yurloc commented 4 months ago

Lol, the nongreediness seems to work, and it's one-character fix :rofl:

diff --git a/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4 b/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4
index 02c20088f4..ac9e0a2d05 100644
--- a/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4
+++ b/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4
@@ -519,7 +519,7 @@ orRestriction returns [BaseDescr result]
                $result = descr;
            }
          }
-   )* EOF?
+   )*? EOF?
   ;

 andRestriction returns [BaseDescr result]

Seriously - it looks promising as it makes the tests in this PR pass but I'm running into some issues in the kie-dmn-validation module. Might need another pair of :eyes:

yurloc commented 4 months ago

"Some issues" in kie-dmn-validation solved. It's an unrelated problem.

That means the mentioned nongreediness (?) fix is a fix. I'm not yet sure how good a fix it is. It might be a proper fix, an acceptable workaround or a bad workaround.

yurloc commented 4 months ago
diff --git a/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4 b/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4
index 02c20088f4..ac9e0a2d05 100644
--- a/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4
+++ b/drools-drl/drools-drl-parser/src/main/antlr4/org/drools/drl/parser/antlr4/DRL6Expressions.g4
@@ -519,7 +519,7 @@ orRestriction returns [BaseDescr result]
                $result = descr;
            }
          }
-   )* EOF?
+   )*? EOF?
   ;

 andRestriction returns [BaseDescr result]

@tkobayas It might actually be a good fix. I'm afraid that the nongreediness can break some other use cases, something like org.drools.drl.parser.antlr4.MiscDRLParserTest#halfConstraintOperators but this one is unaffected. Please try to prove it breaks something, otherwise I think we have a fix.

tkobayas commented 4 months ago

@yurloc Thank you!!! It's an excellent fix! Nongreediness seems to be a good tool for us to control the alternative priority.

tkobayas commented 4 months ago

@yurloc @mariofusco @gitgabrio Please review, thanks!