IHTSDO / snomed-expression-constraint-language

Formal syntax and valid examples for each version of ECL
http://snomed.org/ecl
Other
3 stars 3 forks source link

Grammar behaviour in Snowstorm vs ANTLR Lab / C# Targets #6

Open martingall87 opened 1 year ago

martingall87 commented 1 year ago

Hi,

I've observed some different behaviour when validating / executing ECL in the International SNOMED CT Browser / Snowstorm vs using the Grammar with ANTLR Lab or an ANTLR Parser targeting a different language e.g. C#.

If I execute an ECL query with some erroneous text after a valid expressionconstraint rule snowstorm correctly reports that there is some unexpected input e.g.

<< 404684003 |Clinical finding (finding)|  - Some invalid text : 363698007 |Finding site| = 89187006 |Airway structure (body structure)|

The error I get back is Syntax error at line 1, character 43: unexpected character '-' expecting {, '\u0009', '\u000A', '\u000D', ' ', '/'}

However validating the same ECL statement in ANTLR Lab the parser recognises << 404684003 as a valid expression constraint and ignores the remainder of the statement.

image

I've found that adding a new top level rule including an EOF reference fixes this behaviour and the parser in ANTLR Lab correctly provides the same error.

grammar ECL;

ecl : expressionconstraint EOF;
expressionconstraint : .... 

I have noticed that the snomed-ecl-parser repo uses an older version of the ANTLR Runtime (4.5.3), I'm wondering if there has been a behaviour change in the more recent versions (e.g. 4.13.1) of the ANTLR Runtime causing this difference?

However I did find this guidance on Starting Rules and EOF on the ANTLR repo, highlighting the same issue. Should there be a new top level rule in the grammar with an EOF marker to ensure the whole input document / ECL statement is read?

kaicode commented 1 year ago

Hi @martingall87.

In our Java based implementation: The browser uses Snowstorm for the API. Snowstorm delegates ECL query parsing to ECLQueryBuilder. Here an error listener and error handler are assigned. Those get notified of trailing characters.

The EOF character seems like a useful addition to the grammar. I will discuss this with the SNOMED Computable Languages Project Group when we meet at the end of next week.

martingall87 commented 1 year ago

Hi @martingall87.

In our Java based implementation: The browser uses Snowstorm for the API. Snowstorm delegates ECL query parsing to ECLQueryBuilder. Here an error listener and error handler are assigned. Those get notified of trailing characters.

The EOF character seems like a useful addition to the grammar. I will discuss this with the SNOMED Computable Languages Project Group when we meet at the end of next week.

Much appreciated @kaicode!

Just for your interest I've just tried replicating the same error strategy and error listener overrides in my C# solution, however I still get the same issue described above. This was using the latest version of the ANTLR Runtime 4.13.1.

To investigate this further I've done some testing with previous versions of the ANTLR Runtime. I've found that in version 4.6.0, the ECL expression I mentioned above errors correctly, however in v.4.7.0 and newer it starts returning the partially parsed expression.

kaicode commented 1 year ago

I asked the members of the SNOMED Computable Languages Project Group about the EOF question. I thought perhaps that some of them may be adding EOF in their own implementations. The response was that no one has found the need to add EOF in their copy of the grammar because they are able to get an error from ANTLR when there are trailing characters. It should be noted that they are all using Java implementations.

martingall87 commented 1 year ago

I asked the members of the SNOMED Computable Languages Project Group about the EOF question. I thought perhaps that some of them may be adding EOF in their own implementations. The response was that no one has found the need to add EOF in their copy of the grammar because they are able to get an error from ANTLR when there are trailing characters. It should be noted that they are all using Java implementations.

Thanks for the feedback @kaicode.

From my testing of the C# based ANTLR Runtime (mentioned above) I've noticed that the version of the ANTLR package used changes the behaviour of the parser. At the moment the IHTSDO ECL Parser uses an older version of the ANTLR Runtime (4.5.3), I wonder if this is the same for Java if it were updated to 4.13.1?