SemGuS-git / Semgus-Parser

Library and tool for parsing SemGuS specifications
MIT License
4 stars 1 forks source link

Negative integer literals not parsed correctly #5

Closed WileyCorning closed 3 years ago

WileyCorning commented 3 years ago

I haven't investigated this too deeply, but a constant value of e.g. -5 is not parsed as an integer literal. See for example this unit test, where the case involving negative numbers fails.

I believe this is due to the following part of semgus.g4:

SYMBOL : ([a-z]|[A-Z]|'_'|'+'|'-'|'*'|'&'|'|'|'!'|'~'|'<'|'>'|'='|'/'|'%'|'?'|'.'|'$'|'^')(([a-z]|[A-Z]|'_'|'+'|'-'|'*'|'&'|'|'|'!'|'~'|'<'|'>'|'='|'/'|'%'|'?'|'.'|'$'|'^') | ([0-9]))*;

INTEGER : ('-'?([0-9])+);

It looks like the rule for SYMBOL will match a '-' followed by zero or more numeric characters; this may be taking precedence over INTEGER here.

kjcjohnson commented 3 years ago

It looks to be the case.

I think all that's needed is moving the SYMBOL part below REALCONST. I pretty much just lifted these parts from the Sygus grammar, and they have SYMBOL last.

kjcjohnson commented 3 years ago

Actually, after reading the SyGuS-IF 2.0 spec, I think the current behavior may incidentally be correct, and we based our grammar off an older version.

In the latest spec, SyGuS doesn't support negative literals and instead requires them to be written as, e.g., (- 5).

Still, our current behavior is a bit questionable in that it allows -5 as a symbol instead of emitting an error or warning that you're probably doing something wrong.

WileyCorning commented 3 years ago

Thanks for the info. Do you see a strong reason to avoid supporting negative literals? It would seem to me like a reasonable feature to expect.

Unless there are any objections, I'll go ahead and implement + test the fix you proposed.

kjcjohnson commented 3 years ago

I agree it's reasonable, but it's not in the SMT-LIB standard, as far as I can tell.

The syntax for terms and sorts now coincides with the corresponding syntax for terms and sorts from SMT-LIB versions 2.0 and later. There are three notable changes with respect to the previous SyGuS format that come as a result of this change. First, negative integer and real constants must be written with unary negation, that is, the integer constant negative one must be written (−1), whereas previously it could be written −1. [...]

https://sygus.org/assets/pdf/SyGuS-IF_2.0.pdf (page 2)

And page 95 of https://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2021-04-02.pdf shows:

<numeral> ::= 0 | a non-empty sequence of digits not starting with 0 <decimal> ::=<numeral>.0*<numeral>

which does not include a leading minus.

Since SyGuS explicitly removed support for this, I'm very hesitant to add it in here. If we say we're using SMT-LIB format, then I think we should just defer to that spec instead of doing something that's sort-of compatible and having to remove it later.

kjw227 commented 3 years ago

I agree that we should stick to SMT-LIB syntax for now, at least until we understand why terms like -1 are disallowed. As Keith says, SMT-LIB for some reason disallows this syntax. In page 38, it states that

The set of values for the Int sort consists of all numerals and all terms of the form (- n) where n is a numeral other than 0.

kjcjohnson commented 3 years ago

Closing this as not a bug. SMT-LIB2 doesn't support negatives, and the upcoming S-expression support sticks to SMT-LIB2 syntax.