However, we need the cooked value of the token to be determined before parsing, due to future constraints with signed integers (see explanation below). Therefore we cannot rely on the unary operator in the syntactic grammar. This issue prepares future versions by alleviating those constraints early.
Solution
Add the unary operators to number tokens in the lexical grammar:
Number ::= ("+" | "-")? IntegerLiteral
Update the Mathematical Value algorithm:
MV(Number ::= "+" IntegerLiteral)
is MV(IntegerLiteral)
MV(Number ::= "-" IntegerLiteral)
is -1 * MV(IntegerLiteral)
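The production and the two Mathematical Value rules above can be sketched in Python as follows. This is only an illustration, not the project's implementation: the names NUMBER and mathematical_value are hypothetical, and IntegerLiteral is assumed to be a run of decimal digits.

```python
import re

# Assumption: IntegerLiteral is one or more decimal digits.
# Number ::= ("+" | "-")? IntegerLiteral
NUMBER = re.compile(r"([+-])?([0-9]+)")


def mathematical_value(source: str) -> int:
    """Return the MV of a Number token, following the rules above."""
    match = NUMBER.fullmatch(source)
    if match is None:
        raise ValueError(f"not a Number token: {source!r}")
    sign, digits = match.groups()
    mv = int(digits)  # MV(IntegerLiteral)
    # A leading "-" multiplies by -1; a leading "+" (or no sign) leaves MV as is.
    return -1 * mv if sign == "-" else mv
```

With this sketch, mathematical_value("-42") evaluates to -42 and mathematical_value("+5") to 5, so the sign is already resolved by the time the token leaves the lexer.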
Impacts
Warning: This introduces a breaking change that affects expressions in which the binary operators + and - appear directly before a number literal.
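Before this change, 8+5 would be lexed as three tokens (the number 8, the punctuator +, and the number 5), which happens to be a well-formed expression per the syntactic grammar. But after the change, it will be lexed as two (the numbers 8 and +5), which is no longer well-formed.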
Therefore, in order to accommodate this change, whitespace must be inserted after the binary operators + and - in expressions. The expression 8+5 must be changed to either 8+ 5 or 8 + 5.
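To make the breaking change concrete, here is a rough Python sketch that lexes the same input under the old and the new lexical grammar. The patterns OLD_TOKEN and NEW_TOKEN and the function lex are hypothetical stand-ins for the real lexer, which is assumed to recognize only decimal integers and the punctuators + and - for the purposes of this example.

```python
import re

# Hypothetical token patterns; IntegerLiteral is assumed to be decimal digits.
# Old grammar: a sign is always its own punctuator token.
OLD_TOKEN = re.compile(r"\s*([0-9]+|[+-])")
# New grammar: a sign directly followed by digits is part of the Number token.
NEW_TOKEN = re.compile(r"\s*([+-]?[0-9]+|[+-])")


def lex(source: str, token: re.Pattern) -> list[str]:
    """Split source into tokens, trying the Number pattern before a bare sign."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = token.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


print(lex("8+5", OLD_TOKEN))    # ['8', '+', '5']
print(lex("8+5", NEW_TOKEN))    # ['8', '+5']
print(lex("8+ 5", NEW_TOKEN))   # ['8', '+', '5']
print(lex("8 + 5", NEW_TOKEN))  # ['8', '+', '5']
```

Note that under the new pattern it is the whitespace after the operator that matters: 8 +5 would still be lexed as the two numbers 8 and +5.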
Background
This fix is needed in order for the transformer (the mechanism that sends tokens to the parser) to determine the actual value of number tokens during lexical analysis. An example of the problem follows.
In two’s complement representation, 4-bit signed integer values range from -8 to 7 (represented as 1000 and 0111, respectively), so we should be able to write the literal -8 in source code. But the lexer doesn’t see -8 as a single token; it sees two tokens: a punctuator with value -, followed by an integer with value 8. The translator will then fail to compute the mathematical value of the token 8, since it’s out of range (there’s no bit sequence that represents 8 in signed 4-bit precision).
If we allow the token to include -, then the translator will successfully compute its mathematical value as -8 and represent it as 1000 in memory.
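The range argument can be checked with a small Python sketch of 4-bit two's complement encoding. The function name to_twos_complement and the constant BITS are made up for this example, and raising an error for out-of-range values is an assumption about how a failure would surface.

```python
BITS = 4  # precision used in the example above


def to_twos_complement(value: int, bits: int = BITS) -> str:
    """Return the bit string for value, or raise if it does not fit."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise OverflowError(f"{value} is outside [{low}, {high}]")
    # Masking maps negative values onto their two's complement bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


print(to_twos_complement(-8))  # 1000
print(to_twos_complement(7))   # 0111
```

Calling to_twos_complement(8) raises OverflowError, which mirrors the failure described above: the value 8 alone has no 4-bit signed representation, while the value -8, computed from a single token, does.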
Allow number tokens to be immediately preceded by a positive sign + or negative sign -.
Problem
Previously, e.g., -42 would be lexed as two tokens: a punctuator with value -, followed by an integer with value 42.