Numbers can be written as strings

alessandropellegrini commented 9 years ago

I think that in a supercazzola it is useful to write numbers as strings. This patch adds support for stringified numbers in both variable assignments and shifts. Numbers from zero to novemilanovecentonovantanove are recognized so far. I have tried to alter the existing lexer and parser as little as possible, so some things could be improved. For example, the syntax of numbers is not checked, so a number like millemilleundicidieci is mapped to 2021. This could be fixed by using more complex rules in the parser. Furthermore, the parser rule that sums up numbers uses right recursion. This consumes a bit more stack, but it lets me modify the existing grammar as little as possible, and numbers won't be that long anyway.
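
To make the additive behaviour concrete, here is a minimal Python sketch of why an unchecked, right-recursive sum maps millemilleundicidieci to 2021. The token table, `tokenize`, and `total` are hypothetical names for illustration; the patch itself is written as lexer and parser rules and may split words differently.

```python
# Hypothetical token table: a small subset, NOT the patch's actual lexer rules.
TOKENS = {
    "mille": 1000,
    "cento": 100,
    "undici": 11,
    "dieci": 10,
    "due": 2,
    "uno": 1,
    "zero": 0,
}


def tokenize(word):
    """Split a number word into known tokens, longest match first."""
    tokens = []
    pos = 0
    while pos < len(word):
        match = max(
            (t for t in TOKENS if word.startswith(t, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"not a number word: {word[pos:]!r}")
        tokens.append(match)
        pos += len(match)
    return tokens


def total(tokens):
    """Right-recursive sum, mirroring: number := token | token number."""
    head, *rest = tokens  # raises ValueError on an empty token list
    return TOKENS[head] + (total(rest) if rest else 0)


print(total(tokenize("millemilleundicidieci")))
```

Since nothing checks the order or multiplicity of the tokens, 1000 + 1000 + 11 + 10 is accepted just like a well-formed number would be.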

As for the lexer, to avoid reducing the space of possible names, numbers live in a separate starting condition. This means that a sentence like: voglio mille, Necchi come se fosse mille is correctly interpreted (the first mille is the variable name, the second mille is recognized as 1000). Note that this implies that writing later voglio antani, Necchi come se fosse mille is not mapped to the previously declared variable, but this is a design choice.

To avoid clashes with future tokens, whenever a character that cannot be part of a string number is found, the lexer falls back to the INITIAL starting condition (the shift starting condition should be perfectly compatible with this behaviour).

To allow for the recognition of string numbers, the appropriate starting condition must be explicitly set in the lexer. This means that, e.g., in mille a posterdati, the string mille is not currently mapped to a number.
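
The starting-condition behaviour described above can be sketched as a tiny word-level state machine in Python. This is only an illustration under assumptions: the `NUMBER` state name, the `lex` function, and the word table are hypothetical, and the real lexer works on characters rather than whole words.

```python
# Hypothetical word values; the real patch covers zero to 9999.
NUMBER_WORDS = {"zero": 0, "uno": 1, "due": 2, "dieci": 10, "mille": 1000}


def lex(sentence):
    """Tag each word, recognizing number words only in the NUMBER state."""
    words = sentence.replace(",", " ").split()
    state = "INITIAL"
    tagged = []
    i = 0
    while i < len(words):
        word = words[i]
        if state == "NUMBER" and word in NUMBER_WORDS:
            # Still inside a number: stay in the NUMBER state.
            tagged.append(("NUMBER", NUMBER_WORDS[word]))
            i += 1
            continue
        # Anything that is not a number word falls back to INITIAL.
        state = "INITIAL"
        if words[i:i + 3] == ["come", "se", "fosse"]:
            # The assignment explicitly switches to the NUMBER state.
            tagged.append(("ASSIGN", "come se fosse"))
            state = "NUMBER"
            i += 3
        else:
            tagged.append(("WORD", word))
            i += 1
    return tagged


print(lex("voglio mille, Necchi come se fosse mille"))
print(lex("mille a posterdati"))
```

In this sketch only the mille that follows come se fosse becomes a number; every other occurrence stays an ordinary word, because nothing has switched the lexer into the `NUMBER` state.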

esseks commented 9 years ago

By allowing a contextual interpretation of word numerical literals, we introduce a really counter-intuitive behaviour where voglio antani, Necchi come se fosse mille produces int antani = 1000;, but mille a posterdati emits std::cout << mille << std::endl.

I also find it confusing that variables like mille can be defined but cannot be used as the rhs of an assignment. Expressions and the statements that require them should be orthogonal: an expression must not depend on the specific statement it is used in.

An important pitfall we would introduce is in the context of infinite loops:

Lei ha clacsonato
stuzzica e brematura anche, se uno

which does not produce do {} while(1); as one would expect. Also, this produces a compile-time error:

Lei ha clacsonato
voglio mille, Necchi come se fosse mille
mille a posterdati

since the two instances of mille are grouped together and a posterdati is left without an expression to print. Another regression is requiring a space after numerical operators. Spacing should not matter. Otherwise, splitting an expression on multiple lines produces an error:

Lei ha clacsonato
voglio antani, Necchi come se fosse 10 più
2 meno 1

All in all, I think that the cons of this addition outweigh the pros. However, it makes sense to have string aliases for commonly used values. Some possibilities I am considering are:

- adding string constants for common values like zero, uno, due and dieci. Other initializers are quite uncommon.
- marking the start of a numerical literal, although it might break the flow of the supercazzola.

alessandropellegrini commented 9 years ago

OK, I see the points you have raised. So what if all the numbers between zero and diecimila are interpreted as numbers in any context? I wouldn't like to have only some keywords recognized as numbers.

I can alter the patch to use one single lexer rule without any starting condition. That would reduce the number of available variable names, but would make the whole thing simpler and more consistent. What would you think in this case?
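
As a rough Python sketch of that alternative (again with a hypothetical word table and a hypothetical `tag` function, not the actual lexer rule), a context-free rule would classify every number word the same way wherever it appears:

```python
# Hypothetical word values; the proposal would cover zero to diecimila.
NUMBER_WORDS = {"zero": 0, "uno": 1, "due": 2, "dieci": 10, "mille": 1000}


def tag(sentence):
    """Tag words with no lexer state: a number word is always a number."""
    tagged = []
    for word in sentence.replace(",", " ").split():
        if word in NUMBER_WORDS:
            tagged.append(("NUMBER", NUMBER_WORDS[word]))
        else:
            tagged.append(("WORD", word))
    return tagged


print(tag("mille a posterdati"))
print(tag("voglio mille, Necchi come se fosse mille"))
```

The trade-off shows up in the second call: both occurrences of mille become numbers, so mille can no longer be used as a variable name.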

alessandropellegrini commented 5 years ago

Closing for lack of feedback.

esseks / monicelli

Numbers can be written as strings #9