SMLFamily / Successor-ML

A version of the 1997 SML definition with corrections and some proposed Successor ML features added.

Unicode in SML text #33

Open Hibou57 opened 6 years ago

Hibou57 commented 6 years ago

As an extension of https://github.com/SMLFamily/Successor-ML/issues/29, allow all characters with Unicode general category Letter or Number to be used in alphanumeric identifiers (*), all characters with Unicode general category Symbol to be used in symbolic identifiers, and all characters with Unicode general category Number to be used in numbers (**). Also take care of the additional end-of-line characters that Unicode defines.

(*) As is already the case, underscores would also be allowed, and numbers would not be allowed as the first character.

(**) Optionally, and unrelated to Unicode: it would be nice to allow underscores in numbers as Ada does, since it helps readability, e.g. 1_234_567.
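To make the proposed classification concrete, here is a minimal sketch in Python, whose standard `unicodedata.category` returns a character's two-letter general category (e.g. `Ll`, `Nd`, `Sm`). The helper names `char_class`, `is_alnum_identifier` and `is_symbolic_identifier` are hypothetical. Two assumptions go beyond the proposal text: SML's existing ASCII symbolic characters (`! % & $ # + - / : < = > ? @ \ ~ ` ^ | *`) are kept explicitly, because several of them are Unicode punctuation rather than symbols, and the prime `'` stays allowed inside alphanumeric identifiers, with the first character required to be a letter as in current SML.

```python
import unicodedata

# Symbolic characters SML already accepts. Several (e.g. '!', '*', '/') are
# Unicode punctuation (P*), not symbols (S*), so they are listed explicitly.
SML_SYMBOLIC = set("!%&$#+-/:<=>?@\\~`^|*")

def char_class(ch: str) -> str:
    """Classify a single character under the proposed rule (hypothetical helper)."""
    major = unicodedata.category(ch)[0]  # 'L', 'N', 'S', 'P', ...
    if ch in ("_", "'") or major in ("L", "N"):
        return "alnum"   # Letter or Number, plus SML's underscore and prime
    if ch in SML_SYMBOLIC or major == "S":
        return "symbol"  # any Unicode Symbol, plus SML's ASCII symbolic set
    return "other"

def is_alnum_identifier(s: str) -> bool:
    # Assumption: as in current SML, the first character must be a letter,
    # so numbers (and '_' or "'") cannot start an identifier.
    return (bool(s)
            and unicodedata.category(s[0]).startswith("L")
            and all(char_class(c) == "alnum" for c in s))

def is_symbolic_identifier(s: str) -> bool:
    return bool(s) and all(char_class(c) == "symbol" for c in s)

print(char_class("λ"), char_class("→"), char_class("٣"))  # alnum symbol alnum
print(is_alnum_identifier("λx_1'"), is_alnum_identifier("1x"))  # True False
```

Under this sketch `λ` (Ll) and the Arabic-Indic digit `٣` (Nd) are identifier characters, while `→` (Sm) can appear in symbolic identifiers.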

JohnReppy commented 6 years ago

Note that (**) is already part of the Successor ML specification (and is implemented by both SML/NJ and MLton). For example:

% sml -Cparser.succ-ml=true
Standard ML of New Jersey v110.81 [built: Tue May  2 11:51:11 2017]
- 123_456;
val it = 123456 : int
- 
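The transcript shows the separators being dropped when the literal is read. A minimal sketch of that behavior in Python follows; `parse_decimal` is a hypothetical helper, and the grammar it checks (ASCII decimal digits, with runs of underscores allowed only between digits, never leading or trailing) is an assumption for illustration, not a quotation of the Successor ML specification.

```python
import re

# Assumed grammar: digits, optionally followed by groups of
# one-or-more underscores and more digits ("1_000", "123_456").
_DECIMAL = re.compile(r"[0-9]+(?:_+[0-9]+)*")

def parse_decimal(lit: str) -> int:
    """Parse a decimal literal that may use '_' to group digits."""
    if not _DECIMAL.fullmatch(lit):
        raise ValueError(f"not a valid literal: {lit!r}")
    # The underscores only aid readability; they carry no value.
    return int(lit.replace("_", ""))

print(parse_decimal("123_456"))  # 123456
```

Literals such as `_1`, `1_` or `12a` are rejected by the pattern before any conversion happens.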
Hibou57 commented 6 years ago

Thanks. I knew I had seen it in an SML compiler, but I did not suspect it was already part of the standard.

About my original message: on reflection, allowing every character with general category Number in numbers may not be a good idea. I was thinking that the character “²” belongs to this category, in which case 3² could be read as the numeral 32, which would be ambiguous. So I checked, and “²” belongs to the Other Number subcategory (No). So the rule should be the Number category, but excluding the Other Number subcategory (worth stressing).
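The refined rule can be checked directly against the Unicode Character Database. A minimal Python sketch, with hypothetical helper names, that accepts the Number category minus Other Number (so Nd and Nl remain) and confirms that “²” is excluded:

```python
import unicodedata

def is_numeral_char(ch: str) -> bool:
    """Number category (N*) minus Other Number (No), per the refined rule."""
    cat = unicodedata.category(ch)
    return cat.startswith("N") and cat != "No"

def is_numeral(s: str) -> bool:
    return bool(s) and all(is_numeral_char(c) for c in s)

print(unicodedata.category("²"))          # No
print(is_numeral("32"), is_numeral("3²"))  # True False
```

Note that this rule still admits letter numbers (Nl), such as the Roman numeral character Ⅻ; restricting numerals to decimal digits (Nd) alone would be a stricter alternative.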