Open ForNeVeR opened 1 year ago
I guess it's a similar area as https://github.com/fsharp/fslang-suggestions/issues/1104 and (probably) https://github.com/fsharp/fslang-suggestions/issues/1079. Former was rejected, latter is still open.
cc @dsyme
My main problem with those, personally, is that they're really annoying to type on normal keyboards (even with Win+; and similar). ₖₒₐₗₐ is also annoying but still allowed.
I have no idea how to type any of this stuff on a keyboard in standard editors (say VSCode to start). If there are simple standard input methods that don't involve copy-and-paste this stuff starts to make much more sense.
That said, I'm really surprised that subscript-0 is not considered a valid number-in-identifier character. The F# spec aligns with the C# spec here and allows "subscript letters" (e.g. subscript letter o), which are Unicode category Lm. For digits it only allows [0-9]. This actually surprises me as it means these aren't allowed:
Now I mean I can see plenty of problems with a whole lot of these - e.g. one of the "other numbers" is ⒀ and should
let ⒀⒀⒀⒀⒀ = 1
But then the vast number of other letters haven't all been vetted for being 100% sensible in code identifiers.
My gut feeling is that it would be reasonable, consistent and acceptable to adjust the F# spec to allow Nd, Nl, No in identifiers. The current situation, where subscript letters are allowed but not subscript digits, feels wrong, and admitting these numeric categories into identifiers seems the most reasonable and consistent solution to this.
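The category split described above can be checked directly with the .NET API. This is only a minimal sketch: the helper name `categoryOf` and the sample list are made up for illustration, and the reported categories depend on the Unicode data shipped with the .NET runtime in use.

```fsharp
open System.Globalization

// Hypothetical helper: the Unicode general category .NET reports for a character.
let categoryOf (c: char) : UnicodeCategory =
    System.Char.GetUnicodeCategory c

// Characters mentioned in this thread: a subscript letter, a subscript digit,
// an ASCII digit, and a parenthesized number.
let samples = [ 'ₒ'; '₀'; '0'; '⒀' ]

for c in samples do
    printfn "U+%04X %c -> %A" (int c) c (categoryOf c)
```

On a current runtime this should report ModifierLetter (Lm) for ₒ, DecimalDigitNumber (Nd) for 0, and OtherNumber (No) for both ₀ and ⒀, which is why the subscript letter is accepted in identifiers today while the subscript digit is not.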
Here's an example of subscript-letter-o in an F# identifier: https://sharplab.io/#v2:EYLgtghglgdgNAGxAMwM5wC4iguATEAagB8B7ABwFMYACAZQE9UNKwBYAKAUoxoFEaAXhoBGAAyduvABJCaAVglceNAFqAkgjkCA9DWmcgA=
I think it's reasonable to consider specific characters that are highly beneficial for readability rather than whole character categories. In fact this idea is discussed in Unicode Standard Annex 31 under Default Identifier Syntax where they suggest allowing the set of:
[⁽₍⁾₎⁺₊⁼₌⁻₋⁰₀¹₁²₂³₃⁴₄⁵₅⁶₆⁷₇⁸₈⁹₉]
rather than the full Other_Number category.
Regarding difficulty of typing, in the domains where this has the most value such as scientific and mathematical computing, those developers are likely well-versed in how to input them, and there's autocompletion and copy-paste for the rest of us.
Regarding typing: I personally use WinCompose and press Meta _ 0 to print ₀ (notably, I used to do this while working on the same scientific project), though I of course understand that there are no widely-accepted means to type this across all (or even any of) the major platforms supported by F#.
Interesting thanks. I haven't read the details but if there's a sensible proposal to conservatively extend along those lines I would approve it. Likewise the proposal I've written above seems entirely consistent with the current F# language design, but assessing that Annex 31 and including what we can seems to make sense.
I'll actually mark this specific suggestion as approved, if executed along one of these two lines above. I won't be working on it myself however, I think this has to be community driven. I'd be happy for @vzarytovskii , @baronfel and other co-maintainers in the RFC repository to together approve anything here, once a design consensus has been reached.
When working on the RFC, I would appreciate the inclusion of the tooling aspect: how does it work on different OSs and environments, both for typing these characters and for displaying them (vim/Emacs in a terminal, popular GUI editors and IDEs, GitHub diffs and markdown highlighting, etc.)?
@vzarytovskii I get the need, though I think all those issues apply for F# 1.0 onwards, and C# 1.0 too - all the unicode characters available via this huge set of existing valid unicode characters:
regexp letter-char = '\Lu' | '\Ll' | '\Lt' | '\Lm' | '\Lo' | '\Nl'
So this isn't really raising anything new - it's just proposing a relatively modest expansion of the existing spec of identifiers to include some more numeric characters, to take into account Annex 31, which probably wasn't written at the time of C# 1.0
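To make the size of the change concrete, here is a rough sketch of the quoted letter-char rule as a predicate, next to one possible reading of the proposed extension. The function names are hypothetical, the check is deliberately simplified (it ignores `_`, `'`, and the connecting, combining and formatting categories), and it is not how the compiler's lexer is implemented.

```fsharp
open System.Globalization

// letter-char as quoted above: Lu | Ll | Lt | Lm | Lo | Nl
let isLetterChar (c: char) =
    match System.Char.GetUnicodeCategory c with
    | UnicodeCategory.UppercaseLetter
    | UnicodeCategory.LowercaseLetter
    | UnicodeCategory.TitlecaseLetter
    | UnicodeCategory.ModifierLetter
    | UnicodeCategory.OtherLetter
    | UnicodeCategory.LetterNumber -> true
    | _ -> false

// Assumed extension: after the first character, also accept Nd and No.
let isProposedContinueChar (c: char) =
    isLetterChar c
    || (match System.Char.GetUnicodeCategory c with
        | UnicodeCategory.DecimalDigitNumber
        | UnicodeCategory.OtherNumber -> true
        | _ -> false)

// Simplified identifier check: a letter-char followed by continue chars.
let isProposedIdent (s: string) =
    s.Length > 0
    && isLetterChar s.[0]
    && s |> Seq.skip 1 |> Seq.forall isProposedContinueChar

printfn "%b" (isProposedIdent "Z₀")   // subscript digit after a letter
printfn "%b" (isProposedIdent "₀Z")   // subscript digit in first position
```

Under these assumptions `Z₀` is accepted and `₀Z` is rejected, mirroring how ASCII digits are treated today.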
Yeah, I'm just curious of the level of support for other tooling, mostly because github client on my phone and tablet render it as just squares :)
Might be good to open a separate language/tooling suggestion - discussing the state of input method entry for unicode characters, especially ones that might reasonably be used in coding, like the examples above.
I guess it's a topic that tends to progress and change over time
Despite my earlier comment, I would probably lean toward the full-category approach for F# as a general purpose language. It avoids the subjectivity and potential user confusion around which characters get first-class treatment.
For Unicode conformance, F# can specify a profile of characters to add to or subtract from XID_Start and XID_Continue. As discussed above, there is a proposal to add a math profile and then F# could just take that.
Current CSharp and IL (I just checked in SharpLab) don't allow these characters in identifiers. CSharp currently doesn't have a very clear definition of allowed characters beyond what is allowed in practice now, or what was defined in ECMA-330 in Jan 2000. https://github.com/dotnet/csharpstandard/issues/305 looks at resolving this. The cleanest path would be if this were proposed there.
If this is implemented in F# without dotnet support, then the IL generated would need to be manipulated or escaped. A spec for this might be to treat the following as equivalent:
let x₀ =
let ``x₀`` =
From SharpLab, this goes into IL as 'x_0' and doesn't have a corresponding C# syntax.
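For reference, the double-backtick form is already accepted by the compiler, so the equivalence suggested above would only affect the unquoted spelling. A small sketch, where the value and the names `e` and `h` are illustrative assumptions:

```fsharp
// Double-backtick identifiers can already contain subscript digits.
let ``Z₀`` = 376.73          // illustrative: impedance of free space, in ohms

let e = 1.0                  // hypothetical electric field strength
let h = e / ``Z₀``           // H = E / Z₀

printfn "H = %f" h
```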
Subscripts are well suited to writing chemical molecular formulas, which makes the code very intuitive. For example,
if x = Mole.H₂O then
...
I sometimes work with scientific code and it's inconvenient in some cases. When I program a formula, it is very convenient to enter it as is. For example, there's a formula Z₀ = E / H (the first real example I found in my notes when performing calculations related to the impedance of free space), which I'd like to see in my code as it looks here. But unfortunately, in F# I have to write it as Z_0 = E / H, since Z₀ doesn't compile (link to Sharplab).
I propose we add a possibility to use subscript numbers in F# code, such as:
Notably, it's possible to use subscript characters such as ₖₒₐₗₐ, but not numbers.
The existing way of approaching this problem in F# is to use a notation of Z_0 instead, which doesn't look that good.
Pros and Cons
The advantages of making this adjustment to F# are: it will be easier to write and, more importantly, read code based on formulas in scientific calculations.
The disadvantages of making this adjustment to F# are: well, it'd be easier to abuse this feature and name certain symbols in
Z₀
either).
Note that this is about subscript numbers for now, and not superscripts. Superscripts would be more confusing, since E² is easy to mix with E ** 2.
Extra information
Estimated cost (XS, S, M, L, XL, XXL): XS.
Related suggestions: none that I was able to find.
For Readers
If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.