Allow to use subscript numbers in F# code

ForNeVeR commented 1 year ago

I sometimes work with scientific code and it's inconvenient in some cases. When I program a formula, it is very convenient to enter it as is. For example, there's a formula Z₀ = E / H (the first real example I found in my notes when performing calculations related to the impedance of free space), which I'd like to see in my code as it looks here. But unfortunately, in F# I have to write it as Z_0 = E / H, since Z₀ doesn't compile (link to Sharplab).

error FS0010: Unexpected character '₀' in binding. Expected '=' or other token.

I propose we add a possibility to use subscript numbers in F# code, such as:

let E = 10
let H = 50

let Z₀ = E / H // this!

Notably, it's possible to use subscript characters such as ₖₒₐₗₐ, but not numbers.

The existing way of approaching this problem in F# is to use a notation of Z_0 instead, which doesn't look that good.

Pros and Cons

The advantages of making this adjustment to F# are: it will be easier to write and, more importantly, read code based on formulas in scientific calculations.

The disadvantages of making this adjustment to F# are: well, it'd be easier to abuse this feature and name certain symbols either in a confusing way, or in a way that makes it harder to reuse the values in other languages (say, C# doesn't allow Z₀ either).

Note that this is about subscript numbers for now, and not superscripts. Superscripts would be more confusing, since E² is easy to mix up with E ** 2.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): XS.

Related suggestions: none that I was able to find.

Affidavit (please submit!)

Please tick this by placing a cross in the box:

[x] This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
[x] I have searched both open and closed suggestions on this site and believe this is not a duplicate
[x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

[x] This is not a breaking change to the F# language design
[x] I or my company would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

vzarytovskii commented 1 year ago

I guess it's a similar area to https://github.com/fsharp/fslang-suggestions/issues/1104 and (probably) https://github.com/fsharp/fslang-suggestions/issues/1079. The former was rejected; the latter is still open.

cc @dsyme

vzarytovskii commented 1 year ago

My main problem with those, personally, is that they're really annoying to type on normal keyboards (even with Win+; and similar).

ForNeVeR commented 1 year ago

ₖₒₐₗₐ is also annoying but still allowed.

dsyme commented 1 year ago

I have no idea how to type any of this stuff on a keyboard in standard editors (say VSCode to start). If there are simple standard input methods that don't involve copy-and-paste this stuff starts to make much more sense.

That said, I'm really surprised that subscript-0 is not considered a valid digit-in-identifier character. The F# spec aligns with the C# spec here and allows "subscript letters" (e.g. subscript letter o), which are in Unicode category Lm.

For digits it only allows [0-9]. This actually surprises me as it means these aren't allowed:

Now I mean I can see plenty of problems with a whole lot of these - e.g. one of the "other numbers" is ⒀ and should

let ⒀⒀⒀⒀⒀ = 1

But then the vast number of other letters haven't all been vetted for being 100% sensible in code identifiers.

My gut feeling is that it would be reasonable, consistent and acceptable to adjust the F# spec to allow Nd, Nl, No in identifiers. The current situation where subscript letters are allowed but not subscript digits feels wrong, and admitting these numeric categories into identifiers seems the most reasonable and consistent solution to this.
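The category mismatch described here can be inspected directly. The following is a minimal sketch, assuming .NET's `System.Char.GetUnicodeCategory` reflects current Unicode data; the `samples` list and the `category` helper are illustrative names chosen for this example, not anything from the F# spec.

```fsharp
open System
open System.Globalization

// Characters mentioned in this thread, picked for illustration only.
let samples = [ 'Z'; '0'; 'ₒ'; '₀'; '⒀' ]

// Look up the Unicode general category of a single UTF-16 char.
let category (c: char) : UnicodeCategory = Char.GetUnicodeCategory c

// In Unicode data: 'Z' is Lu, '0' is Nd, 'ₒ' is Lm, '₀' and '⒀' are No.
for c in samples do
    printfn "U+%04X -> %A" (int c) (category c)
```

Subscript o falls into a letter category (Lm) that the identifier grammar accepts, while subscript zero falls into No, which it does not.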

dsyme commented 1 year ago

Here's an example of subscript-letter-o in an F# identifier: https://sharplab.io/#v2:EYLgtghglgdgNAGxAMwM5wC4iguATEAagB8B7ABwFMYACAZQE9UNKwBYAKAUoxoFEaAXhoBGAAyduvABJCaAVglceNAFqAkgjkCA9DWmcgA=

zanaptak commented 1 year ago

I think it's reasonable to consider specific characters that are highly beneficial for readability rather than whole character categories. In fact this idea is discussed in Unicode Standard Annex 31 under Default Identifier Syntax where they suggest allowing the set of:

[⁽₍⁾₎⁺₊⁼₌⁻₋⁰₀¹₁²₂³₃⁴₄⁵₅⁶₆⁷₇⁸₈⁹₉]

rather than the full Other_Number category.
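As a rough sketch of what the narrower, set-based option would mean in practice, the characters listed above can be treated as an explicit allow-list. The names `mathScriptChars` and `isAllowedScriptChar` are hypothetical and only serve this illustration; note that the set also contains superscripts, which the original suggestion leaves out of scope.

```fsharp
// The characters listed above, copied verbatim into a set.
let mathScriptChars : Set<char> =
    Set.ofSeq "⁽₍⁾₎⁺₊⁼₌⁻₋⁰₀¹₁²₂³₃⁴₄⁵₅⁶₆⁷₇⁸₈⁹₉"

// Hypothetical helper: is this character on the explicit allow-list?
let isAllowedScriptChar (c: char) : bool =
    mathScriptChars.Contains c

printfn "%b" (isAllowedScriptChar '₀')  // true: subscript zero is listed
printfn "%b" (isAllowedScriptChar '⒀')  // false: in Other_Number, but not listed
```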

Regarding difficulty of typing, in the domains where this has the most value such as scientific and mathematical computing, those developers are likely well-versed in how to input them, and there's autocompletion and copy-paste for the rest of us.

ForNeVeR commented 1 year ago

Regarding typing: I personally use WinCompose and press Meta _ 0 to print ₀ (notably, I used to do this while working on the same scientific project), though I of course understand that there are no widely-accepted means to type this across all (or even any of) the major platforms supported by F#.

dsyme commented 1 year ago

> I think it's reasonable to consider specific characters that are highly beneficial for readability rather than whole character categories. In fact this idea is discussed in Unicode Standard Annex 31 under Default Identifier Syntax where they suggest allowing the set of:

Interesting thanks. I haven't read the details but if there's a sensible proposal to conservatively extend along those lines I would approve it. Likewise the proposal I've written above seems entirely consistent with the current F# language design, but assessing that Annex 31 and including what we can seems to make sense.

I'll actually mark this specific suggestion as approved, if executed along one of the two lines above. I won't be working on it myself, however; I think this has to be community driven. I'd be happy for @vzarytovskii, @baronfel and other co-maintainers in the RFC repository to together approve anything here, once a design consensus has been reached.

vzarytovskii commented 1 year ago

When working on the RFC, I would appreciate the inclusion of the tooling aspect: how does it work on different OSs and environments, both for typing these characters and for displaying them (vim/Emacs in a terminal, popular GUI editors and IDEs, GitHub diffs and markdown highlighting, etc.).

dsyme commented 1 year ago

@vzarytovskii I get the need, though I think all those issues apply for F# 1.0 onwards, and C# 1.0 too - all the unicode characters available via this huge set of existing valid unicode characters:

regexp letter-char = '\Lu' | '\Ll' | '\Lt' | '\Lm' | '\Lo' | '\Nl'

So this isn't really raising anything new - it's just proposing a relatively modest expansion of the existing spec of identifiers to include some more numeric characters, to take into account Annex 31, which probably wasn't written at the time of C# 1.0.
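To make the existing rule concrete, here is a minimal sketch of the `letter-char` production quoted above, written as a predicate over .NET's `UnicodeCategory`. The function name `isLetterChar` is an assumption for this example, and the sketch only covers single UTF-16 chars; it is not how the compiler's lexer is actually implemented.

```fsharp
open System
open System.Globalization

// Mirrors: letter-char = '\Lu' | '\Ll' | '\Lt' | '\Lm' | '\Lo' | '\Nl'
let isLetterChar (c: char) : bool =
    match Char.GetUnicodeCategory c with
    | UnicodeCategory.UppercaseLetter          // Lu
    | UnicodeCategory.LowercaseLetter          // Ll
    | UnicodeCategory.TitlecaseLetter          // Lt
    | UnicodeCategory.ModifierLetter           // Lm
    | UnicodeCategory.OtherLetter              // Lo
    | UnicodeCategory.LetterNumber -> true     // Nl
    | _ -> false

printfn "%b" (isLetterChar 'ₒ')  // true: subscript o is Lm
printfn "%b" (isLetterChar '₀')  // false: subscript zero is No
```

Under this reading, accepting subscript digits would mean admitting characters from a category that `letter-char` does not cover, which is the modest expansion being discussed.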

vzarytovskii commented 1 year ago

> @vzarytovskii I get the need, though I think all those issues apply for F# 1.0 onwards, and C# 1.0 too - all the unicode characters available via this huge set of unicode characters:
> regexp letter-char = '\Lu' | '\Ll' | '\Lt' | '\Lm' | '\Lo' | '\Nl'
> So this isn't really raising anything new - it's just proposing a relatively modest revision to the existing spec of identifiers, to take into account Annex 31, which probably wasn't written at the time of C# 1.0

Yeah, I'm just curious of the level of support for other tooling, mostly because github client on my phone and tablet render it as just squares :)

dsyme commented 1 year ago

> Yeah, I'm just curious of the level of support for other tooling, mostly because github client on my phone and tablet render it as just squares :)

Might be good to open a separate language/tooling suggestion - discussing the state of input method entry for unicode characters, especially ones that might reasonably be used in coding, like the examples above.

I guess it's a topic that tends to progress and change over time

zanaptak commented 1 year ago

Despite my earlier comment, I would probably lean toward the full-category approach for F# as a general purpose language. It avoids the subjectivity and potential user confusion around which characters get first-class treatment.

charlesroddie commented 1 year ago

For Unicode conformance, F# can specify a profile of characters to add or subtract to XID_Start and XID_Continue. As discussed above, there is a proposal to add a math profile and then F# could just take that.

Unicode math profile: https://www.unicode.org/L2/L2022/22230-math-profile.pdf
Clang implementation of draft: https://reviews.llvm.org/D137051 (They did this fast likely because these characters were de-facto allowed before.)
Current Unicode 15.1.0 draft 7, which contains the new profile recommendations: https://www.unicode.org/reports/tr31/tr31-38.html

Current CSharp and IL (I just checked in SharpLab) don't allow these characters in identifiers. CSharp currently doesn't have a very clear definition of allowed characters beyond what is allowed in practice now, or what was defined in ECMA-330 in Jan 2000. https://github.com/dotnet/csharpstandard/issues/305 looks at resolving this. The cleanest path would be if this were proposed there.

If this is implemented in F# without dotnet support, then the IL generated would need to be manipulated or escaped. A spec for this might be to treat the following as equivalent:

let x₀ = 
let ``x₀`` =

From SharpLab, this goes into IL as 'x_0' and doesn't have a corresponding C# syntax.
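For comparison, the escaped form can already be used as a workaround. This is a small sketch based on the formula from the original suggestion, assuming the compiler in use accepts the subscript digit inside double backticks, as the SharpLab check above suggests; the values of `E` and `H` are made up for illustration.

```fsharp
// Illustrative values only; floats are used to avoid integer division.
let E = 10.0
let H = 50.0

// Double-backtick identifiers accept characters that plain identifiers reject.
let ``Z₀`` = E / H

printfn "%f" ``Z₀``
```

Every use site needs the backticks as well, which is the readability cost the equivalence above would remove.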

xp44mm commented 1 year ago

Subscripts are suitable for writing chemical molecular formulas, which is very intuitive. For example,

if x = Mole.H₂O then 
...
