fsharp / fslang-suggestions


Allow Unicode symbols to be used as operators #1079

Open voronoipotato opened 2 years ago

voronoipotato commented 2 years ago

I propose we revisit the proposal of allowing unicode symbols to be used as operators with a simplified precedence model.

The existing way of approaching this problem in F# is to define a function whose name is a double-backtick identifier:

let ``∫`` xs = xs |> Seq.sum
//and..
let ``∪`` a b = Set.union a b
let x = a |> ``∪`` <| b

the proposed way of writing this is

let (∫) xs = xs |> Seq.sum
let l = ∫ [1..10]
//and..
let (∪) a b = Set.union a b 
let x = a ∪ b

(yes my examples are silly :P , you know the real ones)

The proposal is novel in that the precedence of Unicode operators would simply be left to right. If that's undesirable, perhaps there could be an attribute for the precedence, with level 1 being the lowest and the default being whatever the precedence of ~ is when unspecified. We could then document each of the levels in https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/symbol-and-operator-reference/ so that the creators of operators could set up precedence in intuitive ways.

[<OperatorPrecedence(1)>]
let (∫) xs = xs |> Seq.sum

[<OperatorPrecedence(3)>]
let (∪) a b = Set.union a b 
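
For comparison, an attribute with this name can already be declared in user code today, but only as plain metadata. The sketch below is hypothetical: `OperatorPrecedenceAttribute` is not part of FSharp.Core, and the compiler does not consult it when parsing, so the call site still needs the backtick form.

```fsharp
open System

// Hypothetical attribute, declared in user code only for illustration.
// Nothing in the F# compiler reads it today, so it has no effect on parsing.
type OperatorPrecedenceAttribute(level: int) =
    inherit Attribute()
    member _.Level = level

// Compiles today because the backtick identifier is an ordinary function name.
[<OperatorPrecedence(3)>]
let ``∪`` a b = Set.union a b

// The call site still relies on the existing |> ... <| workaround.
let merged = Set.ofList [1; 2] |> ``∪`` <| Set.ofList [2; 3]
```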

Pros and Cons

The advantages of making this adjustment to F# are ...

The disadvantages of making this adjustment to F# are ...

Extra information

Full disclosure: this has more or less been proposed before, please review. I'm hoping to revisit it now that data science and scientific computing are a target audience, and because I'm proposing a simplified approach to operator precedence.

#224

As discussed in the comments in the original submission, this is a minefield.... Deciding the precedence for such operators is really hard. It would also create swathes of unreadable F# code. I'll close this since we've previously decided not to do this in F# 2.0, and there has not yet been a major change of circumstance to warrant altering this which addresses the concerns. - Don Syme

Estimated cost (XS, S, M, L, XL, XXL): S

Related suggestions: (put links to related suggestions here)

#224


For Readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

charlesroddie commented 2 years ago

Notes

uxsoft commented 2 years ago

Hmm, even with [ Win + ; ] these sound very annoying to type to the point that I reckon I'd never use this.

Also, my experience with math is that every branch/theory has its own crazy notation which makes it very hard for newcomers to read. So this isn't a feature I'd like to have in my code.

Not opposed to the feature existing just probably not the type of person who would use this.

bisen2 commented 2 years ago

Hmm, even with [ Win + ; ] these sound very annoying to type to the point that I reckon I'd never use this.

This really just comes down to tooling. Many editors (or editor plugins) provide user friendly ways of inserting unicode characters.

chillitom commented 2 years ago

Hmm, even with [ Win + ; ] these sound very annoying to type to the point that I reckon I'd never use this.

If you haven't seen it before, check out WinCompose. I just discovered it and can't work out why such a thing isn't built in. macOS has been doing something similar for years.

johncj-improving commented 2 years ago

What are the limits on this proposal? Would I be able to use   as an operator? That's an en space (U+2002). There are times when I would like to alias |> that way...

voronoipotato commented 2 years ago

I think to start we would pick a Unicode plane, like the Basic Multilingual Plane, and exclude non-visible glyphs. There's nothing stopping you from using a pretty-print extension in your editor, though, to make |> render as whitespace. You could also modify a font with ligatures so that |> is represented as whitespace. I don't think that would be particularly difficult to do, and it would work regardless of the editor you use.
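
As a rough illustration of such a restriction, here is a minimal sketch of a filter over candidate characters. The function name and the choice of Unicode categories (math symbols and other symbols) are assumptions made for this example, not part of the proposal. It relies on the fact that a single non-surrogate .NET `char` always lies in the Basic Multilingual Plane.

```fsharp
open System
open System.Globalization

// Hypothetical filter: accept only BMP characters whose Unicode category
// is a visible symbol class. The category choice is an assumption.
let isCandidateOperatorChar (c: char) =
    if Char.IsSurrogate c then
        // Surrogate halves encode code points outside the BMP.
        false
    else
        match CharUnicodeInfo.GetUnicodeCategory c with
        | UnicodeCategory.MathSymbol
        | UnicodeCategory.OtherSymbol -> true
        | _ -> false

let unionAllowed = isCandidateOperatorChar '∪'         // math symbol
let enSpaceAllowed = isCandidateOperatorChar '\u2002'  // en space, a separator
```

Under these assumptions the en space from the question above is rejected because it is a space separator, while ∪ and ∫ pass. ASCII symbols such as + fall into the same category, so a real rule would also have to say how it interacts with the existing operator characters.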

johncj-improving commented 2 years ago

I've already done the font ligature trick. I agree that if this is done, it needs some common sense restrictions.

sv158 commented 1 year ago

Recently I saw another related issue closed (#1104, the one that asked for F# to allow the use of APL symbols) and labeled as "probably not", which feels a bit regrettable. I have used APL, J and K (all array languages). If I currently want to use operator overloading to implement APL-like semantics without Unicode operators, the resulting readability may be worse than J, whose operators (verbs/adverbs) are limited to the ASCII character set.

I can understand the decision of the development team, but out of curiosity I looked at the situation in other programming languages. The first one is Julia: I found relevant discussion in a 2021 post on the official Nim forum, which mentioned that Julia allows some Unicode characters as operators. Then I went to look at the state of Julia. The proposal in the Julia community was first posted in 2012 (maybe people who are engaged in scientific research really feel that symbols are not enough), and the feature was implemented in 2014.

Looking back at Nim, it implemented this feature in 2021 (recently, within the past year). It is currently only an experimental feature in the stable version (v1.6), but it is already available by default in the development branch. Nim's strategy is more conservative than Julia's: the available Unicode characters are limited, with only two precedence levels. Along the way I also looked at other programming languages. Some support it (such as Raku), some don't (Rust, Zig), and not many languages support this feature overall. In addition, I read some related blog posts and felt that this feature is indeed very practical for some people (including myself).

If F# supported this feature (introducing some Unicode symbols as operators, like Nim), the readability issue might be handled by a linter and formatter, or a warning could be added in a 'strict mode' (hypothetical).

P.S. Some additional off-topic information about operator overloading: the Python community has also discussed this, such as using operator overloading to improve the readability of matrix multiplication. The Elixir community follows the same idea and has specifically added syntax sugar so that n*n*n can be written as n**3, similar to Python. It's a little interesting that more and more of these discussions and implementations appeared around 2021. This may mean that programming languages are beginning to become more of a general tool (not only for programmers)? After all, the more users from different fields, the richer the ecosystem (like {Elixir-**-Nx} ~ {Python-@/__matmul__-Numpy}).

dark-valkyrix commented 1 year ago

Yes please, it would be great to be able to define operators like ⊨, ⊕, ∧ (replacement for and?), and so on... Cartesian product (tuples) should also be replaced by (or additionally supported with) ⨯ to match the mathematical notation. I don't like *, which means something else. And support for combining diacriticals such as U+20D7 (combining right arrow above) to define vectors could be awesome.

dsyme commented 1 year ago

Re my earlier comment:

As discussed in the comments in the original submission, this is a minefield.... Deciding the precedence for such operators is really hard. It would also create swathes of unreadable F# code. I'll close this since we've previously decided not to do this in F# 2.0, and there has not yet been a major change of circumstance to warrant altering this which addresses the concerns. -

Is there any concrete proposal for precedence for such operators? That's the key missing ingredient, and it's really impossible to proceed without a proposal.

charlesroddie commented 1 year ago

Is there any concrete proposal for precedence for such operators? That's the key missing ingredient, and it's really impossible to proceed without a proposal.

Current operator precedence seems to be based on the starting character(s). https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/symbol-and-operator-reference/

I propose that operators starting with a currently-disallowed unicode symbol are given the same precedence, just lower than all existing precedences for operators: lower than prefix operators but above .. Operators containing currently-disallowed symbols but starting with currently-allowed characters can fit into existing rules.
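
The leading-character rule referred to above can be observed with ordinary ASCII operators in current F#. In this small sketch the two operators are invented for illustration; they build strings so that the grouping chosen by the parser is visible in the result.

```fsharp
// Two illustrative custom operators. Their precedence comes from the
// leading character: (+!) parses like +, and (/!) parses like /.
let (+!) (a: string) (b: string) = "(" + a + " +! " + b + ")"
let (/!) (a: string) (b: string) = "(" + a + " /! " + b + ")"

// No parentheses in the source; the result shows how the parser grouped it.
let grouped = "a" +! "b" /! "c"
// grouped = "(a +! (b /! c))" because /! binds tighter than +!
```

A Unicode operator has no ASCII leading character to inherit from, which is why a single new level (as proposed here) or an explicit grouping has to be chosen.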

sv158 commented 1 year ago

Is there any concrete proposal for precedence for such operators? That's the key missing ingredient, and it's really impossible to proceed without a proposal.

A few days ago, I saw that Nim officially released version 2.0, and the '--experimental:unicodeOperators' flag had already been removed. They eventually accepted 21 Unicode operators, with 13 having the same precedence as multiplication (∙ ∘ × ★ ⊗ ⊘ ⊙ ⊛ ⊠ ⊡ ∩ ∧ ⊓) and the remaining 8 having the same precedence as addition (± ⊕ ⊖ ⊞ ⊟ ∪ ∨ ⊔).

I think this approach (i.e., adopting a subset of Unicode operators first) might be worth considering. In this way, the main problem shifts from complex precedence rule settings to a relatively simpler process of filtering out the most needed Unicode operators and grouping them accordingly.
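
To make the grouping idea concrete, here is a minimal sketch of the two groups listed above as a lookup table. The type and value names are hypothetical, and this only models the data; it says nothing about how the F# lexer or parser would actually consume such a table.

```fsharp
// Hypothetical model of a fixed, two-level allow-list of Unicode operators.
type PrecedenceGroup =
    | Multiplicative   // would parse like *
    | Additive         // would parse like +

// The two symbol sets quoted above, copied from the Nim 2.0 list.
let multiplicativeSymbols = "∙∘×★⊗⊘⊙⊛⊠⊡∩∧⊓"
let additiveSymbols = "±⊕⊖⊞⊟∪∨⊔"

let precedenceTable : Map<char, PrecedenceGroup> =
    Seq.append
        (multiplicativeSymbols |> Seq.map (fun c -> c, Multiplicative))
        (additiveSymbols |> Seq.map (fun c -> c, Additive))
    |> Map.ofSeq

// Returns None for any symbol outside the allow-list, e.g. '∫'.
let tryGroup (c: char) = Map.tryFind c precedenceTable
```

With a closed table like this, the open question becomes which symbols to admit and into which group, rather than how to design a general precedence rule.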

MaxWilson commented 10 months ago

Hmm, even with [ Win + ; ] these sound very annoying to type to the point that I reckon I'd never use this.

This really just comes down to tooling. Many editors (or editor plugins) provide user friendly ways of inserting unicode characters.

Could you have a tooling plugin that simply hides backticks around Unicode characters? Then you wouldn't need a language change.