fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features

Expand the F# alphabet to allow APL symbols to be used within identifiers #1104

Closed Korporal closed 1 year ago

Korporal commented 2 years ago

I want to propose that we expand the range of Unicode characters that are legal within F# identifiers and custom operators to include the set of currently excluded symbols known as "APL symbols".

There have been earlier suggestions that F# have its alphabet expanded to support symbols like this, for example: https://github.com/fsharp/fslang-suggestions/issues/1079. However, rather than adopting a completely new set of symbols, this suggestion recognizes that the APL symbol set already provides a rich set of symbols that have an established computational meaning, and that there are tools, utilities and even keyboards that work with these characters.

Several symbols used within APL are not specific to APL. For example ↓ (APL "drop") is illegal in F# but is also not strictly an APL symbol (it is U+2193); it would be included in this suggestion. In addition, a few APL symbols are in fact Greek letters (e.g. ρ) and already legal for use as identifiers (see F# language specification 4.1 section 3.4 - '\Lm').

The existing way of approaching this problem in F# is to use two double-backtick tokens to enclose the identifier. This mechanism allows all of the APL characters (and many others) to be used within an identifier but does add the visual "clutter" that the double-backtick brings with it.
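
As a minimal sketch of that existing mechanism (the function names simply reuse the APL "drop" and "take" symbols, and the sample list is only illustrative), the following should compile with the current rules because the symbols are enclosed in double backticks:

```fsharp
// Double-backtick identifiers: ordinary functions whose names happen to be symbols.
let ``↓`` n s = Seq.skip n s   // APL "drop"
let ``↑`` n s = Seq.take n s   // APL "take"

// Called in prefix position like any other function.
let dropped = ``↓`` 3 [5; 4; 3; 2; 1] |> List.ofSeq
let taken = ``↑`` 3 [5; 4; 3; 2; 1] |> List.ofSeq

printfn "dropped = %A, taken = %A" dropped taken
```

These are plain functions rather than operators, so they can only be applied in prefix position.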

A list of the Unicode characters that are regarded as being part of APL can be found here; this list contains 93 characters.

The historic IBM set of Unicode characters regarded as being part of APL can be found on this page; this list contains 81 characters. The IBM list does not include a number of symbols that are today regarded as being APL symbols, for example ⍦ (U+2366).

In addition because these symbols are all a single character and have never been used before in F# source code, it should be feasible to add them to the list of allowed custom operator symbols.

Although APL was the impetus for this suggestion, it is not its goal. The goal is to increase the flexibility for naming functions and operators by adopting a set of symbols that are completely new to the language. Yes, many of the symbols do correspond to APL operators and functions, but APL is one of the few languages (perhaps the only language) that uses such a large number of symbols like this.

Pros and Cons

The advantages of making this adjustment to F# are that it avoids the need to rely on the double-backtick when using these symbols.

It will also make it easier to represent foundational operations and functions in user code, with symbols resembling mathematical notation (indeed this was one of the driving forces behind APL originally).

The suggestion does not call for a change to the language grammar other than the definitions for identifiers and the non-terminal first-op-char defined in the language specification. Currently this set of characters is limited to ! % & * + - . / < = > @ ^ | ~.

This current set is quite limited and can make it difficult to introduce new operators where the lexical similarity to an existing operation is hard to avoid, for example:

let inline (/) n s = Seq.skip n s 

can be written today, but it requires the reader to pay close attention when perusing the source code. With a wider range of symbols available, this kind of thing becomes easier.

Finally, it provides a means for porting some types of existing APL code to F# without having to abandon the expressive symbols that APL uses for certain operators. It could also foster links with the APL community which could lead to further F# adoption.

The disadvantages of making this adjustment to F# are (initially at least) the need to establish a formal definition of exactly which character codes are to be included. Possible confusion, were F# code to attach a different meaning to a symbol than APL does when porting APL code to F#, is also something to bear in mind; again, this suggestion will facilitate such porting but is not driven by it.

Another thing to consider too, is it introduces a risk that cavalier abuse of the feature could lead to unreadable or confusing code, but I'd argue that this is possible today anyway and has not led to problems in F# being adopted and used.

The purpose of this suggestion is not to attach meanings to these Unicode symbols nor to support some specific set of symbols particular to some version or dialect of APL. Rather it is to allow any character/symbol that is commonly used within the two dominant APL dialects, APL2 and Dyalog APL.

Examples of APL usage

APL is not a functional language but does operate on multidimensional arrays, aggregates of data using operators that eliminate any explicit need for looping constructs. APL was (perhaps still is) the only language to adopt a large number of symbols to identify language functions. This was only possible at the time because IBM were in a position to invest in supporting the uses of such symbols when they had no representation in the then dominant ASCII and EBCDIC character codes. Almost all programming languages since have restricted the symbols used by their grammar yet with Unicode now firmly established this tradition is no longer as necessary as it once was.

There are some operations in APL that bear a strong resemblance to some operations done in F#, for example:

Take (↑) is similar to Seq.take seen in F#; likewise drop (↓) is similar to Seq.skip.

The result of executing the APL expression 3 ↓ 5 4 3 2 1 is the shorter array 2 1 for example. This suggestion if implemented would enable us to write a function like this:

let ↓ sequence n =
    Seq.skip n sequence

enabling us to write:

3 |> ↓ [5;4;3;2;1] 

and if use as operator names is also added, this becomes possible:

let inline (↓) n s = Seq.skip n s 

enabling:

3 ↓ [5;4;3;2;1]

resulting in FSI as:

val it: seq<int> = seq [2; 1]

Extra information

Estimated cost (XS, S, M, L, XL, XXL): I'm not able to estimate this at this early stage. I perceive it as largely a change to the F# lexical analyzer, which may mean it's a small (S) change. The CLR already allows names with arbitrary characters (for example, a backticked symbol keeps that symbol as its name inside the generated assembly), so there seems to be no risk of introducing an incompatibility with the CLR.

There is likely an impact too on the Visual Studio or Visual Studio Code IDEs and I can't comment on that at this point but these additional impacts might elevate this change to a M.

For Readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

CyrusNajmabadi commented 2 years ago

It could also foster links with the APL community which could lead to further F# adoption.

I'm curious to hear more about this. Specifically:

  1. how big the APL community is.
  2. how much of the APL community is considering moving to F#.
  3. what percentage of that group we'd think would find using backticks to be a deal-breaker for that work.

Having not heard about this request up till now, I'm skeptical that this really needs to be done, and I think that existing solutions are both sufficient and desirable (since F#/OCaml users are already familiar with the backtick approach to introducing symbolic names like these).

dsyme commented 2 years ago

My understanding is that the APL community is very small but some banks have substantial amounts of APL still on their books and influential APL developers/quants. I don't think we could get a meaningful measure on the other questions - there will be bigger reasons not to move APL code than this (legacy, compat etc.) - but removing the backticks would however make moving more palatable.

piaste commented 2 years ago

I do not see myself using this set of characters but I can't think of any reason to forbid them.

If someone wants to abbreviate e.g. Matrix.transpose as a symbol, I'd rather have them use an unambiguous and easily-Googled ⍉ character than having them invent their own notation.

Korporal commented 2 years ago

I suppose it's conceivable that some of the APL symbols might represent fundamental operations on data that have clear and direct analogs in F#, which would seem to suggest that F# might itself predefine some of them in a future release. These symbols could of course be "overloadable" by users of the language, as is the case with many symbolic operators today.

By "fundamental" I mean not novel, somewhat natural operations that emerge from the nature of data as opposed to something very specific to the APL language itself. Examples of this in F# today are + or - or < these have predefined default semantics in many programming languages (including APL) not just F#.

Examples of such fundamental operators are ↓ and ↑, which represent operations akin to Seq.skip and Seq.take and so could be considered candidates for being added as predefined operators in F# (if this overall suggestion were to proceed).

A similar argument can be made for ⍋ and ⍒ ("grade up" and "grade down"), which perform a fundamental index generation on a sequence (it uses a stable sort algorithm in APL):

Converting the list: 110 109 204 40 105 201 2 208 160 143 213 31 21 317 132 242 164 176 67 18 75 89 18 7 20

into the list

6 23 19 22 24 12 11 3 18 20 21 4 1 0 14 9 8 16 17 5 2 7 10 15 13

In F# this can be achieved with (let input be the input list)

input |> List.indexed |> List.sortBy (fun (_,x) -> x) |> List.map (fun (x,_) -> x)

this operation in APL is represented by the "grade up" operator as:

⍋ 110 109 204 40 105 201 2 208 160 143 213 31 21 317 132 242 164 176 67 18 75 89 18 7 20

that is:

⍋ input
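
For comparison, here is a small runnable sketch of the F# pipeline above wrapped in a function. The name gradeUp is hypothetical, and the indices are 0-based to match the output list shown earlier:

```fsharp
// Hypothetical helper: the 0-based indices that would sort the input ascending.
let gradeUp xs =
    xs
    |> List.indexed        // pair each element with its original index
    |> List.sortBy snd     // List.sortBy is a stable sort, so ties keep their order
    |> List.map fst        // keep only the original indices

let input =
    [110; 109; 204; 40; 105; 201; 2; 208; 160; 143; 213; 31; 21;
     317; 132; 242; 164; 176; 67; 18; 75; 89; 18; 7; 20]

let graded = gradeUp input
printfn "%A" graded
```

Because the sort is stable, the two occurrences of 18 (at indices 19 and 22) stay in their original relative order, as they do in the APL result.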

Now whether grade up (and grade down) are "fundamental operations" on a list is of course arguable. But if they were regarded as such, and we were to allow this and other symbols to become identifiers/operators, ⍋ could become a new F# operator: not because we want to copy APL, but because APL was simply the first programming language to establish the symbol to mean what it means. It need not be regarded as language-specific any more than, say, + or >.

At the time APL was designed Unicode simply did not exist and IBM had to devise proprietary schemes to represent and render these symbols (hence their use being limited to APL) but as this is no longer the case we can view these symbols afresh and consider their merits in terms not just of APL but as generalized operations on lists and arrays.

Korporal commented 2 years ago

I do not see myself using this set of characters but I can't think of any reason to forbid them.

If someone wants to abbreviate e.g. Matrix.transpose as a symbol, I'd rather have them use an unambiguous and easily-Googled ⍉ character than having them invent their own notation.

Yes, the transpose symbol is a fine example and could be a candidate for inclusion as a function in the base F# language, along the lines of the symbols discussed above. These kinds of symbolic names for operators/functions can help transcend human language barriers too.

The Mandarin for "max" could be 最大, which is as unreadable to me as max might be for a native Mandarin speaker, but a symbol transcends English, Mandarin, Hebrew or Farsi and once internalized can become what + or > already are today.

CyrusNajmabadi commented 2 years ago

If someone wants to abbreviate e.g. Matrix.transpose as a symbol, I'd rather have them use an unambiguous and easily-Googled ⍉ character than having them invent their own notation.

I'm not certain why ⍉ would be easier to search for tbh. Symbols, naturally, may be much tougher to search for given all the domains they appear in.

Note: seeing ``⍉`` honestly feels much more natural to me as it would be immediately clear that this is just a normal name as opposed to something special built-in in F#.

CyrusNajmabadi commented 2 years ago

I'm not clear on why:

⍋ input

Would be more desirable than:

``⍋`` input

The latter being much more clear to me that this is simply a case of a normal application, just with a specially named operator that is outside of the bounds of normal identifier rules. Indeed, with Ocaml/F#, the usage of ``...`` was always introduced for precisely this purpose. A way to have cute, domain-specific, names that were easily identifiable and could now extend to the entire character set, instead of being in the smaller domain of names that the lexer/parser accept.

These operators can still have their APL meaning (if provided by an appropriate lib). However, i would far prefer they use the existing mechanism for referring to names than become special to the language itself.

Korporal commented 2 years ago

If one was to extend the set of characters permitted in a custom operator name then one could define custom operators freely (which currently do not require the backticks) and leave the identifier alphabet alone, then this becomes possible:

let inline (↓) n s = Seq.skip n s
let inline (↑) n s = Seq.take n s

Symbols like ↑ could then still be used in function names, but would require the backticks if used that way. If one wanted to use ↑ as if it were a two-argument function (as opposed to an infix operator) one could write:

(↑) 3 [1;2;3;4;5;6;7]

which is the same as

3 ↑ [1;2;3;4;5;6;7]

But then one is restricted from using the new characters in anything other than prefix and infix operators. On the other hand, one could leave the operator symbol set alone and simply extend the alphabet for function names and dispense with the backticks for these, allowing functions with any number of arguments to have a broader set of identifier naming options.

In this case if one wanted to treat a function named ↑ as an infix operator one could write:

3 |> ↑ <| [1;2;3;4;5;6;7]

I don't think the language's grammar will allow any character that can appear in an operator name to also be in an identifier, not without breaking backward compatibility.

It seems on this basis that extending the characters allowed for identifiers and leaving operators as they are, offers the most flexibility since we can gain the benefit in symbolic naming for functions of any number of arguments.
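
A rough sketch of how that piping idiom behaves with what compiles today, assuming the function is named with double backticks (under the suggestion the backticks would simply be dropped):

```fsharp
let ``↑`` n s = Seq.take n s

// Ordinary prefix application.
let viaPrefix = ``↑`` 3 [1; 2; 3; 4; 5; 6; 7] |> List.ofSeq

// Infix-style application built from the existing pipe operators:
// (3 |> ``↑``) partially applies the function, and <| supplies the sequence.
let viaPipes = (3 |> ``↑`` <| [1; 2; 3; 4; 5; 6; 7]) |> List.ofSeq

printfn "%A %A" viaPrefix viaPipes
```

Both forms call the same function; the piped form only changes where the arguments are written.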

Happypig375 commented 2 years ago

Is this readable to an outsider? With basic education in mathematics, everyone knows the meaning of + - * / < > =. It isn't obvious what ↑ ↓ would mean, in the same way that computation expressions are favoured over >>= <!> <*>.

Korporal commented 2 years ago

Is this readable to an outsider? With basic education in mathematics, everyone knows the meaning of + - * / < > =. It isn't obvious what ↑ ↓ would mean, in the same way that computation expressions are favoured over >>= <!> <*>.

Is this readable to you: こんにちは世界? It's not readable to me, but it is readable to someone who's learned to read it.

I'm not saying we should necessarily attach meaning to these symbols, only that we could if the meaning established by APL was seen as fundamental and not simply some APL concept.

Mathematics is full of symbols that are unfamiliar to us until we learn about them.

If symbols can be used rather than English words, then that goes some way to making the language more accessible to non-English speakers.

After thinking on this for a few days anyway, I now think there's limited scope for using these symbols as language operators; allowing them in identifiers seems, to me, to offer more flexibility than allowing them in operators.

Mathematics - because it is symbolic - is universal, people from many different cultures can all understand something like this:

[image: a mathematical formula, including division and a square root]

despite perhaps not speaking the same languages.

CyrusNajmabadi commented 2 years ago

Mathematics - because it is symbolic - is universal, people from many different cultures can all understand something like this:

I'd say there is a broad difference between mathematical programming languages and general purpose, text based, programming languages though.

Note that even for mathematical languages I would have no expectation that any of the symbols shown here so far would carry any meaning.

CyrusNajmabadi commented 2 years ago

despite perhaps not speaking the same languages.

Sure. But that picture has stuff like division and square root. Those are potentially pretty widely understood. However, there's a big gulf between those common operations and the sorts in the apl toolset.

Korporal commented 2 years ago

APL is relevant here only for the reason it is rooted in efforts to extend mathematical notation to the domain of mechanical data manipulation, this was the basis for Iverson eventually receiving a Turing award in 1979. Iverson was motivated by how then current notation was unable to express some of his ideas for data processing, he began to look to matrix algebra and related ideas.

His notation [1] became the language APL; it is the notation and symbolic operations on data that drove the development of APL.

Today we have Unicode and that standard fully embraces all of the symbols developed by Iverson, these symbols are available for use by us today if we choose to do so.

The concepts of taking and dropping elements from a "vector", sorting and ordering, repetitive function application ("reduce"), rotation, filtering, composition and more are all represented symbolically, and with symbols often close to the mathematical notation we see today in related fields.

Function composition in mathematics, and in discussions of functional languages, uses the ring operator, and that is the symbol also used in APL for the same operation.

This suggestion is that we consider adjusting the F# language to allow use of these Unicode characters affording developers an opportunity to leverage some of the benefits resulting from Iverson's innovative work.

Every time I've explained the meaning of things like ⍋ or ⍒ (and several other such symbols) to a developer they quickly understand and seem to have no difficulty comprehending. Even non-IT people can quickly grasp the meaning and begin to see how these can be combined. I also want to mention again that, just as with other mathematical notation, people from different cultures speaking different languages can share and understand code when it uses notation like this.

This

3 ↑ [5;4;3;2;1]

Is not tied to the English language whereas this is:

Seq.take 3 [5;4;3;2;1]

To a Japanese speaker "take" is the anglicized spelling for what we call "bamboo" and I'm sure there are many other similar examples we could find.

Having the option to write:

let inline (↓) n s = Seq.skip n s
let inline (↑) n s = Seq.take n s

would give teams and companies and institutions an ability to reduce the culture dependency that many modern programming languages impose.

International appeal, international readability is an aspect of programming languages that has been long neglected and embracing ideas like Iverson's is one step toward improving that.

What I'm personally not yet clear about is whether these should be permitted for use in operator definitions or general identifiers. Given that these symbols represent prefix (monadic) and infix (dyadic) operators and are used that way in APL too, I am gradually being drawn in that direction: these symbols could be used by development teams for operator names, and if they so chose they could leverage some of the power of the Iverson notation in their functional code. I'd like to see how this discussion develops to see what other F# users in general think about this.

1. Notation as a Tool for Thought - Iverson - 1979 Turing Award lecture.

CyrusNajmabadi commented 2 years ago

not tied to the English language

Except for using numbers not present in many languages.

To a Japanese speaker "take" is the anglicized spelling for what we call "bamboo"

I've had discussions with many foreign language developers over the years. This approach didn't really work because it's not scalable. You may have a couple more operations made available through an operator. However, those developers say that they still need to spend just as much time learning the wealth of apis out there with names written in English ASCII.

This approach would literally only impact roughly 0.01% of names (less actually). So developers (of any background) are still in the same position wrt having to learn all the apis.

Note: as mentioned before, you can still use these names. Just using standard f# name escaping. That seems like a totally ok way to have these operators from a lib if desirable.

CyrusNajmabadi commented 2 years ago

International appeal, international readability is an aspect of programming languages that has been long neglected and embracing ideas like Iverson's is one step toward improving that.

To me it sounds like something which barely moves the needle (see above). If you wanted to attack "international readability", I think it's going to take a much more fundamental effort from the ground up on how to go about doing that.

Korporal commented 2 years ago

Note: as mentioned before, you can still use these names. Just using standard f# name escaping. That seems like a totally ok way to have these operators from a lib if desirable.

The F# double-backtick is only available for identifiers, it cannot be used to invent names for operators.

CyrusNajmabadi commented 2 years ago

Every time I've explained the meaning of things like ... to a developer they quickly understand and seem to have no difficulty comprehending

This has been true with my experience teaching normal apis as well to non English speakers. So these operators don't seem special in that regard. Instead, the lesson seems to be that non-English speakers are comfortable picking up constructs in programming languages.

This may actually benefit from nearly all widespread programming languages using ascii-english names for 99.99%+ of cases.

I'd be interested, as in my original posts, in data showing that there is a problem, that the problem is in this area of the language, and that this solution is substantively better for solving the problem vs the existing options the language already exposes for using Unicode in names of symbols.

CyrusNajmabadi commented 2 years ago

I would not be opposed to allowing online operators to use backticks. It would be a much more broadly applicable and useful proposal imo, expanding far beyond this legacy code page.

CyrusNajmabadi commented 2 years ago

I'm missing something. This works fine on my end:

let ``↓`` n s = Seq.skip n s

And it reads fine to use as well.

Korporal commented 2 years ago

I'm missing something. This works fine on my end:

let ``↓`` n s = Seq.skip n s

And it reads fine to use as well.

That isn't an operator definition, it's a function.

CyrusNajmabadi commented 2 years ago

I'm not sure why that matters. Indeed, it seems very easy to teach and works out of the box today :)

Perhaps there could be a proposal about allowing backticked operators. That seems more generally useful and applicable beyond this single codepage.

cartermp commented 2 years ago

Backticked identifiers as operators might be up for consideration. From the compiler's perspective there's a fixed set of symbols you can use today, a backticked identifier is just an identifier, and you can't use identifiers as operators.

I think this can be considered as an enhancement to the language.

Korporal commented 2 years ago

I've been reading about how several functional languages separate functions and (infix) operators and from what I can see the alphabets used for naming each are always disjoint, no overlaps (the same seems to be true of imperative languages too). This makes it easier for the parser to unravel code as well as the human eye.

Haskell accepts a pretty large range of Unicode symbols as legal for naming operators and has conventions similar to those in F# for using a function as an infix operator or an infix operator as a function.

I therefore want to alter the goal of this suggestion to the narrower one of allowing a broader range of symbols in F# for operator naming, I'll retitle this suggestion to that end, no change is suggested for the naming of identifiers.

Having briefly looked at the F# lexer source files, adding completely new symbols to the set already set aside for operator names seems a relatively low-risk change. The new characters clearly have never been used before so there's no backward compatibility concern or potential ambiguities.

F# source is stored in UTF-8 encoding so can fully support all Unicode symbols; this is how we can use characters such as Greek letters in identifier names in our source code today.

No grammar change arises so there's no impact on the parser (unless the parser somehow looks at these characters directly rather than tokens composed from them), the definition syntax for custom operators remains unchanged.

Because the new symbols are obviously absent from the language today there can be no cases where a conflict with existing source code can arise.

In F# today I can create operators with rather long "names", here's some with three characters:

let inline (>--) a b = Seq.skip a b

let inline (<++) a b = Seq.take a b

enabling:

[image: screenshot of these operators in use]

these are legal too ===>> as is ->->+ and so on.

Because the set of available symbols is currently limited to !, $, %, &, *, +, -, ., /, <, =, >, ?, @, ^, | and ~, one soon finds it necessary to combine symbols into multi-character operator names that can begin to get unwieldy. Were the available set larger, one could define single-symbol operators more easily and only create compound operator names in cases where that had notational significance and value. Reducing the need to devise compound operator names would seem to be a benefit to developers.

I'm happy to redraft this suggestion with the narrower focus on operator symbols alphabet if this is the best way to carry this discussion forward.

In addition, with sufficient interest I will prepare a branch on my fork of the F# repo, that contains a change that serves as a proof of concept.

CyrusNajmabadi commented 2 years ago

I'm happy to redraft this suggestion with the narrower focus on operator symbols alphabet if this is the best way to carry this discussion forward.

To me, this would make the most sense. What do you think @cartermp ?

I think it would also be better to not limit this to a specific, mostly-legacy, code page. Instead, for example, it would be interesting to potentially allow anything in the UnicodeCategory.MathSymbol (as long as it doesn't overlap other ambiguous categories).
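
As a hedged illustration of what a category-based rule would and would not cover, the sketch below asks .NET for the Unicode general category of a few characters (the helper name categoryOf is made up for this example). In the Unicode data ↓ (U+2193) is a math symbol while ⍋ (U+234B) is classed as "other symbol", so a rule limited to UnicodeCategory.MathSymbol would admit some APL characters but not others:

```fsharp
open System
open System.Globalization

// Hypothetical helper: the Unicode general category .NET reports for a character.
let categoryOf (c: char) : UnicodeCategory = Char.GetUnicodeCategory c

let arrowCategory = categoryOf '\u2193'   // ↓ DOWNWARDS ARROW
let gradeCategory = categoryOf '\u234B'   // ⍋ APL FUNCTIONAL SYMBOL DELTA STILE
let plusCategory = categoryOf '+'

printfn "↓ is %A, ⍋ is %A, + is %A" arrowCategory gradeCategory plusCategory
```

This only inspects character categories; it says nothing about how the F# lexer would actually be changed.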

olivercoad commented 2 years ago

If adding new options for operator symbols, it would be nice if they could fulfil more combinations of high/low precedence, left/right associativity and infix/prefix operators. Specifically, I feel it would be beneficial to have more options for making right-associative operators.

For example, currently the only way to make a low(ish) precedence right-associative operator is with ^, or ** with slightly higher precedence (example use case). I think I've also at times wanted a prefix operator with lower precedence than function application but I can't remember what the use case for that was.

https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/symbol-and-operator-reference/#operator-precedence
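
To make the point about associativity concrete, here is a small sketch with two made-up operators that perform the same subtraction. In F# today the leading character of a custom operator fixes its precedence and associativity, so the two give different results on the same operands:

```fsharp
// Both operators subtract; only the leading character differs.
let (>>-) a b = a - b   // leading '>': left-associative
let (^-) a b = a - b    // leading '^': right-associative

let leftResult = 10 >>- 3 >>- 2   // parsed as (10 - 3) - 2
let rightResult = 10 ^- 3 ^- 2    // parsed as 10 - (3 - 2)

printfn "left = %d, right = %d" leftResult rightResult
```

Any newly admitted operator character would need a place in that scheme, which is why the question of its associativity and precedence comes up.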

@Korporal what associativity and precedence do the proposed APL operators have?

voronoipotato commented 2 years ago

The F# double-backtick is only available for identifiers, it cannot be used to invent names for operators.

Backticked identifiers as operators might be up for consideration. From the compiler's perspective there's a fixed set of symbols you can use today, a backticked identifier is just an identifier, and you can't use identifiers as operators.

I think this can be considered as an enhancement to the language.

I think this makes sense and would also remedy my problem https://github.com/fsharp/fslang-suggestions/issues/1079. I can always make an extension in my editor to dim the backticks and auto-surround unicode with backticks. I don't think you have to "narrow down the symbols", because most people won't use this unless they have a specific domain that they're using this in that makes sense such as APL. This is similar to how backtick symbols are used today. It also allows for infix operators that are words, which may be useful for creating natural language scripts like "gherkin" or even making things like canopy more readable without having to memorize operators. Some operations are easier to think in infix because that's how the domain already talks about the things.

If adding new options for operator symbols, it would be nice if they could fulfil more combinations of high/low precedence, left/right associativity and infix/prefix operators. Specifically, I feel it would be beneficial to have more options for making right-associative operators.

For example, currently the only way to make a low(ish) precedence right-associative operator is with ^, or ** with slightly higher precedence (example use case). I think I've also at times wanted a prefix operator with lower precedence than function application but I can't remember what the use case for that was.

https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/symbol-and-operator-reference/#operator-precedence

@Korporal what associativity and precedence do the proposed APL operators have?

APL has no operator precedence, it's right to left as god intended 😛. Custom precedence is an interesting proposition, but I personally think it should be a separate question with its own considerations.

Korporal commented 2 years ago

@voronoipotato

APL is almost incidental here; I was originally struck by the fact that today we have a huge set of special symbols available via the Unicode standard, and it naturally raises the question: why restrict the set of characters allowed for naming operators, given that we have these available? I'm sure this was not a particularly important issue during the design of F#, as its goals were focused elsewhere.

APL was initially designed to represent mathematical concepts and the notation is rooted in mathematics, making a real computer language to implement APL came later but was not the original motivation for the research.

Much of Iverson's early work is nothing to do with computers but with notation, he was impressed by how notation actually assisted, enabled thinking as is the case more generally in mathematics. His lecture when he received the Turing award in 1979 explores that theme.

This harks back to philosophy, where language (in the general sense, not the narrow computing sense) itself influences thought, the language we adopt and express ourselves in, intimately impacts how we think, Wittgenstein did a lot of work in this area.

The easier it is to express an idea in some domain in some language, the more utility the language has in that domain, so exploring how we can increase the range of symbols used to name operators could serve to further that idea.

APL is perhaps the most well known example of a recent notation for a wide range of operations on sets ("vectors") where single symbols represent operations (functions) and shows how expressive this idea can be.

voronoipotato commented 2 years ago

Yep, completely agreed, I just think the proposal isn't strengthened by limiting it to one domain of operators. People should use the language and notation that their domain is comfortable with. If yours is comfortable with APL then I think they should be able to use APL, and personally I'd be okay with including APL operators without the backticks, but it would be nice to have the option of using backticks with operators. With a text editor extension you could make it very easy to type out operators (it would automatically add backticks) and it could make the backticks less visible so they're not visually cluttering.

Korporal commented 2 years ago

Yes, I don't think it should be limited to just the APL symbols. That came up during the discussion and the title of the suggestion hasn't been updated. A broader set of symbols is now being suggested; this did come up, but I need to scan the conversation to pinpoint where.

Today F# differentiates between identifiers and the "names" of operators; different alphabets are set aside for each, so no collision is possible.

Allowing a backtick syntax for operator names could I guess be done but why would one want that? That would entail grammar changes of some kind, so that identifiers and operators could be recognized other than by their lexical structure.

We can use Greek characters today for F# identifiers (Greek is after all a human language!), so technically any arbitrary Unicode character is one of:

  1. In the identifier alphabet
  2. In the operator alphabet
  3. Illegal - not available for any use.
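
The three categories above can be sketched in today's F#. This is only an illustration: the names `ρ`, `δ`, `<+>` and `rest` are made up for the example, and the third case relies on the double-backtick escape described in the original suggestion rather than on any new syntax.

```fsharp
// 1. Identifier alphabet: Greek letters are ordinary Unicode letters.
let ρ = 3
let δ = ρ + 1

// 2. Operator alphabet: custom operators are built from the fixed set
//    of symbolic characters (here '<', '+' and '>').
let (<+>) (a: int list) (b: int list) = List.map2 (+) a b

// 3. Neither alphabet: a bare ↓ is rejected, so the name has to be
//    escaped with double backticks, which makes it a plain identifier.
let ``↓`` n xs = List.skip n xs
let rest = ``↓`` 2 [1; 2; 3; 4]
```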

CyrusNajmabadi commented 2 years ago

Allowing a backtick syntax for operator names could I guess be done but why would one want that?

To indicate it's a user defined operator. Similar to how backticks today allow anything to be an identifier, but clearly delineate what is going on.

vzarytovskii commented 2 years ago

To indicate it's a user defined operator.

You mean indicating more or less non-standard user-defined operators? Because now there's no way really to distinguish them (from just looking at them).

Korporal commented 2 years ago

Allowing a backtick syntax for operator names could I guess be done but why would one want that?

To indicate it's a user defined operator. Similar to how backticks today allow anything to be an identifier, but clearly delineate what is going on.

I suppose allowing an optional prefixing backtick could be useful in some situations where the author wants to convey that.

CyrusNajmabadi commented 2 years ago

I would just make it non-optional. Add this as new syntax that clearly indicates what is going on with minimal fuss. It would slot in naturally given how backticks already work in the language IMO.

Korporal commented 2 years ago

I would just make it non-optional. Add this as new syntax that clearly indicates what is going on with minimal fuss. It would slot in naturally given how backticks already work in the language IMO.

What impact do you think that would have on existing F# source code where user defined operators are already defined without any backtick?

CyrusNajmabadi commented 2 years ago

What impact do you think that would have on existing F# source code where user defined operators are already defined without any backtick?

That code would stay the same. It would only be new code that wanted to use characters outside of the normal operator char set that would use this.

Korporal commented 2 years ago

What impact do you think that would have on existing F# source code where user defined operators are already defined without any backtick?

That code would stay the same. It would only be new code that wanted to use characters outside of the normal operator char set that would use this.

Well this backtick idea wouldn't work for me. The initial suggestion was motivated by having the idea that notation is expressive, hence APL was used as an example. Forcing people to surround potentially single symbol operators with a pair of backticks defeats the purpose of the suggested change.

CyrusNajmabadi commented 2 years ago

I don't see how backticks make things any less expressive.

Forcing people to surround potentially single symbol operators with a pair of backticks defeats the purpose of the suggested change.

Why?

For example, F# forces you to wrap an infix operator in parens to invoke it in prefix position. It also wraps characters in backticks to indicate that they form a name. It seems quite in line and just as expressive.
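
A minimal sketch of the two existing wrapping conventions mentioned here, using only standard F#; the binding names are hypothetical and chosen for illustration.

```fsharp
// An infix operator used normally...
let infixSum = 1 + 2

// ...and the same operator wrapped in parens so it can be applied
// in prefix position or passed around as a function.
let prefixSum = (+) 1 2
let shifted = List.map ((+) 10) [1; 2; 3]

// Double backticks turn an arbitrary character sequence into a name.
let ``number of items`` = List.length [1; 2; 3]
```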

It may not be as terse. But I don't see terseness as an overall virtue. Indeed, it can be a very big net negative (Perl et al.).

Korporal commented 2 years ago

I don't see how backticks make things any less expressive.

Forcing people to surround potentially single symbol operators with a pair of backticks defeats the purpose of the suggested change.

Why?

For example, F# forces you to wrap an infix operator in parens to invoke it in prefix position. It also wraps characters in backticks to indicate that they form a name. It seems quite in line and just as expressive.

It may not be as terse. But I don't see terseness as an overall virtue. Indeed, it can be a very big net negative (Perl et al.).

Consider this:

let geometricseries m n a r = (a * (1 - pown r ((n - m) + 1))) / (1 - r)

then contrast it with this:

let geometricseries m n a r = (a ``*`` (1 ``-`` pown r ((n ``-`` m) ``+`` 1))) ``/`` (1 ``-`` r)

This is how such expressions would have to be written if we were to extend the alphabet used for defining operators and we defined some single character operators as shown in some of the APL examples earlier.

We'd also introduce a non-uniformity too, in that operators named with the existing set of characters would not require the backticks but those defined with newer characters must have the backticks.

I, for one, do not see any merit in this. A more important issue, I think, is precedence: how could that be handled with a vastly broadened set of operator characters to choose from?
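
For context on the precedence question: in today's F#, the precedence of a custom operator is determined by its leading characters, not by its definition. The sketch below uses two hypothetical operators, `+!` and `*!`, to show that behaviour; how a newly admitted symbol such as `⍋` would be ranked is exactly the open question, and nothing here answers it.

```fsharp
// Hypothetical operators: '+!' inherits the precedence of '+',
// and '*!' inherits the precedence of '*'.
let (+!) a b = a + b
let ( *! ) a b = a * b   // spaces keep "(*" from being read as a comment opener

// Parsed as 1 +! (2 *! 3) because '*'-prefixed operators bind tighter.
let r = 1 +! 2 *! 3

// Explicit parentheses are needed for the other grouping.
let grouped = (1 +! 2) *! 3
```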

CyrusNajmabadi commented 2 years ago

You wouldn't need backticks for normal tokens already recognized as operators.

Korporal commented 2 years ago

You wouldn't need backticks for normal tokens already recognized as operators.

I know, you already proposed that idea.

You would need them in an expression like this for example:

let geometricseries m n a r = (a ⍋ (1 ↓ pown r ((n ⍱ m) ⌈ 1))) ÷ (1 ○ r)

one would have to write that as:

let geometricseries m n a r = (a ``⍋`` (1 ``↓`` pown r ((n ``⍱`` m) ``⌈`` 1))) ``÷`` (1 ``○`` r)

I personally, see no merit in imposing that on users.

CyrusNajmabadi commented 2 years ago

That seems reasonable and readable to me. And it nicely calls out to me what these uncommon symbols are.

I'd also be ok with single backticks. But I'm not sure if that would be an issue for the language in some other way.

Overall, this seems very promising.

voronoipotato commented 2 years ago

I'd say it's "better than nothing" but I don't know that it's great that a single character operator gets expanded to 5. Definitely better than nothing though. I don't think this is terseness for its own sake, if you are using it as an operator the backticks quickly outnumber everything else. I'd personally take it over nothing, as I can write a vscode extension or font-ligature that makes the double backticks small and out of the way.

voronoipotato commented 2 years ago

I'm going to test today if I can jam a unicode symbol into a custom operation for a computation expression. If so, I'd say I'm okay with that as a workaround. That way weird custom syntax is cordoned off in a bounded context, and any rules we want can be used. Here's a kind of pseudo-code example of what it might look like.

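type AplBuilder() =
    // "⍋" is APL's grade-up symbol, used here as the custom operation name.
    [<CustomOperation("⍋", MaintainsVariableSpace = true)>]
    member x.GradeUp(source, [<ProjectionParameter>] f) =
        // TODO: write some code to grade up
        failwith "not implemented"

voronoipotato commented 2 years ago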

Backticked identifiers as operators might be up for consideration. From the compiler's perspective there's a fixed set of symbols you can use as operators today; a backticked identifier is just an identifier, and you can't use identifiers as operators.
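
A small sketch of that limitation as it stands today. The function ``` ``÷`` ``` and the binding names are made up for illustration; the pipe trick in the middle is an existing idiom, not something proposed in this thread.

```fsharp
// A backticked name is an ordinary identifier, so it is applied in
// prefix position like any other function.
let ``÷`` (a: float) (b: float) = a / b

let half = ``÷`` 1.0 2.0

// Piping on both sides is one way to approximate infix placement today.
let alsoHalf = 1.0 |> ``÷`` <| 2.0

// Writing the name between its operands without the pipes does not
// compile: in that position it is just another argument, not an
// infix operator.
```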

I think this can be considered as an enhancement to the language.

@cartermp This is a moon shot, but how would you feel about this idea: when a backticked identifier is defined as an operator, it can be used without backticks if it is a single character in length. Otherwise it needs to be backticked.

That being said, if this CE approach works, I might just go that route. I do kinda think it is nice that you can go to definition and actually see the function name backing the symbol, and that the symbols can't be used ubiquitously.

Edit: It doesn't work. I can define it, but attempting to use the CustomOperation just causes an "unexpected character in expression" error.

dsyme commented 1 year ago

Closing as I labelled this "probably not" a while back :)