golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: spec: operator functions #27605

Open deanveloper opened 6 years ago

deanveloper commented 6 years ago

The only issue that I could find about operator overloading currently is #19770, although it's closed and doesn't have many details.

Goal

Operator overloading should be used to create datatypes that represent things that already exist in Go. They should not represent anything else, and should ideally have no other function.

New operators should not be allowed to be defined; we should only be able to overload currently existing operators. If you look at languages that let you define your own operators (Scala, looking at you!), the code becomes extremely messy and hard to read. It almost requires an IDE because operator overloading is very heavily used.

Using an operator for anything other than its purpose is a bad idea. We should not be making data structures with fancy-looking + operators to add things to the structure. Overloading the + operator in Go should only be used for defining addition/concatenation for non-builtin types. Also, as per the Go spec, binary operators should only operate on two values of the same type.

Operators should only operate on and return a single type. This keeps things consistent with how operators currently work in Go. We shouldn't allow any type1 + string -> type1 stuff.

Operators should only be defined in the same package as the type they are defined on. Same rule as methods. You can't define methods for structs outside your package, and you shouldn't be able to do this with operators either.

And last but not least, operators should never mutate their operands. This should be a contract listed in the Go spec. It makes operator functions predictable, which is how they should be.
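To make the rules above concrete, here is a minimal sketch using a hypothetical Vec type, written as plain functions (the syntax for actually binding such functions to operators is discussed below):

type Vec struct{ X, Y float64 }

// Allowed under the rules above: both operands and the result are the
// same type, and neither operand is mutated.
func Add(a, b Vec) Vec { return Vec{a.X + b.X, a.Y + b.Y} }

// Not allowed: mixed operand types (Vec + float64 -> Vec).
func Scale(a Vec, s float64) Vec { return Vec{a.X * s, a.Y * s} }

// Not allowed: a pointer operand invites mutating the operand.
func AddInPlace(a *Vec, b Vec) { a.X += b.X; a.Y += b.Y }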

Unary operators should not need to be overloaded.

+x                          is 0 + x
-x    negation              is 0 - x
^x    bitwise complement    is m ^ x  with m = "all bits set to 1" for unsigned x
                                      and  m = -1 for signed x

This part of the spec should always remain true, and should also remain true for anything using these operators. Perhaps ^x may need its own definition, as there's no good way to define "all bits set to 1" for an arbitrary type, although defining an .Invert() method is no less readable IMO.

Unary operations on structs would therefore be Type{} + t or Type{} - t, and on pointers they would be nil + t and nil - t. These may have to be special cases in the implementation of operator functions on pointer types.

Assignment operators should also never be overloaded.

An assignment operation x op= y where op is a binary arithmetic operator is equivalent to x = x op (y) but evaluates x only once. The op= construct is a single token.

This should remain the same, just as with unary operators.
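For illustration, assuming a type with an overloaded binary + (the hypothetical modular Int from the syntax section below), compound assignment would simply reuse that operator:

a := Int{Val: 1, Mod: 7}
b := Int{Val: 3, Mod: 7}

a += b // under this proposal: a = a + b, using the overloaded binary +
       // no separate overload of += exists or is needed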

If we do not permit overloading the ^x unary operator, this means that we only need to define binary operations.

Issues/Projects aided by operator overloading

19787 - Decimal floating point (IEEE 754-2008)

26699 - Same proposal, more detail

19623 - Changing int to be arbitrary precision

9455 - Adding int128 and uint128

this code - Seriously it's gross
really anything that uses math/big that isn't micro-optimized. If I went searching for longer, there'd probably be a few more that pop up.

Syntax

What's a proposal without proposed syntaxes?

// A modular integer.
type Int struct {
    Val int
    Mod int
}

// ==============
// each of the functions would have the following function body:
//
//      if a == (Int{}) { // handle unary +
//          a.Mod = b.Mod
//      }
//
//      checkMod(a, b)
//      nextInt := Int{Val: a.Val + b.Val, Mod: a.Mod}
//      nextInt.Reduce()
//
//      return nextInt
//
// ==============

// In all of these, it would result in a compile error if the types of the arguments
// do not match each other and the return type.

// My new favorite. This makes for a simple grammar. It allows
// people who prefer function calls to use the `Add` function instead.
operator (Int + Int) Add
func Add(a, b Int) Int { ... }

// My old favorite. Abandoning the `func` construct clarifies
// that these should not be used like a standard function, and makes it
// clearer that all arguments and the return type must be equal.
op(Int) (a + b) { ... }
operator(Int) (a + b) { ... }      // <- I like this better

// My old second favorite, although this looks a lot like a standard method definition.
// Maybe a good thing?
func (a + b Int) Int { ... }

// It can be fixed by adding an "op" to signify it's an operator function, although
// I do not like it because it just reads wrong. Also, it looks like we're defining a package-level
// function named `op`... which is not what we are doing.
func op (a + b Int) Int { ... }

// Although at the same time, I don't like having words
// before the `func`... I feel that all function declarations should begin with `func`
op func (a + b Int) Int { ... }

// Another idea could just be to define a method named "Plus", although this
// would cause confusion between functions like `big.Int.Plus` vs `big.Int.Add`.
// We probably need to preserve `big.Int.Add` for microoptimization purposes.
func (a Int) Plus(b Int) Int { ... }

Considering other languages' implementations.

C++

// there's several ways to declare, but we'll use this one
Type operator+(Type a, Type b)

I think C++ isn't a bad language, but there are a lot of new programmers who use it and think it's "super cool" to implement operator functions for everything they make, including stuff like overloading the = operator (which I have seen before).

I also have a couple friends from college who really enjoyed defining operator functions for everything... no bueno.

It gives too much power to the programmer to do everything that they want to do. Doing this creates messy code.

Swift

static func +(a: Type, b: Type) -> Type

Note that custom operators may be defined, and you can define stuff like the operator's precedence. I have not looked much into how these operators end up being used, though.

C#

public static Type operator+ (Type a, Type b)

Operator functions in C# end up being massively overused in my experience. People define all of the operators for all of their data structures. Might just be a consequence of using a language with many features, though.

Kotlin

operator fun plus(b: Type): Type // use "this" for left hand side

https://kotlinlang.org/docs/reference/operator-overloading.html, actually a really nice read.

Operator functions get used everywhere, and even the standard library is littered with them. Using them leads to unreadable code. For instance, what does mutableList + elem mean? Does it mutate mutableList? Does it return a new MutableList instance? No one knows without looking at the documentation.

Also, defining it as a method instead of a separate function just begs for it to mutate this. We do not want to encourage that.

Open Questions

Which operators should be allowed to be overridden?

So, for the mathematical operators +, -, *, /, and %, it's pretty clear that if this gets implemented, we'd want to be able to overload them.

List of remaining operators that should be considered for overloading:

Overloading equality may be a good thing. big.Int suffers because the only way to test equality is with a.Cmp(b) == 0 which is not readable at all.
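For comparison, here is what these checks look like with math/big today, with the hypothetical overloaded form shown in comments (assumes import "math/big"):

a := big.NewInt(10)
b := big.NewInt(20)

// today: Cmp returns -1, 0, or +1
if a.Cmp(b) == 0 { /* a equals b */ }
if a.Cmp(b) < 0  { /* a is less than b */ }

// hypothetical, if == and < could be overloaded:
// if a == b { ... }
// if a < b  { ... }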

I have left out || and && because they should be reserved exclusively for bool or types based on bool (has anyone ever even based a type on bool?), and I see no reason to allow overriding them.

Should we even allow operator overloading on pointer types?

Allowing operator overloading on a pointer type means the possibility of mutating, which we do not want. On the other hand, allowing pointer types means less copying, especially for large structures such as matrices. This question would be resolved if the read only types proposal is accepted.

Disallowing pointer types
Allowing pointer types

Perhaps it should be a compile-time error to mutate a pointer in an operator function. If read-only types were added then we could require the parameters to be read-only.

Should it reference/dereference as needed?

Methods currently do this with their receivers. For instance:

// Because the receiver for `SetInt64` is `*big.Int`,
// the second line is equivalent to `(&num).SetInt64(....)`

var num big.Int
num.SetInt64(5000000000000000000)

So should the same logic apply to operator functions?


I'm aware that this will probably not be added to Go 2, but I figured it would be a good thing to make an issue for, since the current issue for Operator Functions is quite small and, well, it's closed.



OneOfOne commented 6 years ago

@gopherbot please add go2

bronze1man commented 6 years ago

I do not like this idea. Because:

deanveloper commented 6 years ago

It increases the complexity of the syntax, compilers, and tools that read the language.

Of course it does; it's a new feature. All features inevitably do this. Although if we're worrying about complicating anything, it should be the spec. Luckily a lot of the rules remain the same; the only change is allowing more types to be "operated" on with arithmetic operators.

It also increases the learning cost of the language.

Again, so does every new feature, and I'd say the learning cost is very small. People typically don't need to write operator functions anyway, as they are meant to be used much more often than they are written.

It decreases the readability of the code. It is much more difficult to answer the question "where is the code defined?"

That is a valid critique, but ideally you should not be worried about what it does. It doesn't matter how 1+2 works, you care that it adds properly and you get a 3 in return. Operator functions should not do anything other than what you expect the operator to do.

I'd say it actually increases readability as well. Again, look at literally anything that uses math/big like this quick example. People who need stuff like math/big or a future decimal64 library can take advantage of this. Honestly, APIs for these things are quite terrible, and need to be because we need workarounds for these operators.

deanveloper commented 6 years ago

Also, it's defined in the same place that methods are: in the same package as the type itself! So really, the question "where is it defined" is a question we already have the tools to answer.

urandom commented 6 years ago

It also increases the learning cost of the language.

I would argue the opposite. The current implementation of the math/big package is confusing and very hard to use. A very big improvement would be if it would just support operators and would act like the base number types. That would make it very easy to learn and use.

It decreases the readability of the code. It is much more difficult to answer the question "where is the code defined?"

The code readability is greatly improved. Again, math/big clearly illustrates that. As for "where is the code defined", that has little to do with readability, and the answer, as is right now, is of course: https://godoc.org/math/big

urandom commented 6 years ago

Another area that may be improved with such operator functions is the current generics proposal. As it stands right now, a generic function to add 2 numbers would be limited to the current numeric types. A hypothetical Sum generic function will never be able to work with big.Int, for example, or, say, a custom rational number type.
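A rough sketch of that limitation, using the generics syntax that was eventually adopted (the constraint and names below are illustrative, not from any proposal):

// Number covers only built-in numeric types.
type Number interface {
    ~int | ~int64 | ~float64
}

// Sum works for any type in the constraint because the language
// operator + is defined for those types.
func Sum[T Number](xs ...T) T {
    var total T
    for _, x := range xs {
        total += x
    }
    return total
}

// There is no constraint that admits *big.Int or a custom rational
// type: they have methods like Add, but no + the compiler can use.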

davecheney commented 6 years ago

I disagree. Only a small percentage of the math/big api can be replaced with the small set of binary operators defined in the language.


as commented 6 years ago
// each of the functions would have the following function body:
{
    return new(big.Int).Add(a, b)
}

I would prefer not to allocate a potentially large amount of memory for every arithmetic operation, especially if that operation looks like a normal arithmetic operation.

The package math/big is both well-designed and has a learning curve. The two qualifiers are conflated in some of the examples and comments here. The stated goal of this package was to provide efficient manipulation of big numbers. The current proposal should perform no worse than the current state of affairs.

deanveloper commented 6 years ago

I disagree. Only a small percentage of the math/big api can be replaced with the small set of binary operators defined in the language.

Yes, only a small percentage can be replaced, but what about how often those are used? (Also, this really isn't only for math/big. I mainly used it as an example because it has a very steep learning curve, and the code resulting from it is typically very messy and can be hard to maintain.)

I would prefer not to allocate a potentially large amount of memory for every arithmetic operation, especially if that operation looks like a normal arithmetic operation.

The package math/big is both well-designed and has a learning curve. The two qualifiers are conflated in some of the examples and comments here. The stated goal of this package was to provide efficient manipulation of big numbers. The current proposal should perform no worse than the current state of affairs.

This is a good critique as well, but when all you want to do is to add numbers without having to worry about overflow, it could be extremely useful.

There could also be a separate type defined somewhere else with a less steep learning curve. After all, there is a pretty popular issue (#19623) to make int arbitrary precision. This could instead be a third party library (or in golang.org/x/) with operator functions defined for it.

deanveloper commented 6 years ago

Just a quick demo of how this could be used outside of math/big, which was a bad example (since it's designed to be efficient).

This could of course also be used for other mathematical constructs such as matrices, vectors, etc. Also for arbitrary-precision integers, big decimals, elliptic-curve points, etc.

A simple modular integer implementation:

// A modular integer.
//
// `Reduce` should be called if you ever modify `Val` or `Mod` directly.
// As long as this is done, the `==` operator will work properly.
type Int struct {
    Val int
    Mod int
}

// Adds two modular integers and reduces.
// Panics if a and b do not have the same modulus
operator(Int) (a + b) {
    if a == (Int{}) { // handle unary +
        a.Mod = b.Mod
    }

    checkMod(a, b)
    nextInt := Int{Val: a.Val + b.Val, Mod: a.Mod}
    nextInt.Reduce()

    return nextInt
}

// Subtracts a modular integer from another and reduces.
// Panics if a and b do not have the same modulus.
operator(Int) (a - b) {
    if a == (Int{}) { // handle unary -
        a.Mod = b.Mod
    }

    checkMod(a, b)
    nextInt := Int{Val: a.Val - b.Val, Mod: a.Mod}
    nextInt.Reduce()

    return nextInt
}

// Multiplies two modular integers and reduces.
// Panics if a and b do not have the same modulus
operator(Int) (a * b) {
    checkMod(a, b)
    nextInt := Int{Val: a.Val * b.Val, Mod: a.Mod}
    nextInt.Reduce()

    return nextInt
}

// sets i.Val to its least nonnegative residue
func (i *Int) Reduce() {
    if i.Val >= 0 && i.Val < i.Mod {
        return
    }
    i.Val = (i.Val%i.Mod + i.Mod)%i.Mod
}

// makes sure a and b have the same modulus.
// if not, it panics.
func checkMod(a, b Int) {
    if a.Mod != b.Mod {
        panic(fmt.Errorf("mod is not equal: a(%d) b(%d)", a.Mod, b.Mod))
    }
}

And of course the usage:

func main() {
    nine := modmath.Int{Val: 9, Mod: 2}
    ten := modmath.Int{Val: 10, Mod: 2}
    twentyOne := modmath.Int{Val: 21, Mod: 2}
    twentyOne.Reduce()

    fmt.Println("9 + 10 == 21 (mod 2):", nine+ten == twentyOne)
    // output: 9 + 10 == 21 (mod 2): true
}
go-li commented 6 years ago

With operators there are 5 goals:

  1. Disallow operating on different types
  2. Not wildly exported to other packages, force re-declaring
  3. Disallow mutating input args
  4. Interact with generics well
  5. Only one function for the operators <, >, <=, >=, ==, !=, returning an integer that is negative, positive, or zero (equal)

I propose the following. Operators should be function calls, with functions of the general form:

type T struct{}
func MyCustomAdd(*T , *T) *T {...}
func MyCustomSub(*T , *T) *T {...}
func MyCustomMul(*T , *T) *T {...}
func MyCustomMod(*T , *T) *T {...}
func MyCustomShr(*T , uint) *T {...}
func MyCustomShl(*T , uint) *T {...}
func MyCustomBwAnd(*T , *T) *T {...}
func MyCustomBwOr(*T , *T) *T {...}
func MyCustomBwClr(*T , *T) *T {...}
func MyCustomCompare(*T , *T) int {...}

Comparison is special: it is the "default operator", and all of the operators <, >, <=, >=, ==, != map to it.

Comparison is written like:

op default MyCustomCompare
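In other words, a hypothetical desugaring (illustration only):

// a <  b   would compile to   MyCustomCompare(&a, &b) <  0
// a == b   would compile to   MyCustomCompare(&a, &b) == 0
// a >= b   would compile to   MyCustomCompare(&a, &b) >= 0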

Next the user can define also the other operators:

op + MyCustomAdd

If MyCustomAdd does not match the generic signature func ( * , * ) (*), this would be a compile-time error. This satisfies point 1. Also, defining an operator on a type that already has it would be a compile-time error.

If the function is capitalized then the operator can be exported, but it must also be declared in the remote package, like so:

op + otherpackage.MyCustomAdd

This satisfies point 2.

All functions linked to an operator will go through additional type checking, to detect cases where you write to the args or return something other than a pointer to your own variable (this rule bans returning nil, and also bans returning a or b):

func MyCustomAdd(a *MyInt ,b *MyInt) *MyInt {
    var foo MyInt // this must exist
    a.x = 7 // cannot write to arg
    if (yadda yadda) {
        return nil // this cannot exist
    } else {
       return b // this cannot exist
    }
   return &foo // this must exist
   return //this cannot exist
}

Operators that pass the additional type checking will be internally optimized to something like

func(a *, b *,ret *)

Those who did not won't.

Next, generic function Maximum:

func Maximum (compare func (*, *)int, ... things) (out *) {...}

Can be specially written as:

func Maximum2 (op default, ... things) (out *) {...}

Now comes the main point: Maximum requires either a plain generic function or an auto-passed "unoptimized operator". Maximum2 requires an optimized operator. If you pass an optimized operator to a function that requires an unoptimized one, wrapper code will be generated and the wrapper passed.

Operators can be auto passed, like this:

Maximum(_, T{},T{},T{},T{} )
Maximum2(_, T{},T{},T{},T{} )

And used inside the generic function:

func Maximum2 (op default, things ... ) (out *) {
   if things[1] > things[0] {
          return &things[1]
    }
    return &things[0]
}

And the best part: the compiler takes care of the boring details. Like: promoting values to pointers when passing to operator functions. Creating wrappers for operators. Typechecking operator functions and returning compile-time errors when you pass an unoptimizable operator function to a generic function that has an op param.

deanveloper commented 6 years ago

Could you edit to have your examples in code fences?

like this

would be ``` like this ```

maj-o commented 6 years ago

Maybe a simpler syntax is also okay?

func (a mytype) +(b mytype) (mytype) {
     return a.Add(b)
}

Just like a normal method, but with an operator rune as the name.

go-li commented 6 years ago

To make sure that operators always return something and are optimized (by writing the result to a specific memory location, using an input argument as the output), I revise my proposal to:

type T struct{}
func MyCustomAdd(*T , *T, *T) *T {...}
func MyCustomSub(*T , *T, *T) *T {...}
func MyCustomMul(*T , *T, *T) *T {...}
func MyCustomMod(*T , *T, *T) *T {...}
func MyCustomShr(*T , uint, *T) *T {...}
func MyCustomShl(*T , uint, *T) *T {...}
func MyCustomBwAnd(*T , *T, *T) *T {...}
func MyCustomBwOr(*T , *T, *T) *T {...}
func MyCustomBwClr(*T , *T, *T) *T {...}
func MyCustomCompare(*T , *T) int {...}

If an operator is passed to a generic function, the operator function must return its third argument. Example:

func DoAddition(op +, operand1 *, operand2 *, result *) (*) {
    *result = operand1 + operand2
    return result
}

type T struct {x int}
func AddCorrect(a *T, b *T, ret *T) (ret2 *T) {
    (*ret).x = (*a).x + (*b).x
    return ret
}

func AddIncorrect(a *T, b *T, ret *T) (ret2 *T) {
    r := T{(*a).x + (*b).x}
    return &r
}

this will work:

op + AddCorrect
DoAddition(_, &T{}, &T{}, &T{})
// okay

this won't:

op + AddIncorrect
DoAddition(_, &T{}, &T{}, &T{})
// COMPILE TIME ERROR: Operator + mapped to AddIncorrect on line 54,
// does not return its third argument on line 87774,
// when passed to generic function DoAddition on line 45645
deanveloper commented 6 years ago

Okay, I think you're micro-optimizing a bit here. The syntax should be simple.

Also, as a minor point, you don't need to dereference before using the . operator. The compiler does that for you since it's part of the spec.
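A minimal example of that point in today's Go:

type T struct{ x int }

func set(ret *T) {
    ret.x = 7 // the selector dereferences automatically: same as (*ret).x = 7
}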

I'd also like to add that read-only types (I'd link the proposal if I weren't on my phone) would work well with this proposal, as operator functions could require that all inputs are read-only.

deanveloper commented 6 years ago

I think forcing redeclaration of operators may lead to an "operator poisoning" of sorts... It would be especially inconvenient for people who use Go for mathematics to use a matrix, vector, and arbitrary-size int, and need to redeclare 50 operators.

Calling the comparison operators "default" doesn't make any sense to me.

I like the rest of the first part of your proposal, though. The second part not-so-much.

go-li commented 6 years ago

I am not merely speculating, I have a very serious intention to add operator overloading to my experimental language, named go-li. In fact I plan to do it next week, exactly how I propose.

Comparisons: calling them "default" is wrong, you're right. IMHO most people would want to map all of them to the same function, in the C language tradition of returning a negative int when a<b, zero when a==b, and a positive int when a>b.

Take a look at this line:

func Foo(op comparison) {...}

How do you know that it is the comparison operator, rather than a parameter named op of type comparison? You don't know. Backwards compatibility is not 100% bulletproof. For this reason at least one of the words has to be a reserved keyword. I propose:

func Foo(op default) {...}

or something like this:

op <=> MyComparison
func Foo(op <=>) {...}

or, alternatively, not using "op" but the "map" keyword to map operators.

map + MyAddition
func Foo(map +, map -, map comparison) {...}

alternatively, not using any arg names:

map + MyAddition
map <=> MyComparison
func Foo( +,  -, <=>, actual_parameter int) {...}
go-li commented 6 years ago

To prove it, here is an example of code that uses generics and would benefit from operator overloading: a Fenwick tree (a data structure that allows you to quickly sum a contiguous sub-sequence of a large array). See "Fenwick tree in go-li language", "Fenwick tree in folang", and "Fenwick tree example on folang playground". It needs the addition operator and the negation operator, currently implemented using those callback functions.

The thing you probably disliked is how we pass operators to generic functions. Trust me, this is needed; otherwise people who don't prefer operators will have no way to call into generic functions that use operators. This way they can, by passing a concrete function that matches the callback signature as the parameter, just as they would today.

If we don't do this, the use of operators would spread like a disease: I need package A; package A uses operators, therefore I define operators in my package so A can use them; since I defined operators I start using them (why not? they're already here); suddenly my code is full of operators; other people need my package, therefore they define operators... and so on.

deanveloper commented 6 years ago

I'm 100% okay with binding operators to existing function calls. What I don't like is redeclaring the operators in every package that's going to use them.

I really would like a new "op" or "operator" top-level declaration (TLD). It would make sure source code stays readable. It could still be backward-compatible, as it is a TLD and would only be used as one.

I think the best syntax would be something like

op + CustomAdd

func CustomAdd(*T, *T) *T { ... }

The only issue is that this would allow

op + func (*T, *T) *T { ... }

Sure, you could say "just don't allow inlining here", but I feel like that'd be against Go's simplistic design. Perhaps instead preventing that should just be a go vet check.

c69 commented 5 years ago

TL;DR Dear "go community", please read how it's implemented in C++, then come back to this proposal. Also, you will probably need generics. And this is not trolling. @go-li has actually already brought up many valid points from that ecosystem in this thread.


creker commented 5 years ago

I also don't like the idea of re-declaration. If I import a package that defines operators on its types then I don't have access to those operators? That kinda makes them useless. The whole point of operator overloading is to give them to library consumers. I, as a library implementer, don't care about them and don't really need them.

Also, I don't think we should look at C++. It has too much complexity that's very specific to C++ - value semantics, rvalue/lvalue, move semantics etc etc etc. It looks, frankly, ridiculous how it works right now, and the committee understands that. <=> tries to make it slightly better. There are better examples of how it can be done.

maj-o commented 5 years ago

Operator declaration without overloading is very useful. With this feature decimals can be made usable, complex numbers can be taken out of the core, and even a flexible number type becomes conceivable. For me, the absolutely useless decimal situation today is the reason for voting for operator methods. Operator overloading is no must for me, because I define a new type when I want to overload, but declaring a new type also means I can declare new operator methods. Overloading is useful for languages like C++, but (in my opinion) not for Go.

ghost commented 5 years ago
// Barycentric returns the barycentric coordinates for a point in a triangle.
// The barycentric coordinates can be used for interpolation of {a, b, c} to
// point p (such as normals, texture coordinates, colors, etc)
func Barycentric(a, b, c, p *glm.Vec3) (u, v, w float32) {
    var v0, v1, v2 glm.Vec3
    v0.Sub(b, a)
    v1.Sub(c, a)
    v2.Sub(p, a)
    d00 := v0.Dot(&v0)
    d01 := v0.Dot(&v1)
    d11 := v1.Dot(&v1)
    d20 := v2.Dot(&v0)
    d21 := v2.Dot(&v1)

    denom := 1 / (d00*d11 - d01*d01)
    v = (d11*d20 - d01*d21) * denom
    w = (d00*d21 - d01*d20) * denom
    u = 1 - v - w
    return
}

^ disgusting, error prone, entirely because I can't define + and * on vectors

All geometry/physics libraries are full of unreadable code because this feature is missing.

as commented 5 years ago

@hydroflame Your example contains a lot of code with little context. Can you provide the example of imagined "non-disgusting, error minimizing" code where you are using the notation?

StephanVerbeeck commented 5 years ago

I would like to add some new insights to this topic:

Proposed method for function overloading is via function names that SHOW both the operator and the position of the arguments.

Example 1:
func "_+_"(a,b int) (sum int) {
    sum = a + b 
    return
}

Example 2:
func "_++"(a *int) (result int) {
    result = *a
    *a = result+1
    return
}

Example 3:
func "++_"(a *int) (result int) {
    result = *a + 1
    *a = result
    return
}

Example 4:
func "_[_]=_"(arr map[int]string, idx string, value string){
    ...
}

Example 5:
func "_._"(arr map[string]int, value string) int {
    return arr[value]
}

As you see, this opens up much wider possibilities than old-style operator overloading, but (and I have implemented this in a compiler/bytecode-interpreter language, so I know) it is bothersome, and if I faced it again I would not go for operator overloading at all.

The only reason you would want to do this is that Go is currently lacking some of the more exotic data containers, but frankly, I do not think that Go needs this. It would change the language profoundly, in a manner that makes it less distinct and less systematic (readable/predictable).

It also must be implemented at a low level in the parser and compiler, so this change would affect a large part of the code-base of the language.

deanveloper commented 5 years ago

I reflect on other languages to learn from their mistakes. We should avoid the _[_]=_ operator and similar operators which do not reflect arithmetic types - this proposal is strictly for creating a way to represent arithmetic types in Go. The syntax you provide is intriguing though. I may draft a new proposal in the near future which is less about overloading operators. It's hard to explain in a quick comment, but hopefully it should still be simple.

cosmos72 commented 5 years ago

There is an important but implicit (and probably slightly hidden) trade-off in operator overloading, which is visible if you compare the proposed functions/methods signatures with the methods of math/big:

TL;DR:

it is not possible to pass among the arguments a value or pointer where the result will be stored.

This will cause significant slowdown on large objects as arbitrary precision numbers (math/big.Int, math/big.Float, math/big.Rat), arbitrary size matrices (gonum.org/v1/gonum/mat/Dense) etc.

Full explanation:

Let's consider for example the addition between user-defined types: the syntax is not very relevant, it can be any of the existing proposals:

operator (*big.Int + *big.Int) Add
func Add(a, b *big.Int) *big.Int {
 // must allocate a new big.Int and return its address
}

or

operator(*big.Int) (a + b) {
  // idem, must allocate a new big.Int and return its address
}

or anything else.

In all cases, as highlighted in the comments above, the operator implementation must allocate a (possibly large) object where the result is stored. I used pointers *big.Int instead of values to make the issue more obvious.

On the other hand, math/big methods solve this issue by accepting an additional argument where they store the result - currently the receiver:

package big

// Add sets z to the sum x+y and returns z. 
func (z *Int) Add(x, y *Int) *Int {
  // store the result in z:
  // if its internal buffer is large enough, no allocation is needed.
}

Now, one may argue that the approach taken by math/big is a micro-optimization. I am not really interested in discussing whether it's a "micro" optimization or not, but I think it's a very important optimization:

Imagine, instead of integers or floats, a library for linear algebra on arbitrary size matrices - for example gonum.org/v1/gonum/mat: if operator overloading is part of Go, authors and users of such library will be tempted to support it. At the moment, they have methods similar in spirit to math/big, for example:

package mat

// Add adds a and b element-wise, placing the result in the receiver
func (m *Dense) Add(a, b Matrix) {
  // store the result in m: also here, if its internal buffer
  // is large enough, no allocation is needed.
}

but operator overloading will not have the third argument, forcing the implementation to allocate it:

operator (Matrix + Matrix) Add
func Add(a, b Matrix) *Dense {
  // must allocate a new Dense and return its address
}

If you then write something like a + b + c + d on matrices, each + will allocate its result, which will need to be garbage collected at the end of the expression (except for the last one) - I expect it will be much slower than the (micro?) optimized

var a, b, c, d, z *Dense
// initialize a, b, c, d and z. Then:
z.Add(a, b)
z.Add(z, c)
z.Add(z, d)

So my question is about the following trade-off:

Is the (hopefully) improved readability and simplicity of operator overloading (if used correctly and not abused) worth the slowdown it causes due to multiple allocations and subsequent garbage collection?

This happens if the arguments of overloaded operators are values with internally allocated (and possibly large) buffers: for example arbitrary precision numbers (math/big.Int, math/big.Float, math/big.Rat), arbitrary size matrices (gonum.org/v1/gonum/mat/Dense) etc.

P.S. one may attempt to solve this issue by adding an extra parameter also to overloaded operators, as for example:

operator (*big.Int + *big.Int) Add
func (z *big.Int) Add(a, b *big.Int) *big.Int {
  // fill z with the result of a + b and return z
}

but it does not work: it only moves the problem from method declaration to method use, because when invoking such an operator+ there is, in many cases, no receiver:

var a, b, c, z *big.Int
// initialize a, b, c, d and z. Then:

z = a + b // a clever compiler may use z as receiver of operator+

z = a + b + c // z can be the receiver of the second addition.
                    // but who is the receiver of the first addition, i.e. a + b ?
ianlancetaylor commented 5 years ago

@cosmos72 You make an excellent point, but I note that the compiler can convert z = a + b + c into z = a + b; z = z + c. Any expression, no matter how complex, is just a sequence of unary and binary operations. So the question is whether and when the compiler can reliably turn

t1 = a + b
z = t1 + c

into

z = a + b
z = z + c

That is, when is it safe to replace a compiler-introduced temporary variable with the final result of the expression, and, similarly, when it is safe to combine compiler-introduced temporaries? It's a problem not dissimilar to register allocation with an unbounded number of registers, although it's possible that there are cases where there is some additional aliasing that must be considered.

Anyhow, I think the main point is, if we have operator functions, when writing the implementation there needs to be a way to express not just the operands but also the destination aka receiver.

cosmos72 commented 5 years ago

Thanks @ianlancetaylor

Sorry for the second long post in a row - the analysis of "compiler-introduced temporary variables" and "aliasing" contains some subtle points and, if operator overloading is added to the language, they are going to affect Go programmers and compilers writers for a long time.

It is thus important to fully understand the consequences in detail, as they are non-trivial - I hope you agree it's the only way to make an informed decision.

Let's start from a concrete example of operator overloading:

package main
import (
  "math/big"
)
var _2 = big.NewRat(2, 1)

// compute z = ax^2 + 2bxy + cy^2 + 2dx + 2fy + g
func QuadraticCurve(z, x, y, a, b, c, d, f, g *big.Rat) *big.Rat {
  z = a*x*x + _2*b*x*y + c*y*y + _2*d*x + _2*f*y + g
  return z
}

The implementation surely looks enticingly similar to the mathematical formula.

Without operator overloading, the equivalent code is ugly, difficult to read and requires some thought to write:

// compute z = ax^2 + 2bxy + cy^2 + 2dx + 2fy + g
func QuadraticCurve(z, x, y, a, b, c, d, f, g *big.Rat) *big.Rat {
  var t big.Rat

  z.Mul(a, x).Mul(z, x)               // z = a*x*x

  t.Mul(_2, b).Mul(&t, x).Mul(&t, y)  // t = 2*b*x*y
  z.Add(z, &t)                        // we can now reuse t

  t.Mul(c, y).Mul(&t, y)              // t = c*y*y
  z.Add(z, &t)                        // we can now reuse t

  t.Mul(_2, d).Mul(&t, x)             // t = 2*d*x
  z.Add(z, &t)                        // we can now reuse t

  t.Mul(_2, f).Mul(&t, y)             // t = 2*f*y
  z.Add(z, &t)                        // we can now reuse t

  return z.Add(z, g)
}

There is no need to highlight the advantages of the first version and the disadvantages of the second. I will just say that the first seems a high-level language, while the second has an assembler-like look, where variables take the place of CPU registers and are managed explicitly one by one - including temporary variables.

The disadvantages of the first version can be summarized in a question: can the compiler be sufficiently smart that both compiled versions run equally fast? (without artificially penalizing the second, obviously)

Let's see what a sufficiently smart compiler should do.

  1. compute how many temporary variables are needed.

This is somewhat similar to register allocation, as you pointed out. It can be visualized as a tree coloring problem, where the tree is the Abstract Syntax Tree (AST):

      z
      =
      +
     / \________
    *           +
   / \         / \________
  a   *       *           +
     / \     / \         / \________
    x   x   2   *       *           +
               / \     / \         / \________
              b   *   c   *       *           +
                 / \     / \     / \         / \
                x   y   y   y   2   *       *   g
                                   / \     / \
                                  d   x   2   *
                                             / \
                                            f   y

The tree coloring problem can be stated as follows:

Also, the compiler cannot take advantage of any mathematical property of the operators, such as neutral element e op x == x, commutativity a * b == b * a, associativity (a * b) * c == a * (b * c) and distributivity (a + b) * c == a * c + b * c because the user-defined operators may not have such properties. For example, matrix or quaternion multiplication is not commutative and octonion multiplication is neither commutative nor associative.

In this case, the AST is quite simple and needs only two colors, i.e. two temporary variables:

  2. analyze aliasing

What's aliasing actually? One can attempt to define it as follows: if two values are aliased, they may reference the same mutable data. Thus modifying the content of one value may modify the other.

In our example, the "sufficiently smart" compiler should realize that z is only used to store the final result, and thus, if it demonstrates that z is not aliased to any other variable, it can directly use z as one of the temporary variables. This is a tricky point, both for programmers and for the compiler:

if z is aliased to some other argument passed to QuadraticCurve(), the second version of the function will not behave as expected, i.e. it will produce a different result.

Maybe it's a bug, maybe it's intentional (and hopefully documented), but in any case there is no way to inform the compiler of such peculiarity. In other words, there is currently no Go syntax to tell the compiler which variables may alias each other and which variables are guaranteed not to alias each other.

As a side note, C (not C++) has the restrict keyword to annotate a pointer as "it will not alias anything else", but a single mistake using it will introduce subtle and hard-to-find bugs. Not something I would recommend for Go.

In Go, analyzing aliasing at compile time can probably be analogous to escape analysis: when a function is compiled, its callers may not be known yet, so the compiler cannot assume anything about the aliasing among its arguments, but it can analyze the aliasing created or destroyed by the function body. When local variables are created, they are not aliased to anything: aliasing can only be introduced by storing shared data inside them, or by storing the same slice, map or address into multiple places. (Note: what about closures?)

In short, aliases analysis will probably be able to produce a proof "these two variables are not aliases" only for local variables, and only if the compiler performs an aliasing analysis on each function while compiling it, similarly to how it currently performs escape analysis.

  3. actually creating "compiler-introduced temporary variables"

There is currently no uniform syntax for this.

Many types guarantee in their documentation (which a compiler cannot read) that the zero value is valid and useful, but other types give no such guarantee.

Also, in many cases operator overloading will accept pointers to named types, as the *big.Int used in the example above, and the "compiler-introduced temporary variables" of type *big.Int must be a non-nil pointer to big.Int.

In general, if the compiler needs to introduce a temporary variable of type *T, should it call new(T)? What happens if the zero value of T is not the right thing to use?

The solution I am currently experimenting with (for generics, which have a similar need for "user-introduced temporary variables") is the following:
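Roughly, a minimal sketch of that convention (the type and names here are purely illustrative): the type provides a New method that returns a ready-to-use value, and the compiler calls it whenever it needs a temporary.

type Rat struct{ num, den int64 }

// New returns a valid, ready-to-use value (here 0/1), so a
// compiler-introduced temporary never starts in a meaningless state.
func (Rat) New() Rat { return Rat{num: 0, den: 1} }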

This is a way to introduce C++-style constructors in Go - New() becomes equivalent to a zero-argument constructor, but I could not find a better way.

And it certainly has the disadvantage that Go code using operator overloading becomes harder to reason about: the compiler will invisibly insert calls to T.New()

maj-o commented 5 years ago

@cosmos72 much work, chapeau.

But the main point is just to have operator methods, as introduced by Mr. Griesemer in 2016. This is absolutely needed for a usable decimal implementation or other useful number formats. And for minimizing the core of Go. For example, complex numbers could be implemented in a separate package.

Overloading is not needed or wanted.

That this works can be seen here https://youtu.be/vLxX3yZmw5Q and here https://github.com/griesemer/dotGo2016?files=1. Maybe I missed the point. For me it seems quite simple. Here is the proposal and the solution. The motivation is business calculations: invoices, taxes, ... This is not possible without a good decimal implementation with the usual operators.

cosmos72 commented 5 years ago

@maj-o I was not discussing the syntax to define operator on new types - it can be operator overloading, operator methods, annotations, or any other proposal.

My work above analyzes the amount of garbage (temporary values) produced by operators on user-defined types, the techniques to minimize the garbage, and the limits of such techniques.

The reason is simple: less garbage means fewer allocations and fewer garbage collections, which translates to faster execution (provided the algorithm is not changed in other ways)

The only user-visible part of my analysis is that functions/methods that implement operators should accept an additional parameter, where they store the result before returning it. The Add, Sub, Mul, Div ... methods of types inside math/big do exactly that. Note: this has no visible effect on how operators are used.
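As a sketch of how that could look, here is the real math/big signature plus a hypothetical lowering of overloaded expressions onto it (t is a compiler-introduced temporary):

// The math/big style: the receiver is the destination.
//   func (z *big.Int) Add(x, y *big.Int) *big.Int

// Hypothetical lowering:
//   z = x + y       =>   z.Add(x, y)
//   z = x + y + w   =>   t.Add(x, y); z.Add(t, w)   // or reuse z when provably safe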

frogeyedpeas commented 5 years ago

i would be a healthier person if "math/big" had operator overloading plz do soon :)

maj-o commented 5 years ago

@deanveloper: oh yes, this (#35307) takes the wind out of cosmos72's long posts. If SIMD, MISD ... would work, then the compiler could perfectly optimize

a+b+c

of any type. That would be much better than internal variables, or how old people would solve this problem: stack pushes and pops.

cosmos72 commented 5 years ago

My (very) long posts were meant to analyze the performance implications of operator functions on large and variable-size objects, as for example math/big.Int.

I perfectly agree that operator functions on fixed-size SIMD data (AVX/NEON etc.) would be very useful and do not need all that machinery - especially because they can (and should) be stored in CPU registers as much as possible

vwxyzjn commented 4 years ago

@bronze1man "Where is the code defined" is also very important to me. How about a solution of prefixing the operators with _? So something like

a _+ b _* c. If the user ctrl + clicks on _+, the IDE could take the user to the code that overloads the operator. Such a prefix would also inform users of the potential overhead of using operator overloading.

maj-o commented 4 years ago

@vwxyzjn Maybe I missed something - but it seems more difficult. Overload - how can you overload an interface? Go is "method-oriented": a Reader is any instance of a type (mostly a struct) with an implementation of the Reader interface. This is great. If you "derive" this "class" you have to implement its interface - this is the Go way - and this is good.

Here is where the proposal comes in - we cannot do this for built-in types, because they have an interface including operators (infix methods), which cannot be declared in Go - but being able to do this would be great.

In far future, I'd like to write

var i int = 1
var d double = 1.1
var res1 int  = i + d
var res2 double = i + d
var res3 int = d + i
var res4 double = d + i

// or - this would be a dream >>

var resDream decimal = d * d

//  'cause this would mean, we can work with any numbers in any type - getting the right result (either fast or correct)

with res1 and res3 == 2 and res2 and res4 == 2.1 and resDream 1.21 (decimal)

For correctly doing this both int and double must implement the interface for + (for int and double - maybe some generics could make this simpler ;-) )

Do not get me wrong - but infix methods like _+ seem to make things more difficult, and once again overloading is not the target.

soypat commented 4 years ago

I'm kind of a newbie to Go, but it occurred to me: if we already have the interface block, would it not make sense to define operators in there, since interfaces define behaviour?

deanveloper commented 4 years ago

I'm not sure how you would go about doing it with an interface, since interfaces only describe already existing structures.

ncruces commented 4 years ago

I think the definition of unary operators in the OP is a can of worms. For instance, taken literally it seems to open up +"abc" to mean the same as "" + "abc".

Ostensibly this is because string concatenation is fundamentally different from mathematical addition (and it could be argued they shouldn't share an operator), but such is life.

IMO, while I appreciate the attempts at simplification - trying to ensure operators are only applied to "math", and making a minimal set of sane assumptions about their usage so we don't end up with C++ - it should be noted that even many typical "math" notations will go against some of the more restrictive rules, because some algebras have different ideas about "zero", "unit", etc.

For example, unifying comparisons is a nice effort, but it should not be lost that many types might have a defined equality, but not a defined total order.

For another, the SO(3) space of 3D rotations, typically represented by unit quaternions, usually uses binary * for composition, and unary - for the conjugate/inverse (which is as "wrong" as string concatenation, and similarly violates the OP rules by defining unary - but not binary -).

Trying to enforce what essentially boils down to good taste is going to be hard.

Having said that: fully agree with the restriction that operands must have the same type (in line with Go), and attempts at ensuring value semantics (even if at the cost of some performance).

Utilski commented 4 years ago

I'm fairly new to Go, and from what I saw I like this idea. I was reading through here and thought about maybe changing this:

operator (Int + Int) Add
func Add(a, b Int) Int { ... }

to this:

operator (Int + Int) Add
func (a Int) Add(b Int) Int { ... }

using the first operand in the func definition as a receiver. I don't have much to argue for it other than that it looks better to me personally.

deanveloper commented 4 years ago

That would actually be possible to write with the initial syntax using a neat trick!

operator (Int + Int) Int.Add
func (a Int) Add(b Int) Int { ... }

The Int.Add syntax takes what we would traditionally call func (a Int) Add(b Int) and instead evaluates it as func Add(a Int, b Int). See method expressions
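For reference, this is how method expressions already behave in today's Go (a small self-contained example, separate from the proposed operator syntax):

package main

import "fmt"

type Int struct{ Val int }

func (a Int) Add(b Int) Int { return Int{a.Val + b.Val} }

func main() {
    // The method expression Int.Add has type func(Int, Int) Int:
    // the receiver becomes an ordinary first parameter.
    add := Int.Add
    fmt.Println(add(Int{1}, Int{2})) // {3}
}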

k06a commented 3 years ago

Any update on this?

ianlancetaylor commented 3 years ago

@k06a No updates.

The next big language change is going to be generics. If the generics proposal is adopted and implemented, I doubt we will make another big language change for some time. So honestly I would not anticipate a change along these lines happening for some years (if ever).

eacp commented 3 years ago

@urandom

I don’t think this would be good for the language because it would be abused. However, just like we have the + operator for strings, the go team could also add operators for math/big and still restrict it from the language. Maps used some sort of generic syntax without adding generics to the language for about 10 years. I am sure the go team at google can add special operators to math/big or the set package without messing up the whole language. I think @rsc and @robpike could have more insight that I have

eacp commented 3 years ago

@c69 matrices are a way bigger problem than conceptual math types such as currency, strings or sets because not all matrices can be summed or multiplied.

Let m and n be different matrices. For m + n to be defined, m and n must have the same dimensions, and for m * n to be possible, the inner dimensions of m and n must match. How would that be enforced in the language?

However, sets are a very good use case, but just like maps, an exception could be made provided a set package (with generics) is added to the library.

Zaya6 commented 3 years ago

Why not just declare it in the function's export comment? I'm pretty new to golang too but one thing I admire is how it makes comments an important part of the coding process. Something as simple as

// Add defines an addition operator for datatype 
//go:operator (Int + Int) Add
func (a Int) Add(b Int) Int { ... }

It seems like most of the problems people have with this proposal are the effects this has on gc or abuse. Both can be mitigated by just making sure the user is informed by forcing documentation.

I really hope this makes it into the language, since it just makes sense for those of us who do mathematical operations on matrices or vectors daily. It's about readability for me. Types with mathematical operations should follow mathematical syntax whenever possible, so hand-written arithmetic translates better. This:

m1 := Matrix4x4{...}
m2 := Matrix4x4{...}
m3 := Matrix4x4{...}
m4 := m1.Add(m2.Mult(m3))

is just not as clear or readable as this:

m1 := Matrix4x4{...}
m2 := Matrix4x4{...}
m3 := Matrix4x4{...}
m4 := m1 + m2 * m3
Efruit commented 3 years ago

I think @vwxyzjn's idea is solid, but would be more appropriate (and ergonomically friendly for those of us using QWERTY keyboards) if the prefix were . instead, partly for the reason that it helps clarify that it's a (slightly disguised) method call, and doesn't conflict with existing syntax when spaces are omitted.

Proposal

Operators may be optionally prefixed with .. For binary operators, this permits calling an identical method from the lhs (left-hand side), with the rhs (right-hand side) passed as an argument. Unary operators (obviously) call the method from the rhs. For all intents and purposes, these are normal method calls and definitions, and have no special requirements imposed upon them. They may accept any type a method accepts, and return any type a method may return.

Example:

package main
import "fmt"

type MyInt int
func (lhs MyInt) +(rhs MyInt) MyInt {
    return lhs+rhs
}

func main() {
    fmt.Println(MyInt(1) .+ MyInt(2)) // Prints 3
}

The key here is that the + operator is not overloaded by the normal definition, but supplemented with syntax that is reminiscent of a method call, because it is a method call.

Translated to current Go, the above example would be:

package main
import "fmt"

type MyInt int
func (lhs MyInt) Plus(rhs MyInt) MyInt {
        return lhs+rhs
}

func main() {
        fmt.Println(MyInt(1).Plus(MyInt(2)))
}

Overloaded operators would follow the exact same syntax rules as their normal counterparts, and would simply expand to lhs.Operator(rhs) (or rhs.Operator() for unary operators). I would like to think this could work well with the pending generics proposals, and involves relatively minimal syntax changes.

The language definition would require the following changes:

binary_op = ["."] ("||" | "&&" | rel_op | add_op | mul_op) .
unary_op  = ["."] ("+" | "-" | "!" | "^" | "*" | "&" | "<-") .

MethodName = identifier | binary_op | unary_op .

Commentary

I disagree with the original proposal, in that we should not assume what users need with regards to operators available for overloading; outside of the Go-sphere, symbols have any number of different uses and interpretations. An example I like to use is a Lua library called LPeg, which uses operator overloading to create something entirely new by reusing existing syntax.

Problems

cosmos72 commented 3 years ago

Nice! really an interesting idea :)

I think there's another, not so obvious, drawback: there is no way to specify where the result should be stored.

For example, math/big.Int defines addition as follows:

// Add sets z to the sum x+y and returns z.
func (z *Int) Add(x, y *Int) *Int

and is used as: result.Add(x, y). This gives control over allocation to the caller which, if he/she wants, can avoid allocating a new big.Int for every arithmetic operation.

In the proposal above, one has to write instead x .+ y, which is conceptually equivalent to x.Add(y). If the representations of x and y are not fixed-size, it forces the implementation of Add to allocate memory to store the result before returning it.

Depending on the intended use, this may be either relatively harmless or a major performance hit.

deanveloper commented 3 years ago

I think there's another, not so obvious, drawback: there is no way to specify where the result should be stored.

This is intentional, in order to enforce that mathematical operators do not mutate their "arguments". I had initially used math/big as an example in the proposal, but ended up removing it, because all of math/big's methods mutate their receivers.

The main reason behind this is the same reason that types with a pointer as their underlying type are discouraged (and cannot have methods!). It gets very confusing when a call doesn't look like it is going to mutate anything, and then something gets mutated without the user realizing it.

cosmos72 commented 3 years ago

So, math operators for immutable objects only. Surely easier to implement.

My guess is it would not be a good fit for high-performance vector/matrix operations, or arbitrary precision numbers (math/big.Int, math/big.Rat, math/big.Float etc), or for anything that may require significant memory allocations.

On the other hand, at least in Python and Lisp, arithmetic on arbitrary precision numbers (Python long and Lisp bignum, ratio) is implemented exactly as you propose: arbitrary precision numbers are immutable and operations on them always allocate new memory to store the result.