JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Deprecate broadcast behaviour of `+` in favour of adding identity matrices #22880

Closed dalum closed 3 years ago

dalum commented 7 years ago

I understand this has been discussed and experimented with before, and people found it really annoying when `+` didn't automatically broadcast. But @StefanKarpinski suggested I open an issue for this to describe the rationale for bringing it up again.

Currently, you can write a generic polynomial function that "almost just works" by using I in the following pattern:

f(x, y) = x*y*I + x*I + y*I

The "almost" here refers to the fact that this will return a UniformScaling if all of the inputs are scalars. With the deprecation of +(::UniformScaling, ::Number), this pattern is broken. After the deprecation, you must now explicitly use one in defining the polynomial to ensure that x and y have compatible dimensions:

f(x, y) = x * one(y) + y * one(x)

But this becomes really messy as soon as you have more than two variables in an expression:

f(x, y, z) = x * one(y) * one(z) + y * one(x) * one(z) + z * one(x) * one(y)

which explodes as you go to higher order. The solution for this is to drop the broadcasting behaviour and define:

+(x::AbstractArray, y::Number) = x .+ y .* one(x)
+(x::Number, y::AbstractArray) = x .* one(y) .+ y

or possibly even:

+(x::AbstractArray{<:T}, y::T) where T = x .+ y .* one(x)
+(x::T, y::AbstractArray{<:T}) where T = x .* one(y) .+ y
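A runnable sketch of the proposed behaviour, using a hypothetical helper name (`plus`) instead of actually extending `Base.:+`:

```julia
using LinearAlgebra

# Hypothetical stand-in for the proposed methods (avoids type piracy on +).
# For a square matrix, one(x) is the identity matrix, so the scalar
# lands on the diagonal rather than on every element.
plus(x::AbstractArray, y::Number) = x .+ y .* one(x)
plus(x::Number, y::AbstractArray) = x .* one(y) .+ y

A = [1.0 2.0; 3.0 4.0]
plus(A, 10.0)   # == [11.0 2.0; 3.0 14.0]
```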

I think the argument that automatic broadcasting happens in other languages is a really good argument for keeping it as it is. But I feel that with dot-notation, we can have the best of both worlds now, since you can define a polynomial like:

f(x, y) = x * y + x + y + 1

where you get the standard or broadcast behaviour by calling either f(A, B) or f.(A, B).
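For instance, with today's dot-call syntax the elementwise version needs no special-casing (a small illustrative check):

```julia
f(x, y) = x * y + x + y + 1

# Scalar call: ordinary arithmetic.
f(2, 3)   # == 12

# Dot call: applies f elementwise over the arrays, fusing the operations.
A = [1 2; 3 4]; B = [5 6; 7 8]
f.(A, B) == A .* B .+ A .+ B .+ 1   # true
```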

And if you are interested in a personal anecdote: two months ago I spent a whole day (possibly two) debugging my code in frustration because I was inverting a matrix of the form (H - E)^-1 and couldn't figure out why I got unphysical results. Spoiler: E was a scalar, and I was expecting it to subtract from the diagonal.
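A minimal reconstruction of that trap, with illustrative values for H and E (the real code's details are not shown in the issue):

```julia
using LinearAlgebra

H = [2.0 1.0; 1.0 3.0]   # stand-in Hamiltonian
E = 1.0                  # scalar energy

# Intended: shift the diagonal, i.e. invert H - E*I.
intended = inv(H - E*I)   # inv([1.0 1.0; 1.0 2.0])

# What broadcasting subtraction silently computed: subtract E everywhere.
silent = inv(H .- E)      # inv([1.0 0.0; 0.0 2.0])

intended == silent        # false — hence the unphysical results
```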

andyferris commented 7 years ago

So I'm not clear what problem we are solving with + as it currently stands, though I feel it does cause problems. What I see is that if I have:

f(x,y) = x*y + x + y + 1

with the current behavior I would call f(x,y) for scalars (though f.(x,y) happens to work), f.(x,y) for vectors would do an elementwise version, and for square matrices f(x,y) doesn't do what a polynomial equation should do (though f.(x,y) gives the element-wise version).

Basically I feel that letting array + scalar do automatic broadcasting can be a bit dangerous, because it's misleading in the square matrix case (such as the OP, where E was scalar). It also happens to be the last auto-vectorized function that I'm aware of, which makes it a rather special case. Please note that the only reason for defining +(x::AbstractArray, y::AbstractArray) is that it makes sense for linear algebra (x and y live in a vector space); otherwise we would have to require x .+ y for element-wise addition. Similarly, * for arrays and array-scalar combinations is motivated by linear algebra. So I'm really not sure why matrix + scalar would do anything inconsistent with linear algebra... and thus I'm in favor of this change (or else making scalar + array an error with a nice message to use .+).

andreasnoack commented 7 years ago

Half of this issue is already fixed with #22932.

With https://github.com/JuliaLang/julia/pull/23923, it would again be possible to use I for the second part of the proposal instead of using 1. I feel that it might be a reasonable compromise to require I here since it will be generic (work for Numbers as well) while some users might get tripped by A + 1 being the same as A + I instead of throwing.

andyferris commented 7 years ago

I feel we should aim to support the distributive law matrix * vector + scalar * vector == (matrix + scalar) * vector in v1.0 because AFAICT a lot of the confusion around the previous behaviour relates to this.

StefanKarpinski commented 7 years ago

While I agree with that in principle, I fear that the behavior of other systems (Python, R, Matlab) here will make this quite dangerous. People will write A + 1 expecting it to do what A .+ 1 does and it will silently do something very different – i.e. add one to the diagonal of A. The safe option here is to make matrix + scalar an error. The distributive law should hold for uniform scaling objects, however: matrix * vector + scalar * I * vector == (matrix + scalar * I) * vector.
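The uniform-scaling form of the law can be checked directly (small illustrative values):

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
v = [1.0, -1.0]
s = 2.0

# matrix*vector + scalar*I*vector == (matrix + scalar*I)*vector
A*v + s*I*v ≈ (A + s*I)*v   # true: the distributive law holds with I
```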

dalum commented 7 years ago

I'm sorry if I'm being too greedy, but I'm not really sure this is solved by the reintroduction of I + scalar. Generic polynomials of multiple variables now require all scalars to be multiplied by I before being input. I.e., f(x, y) = x + y will not work for f(matrix, scalar). It could be made to work if written as f(x, y) = x*I + y*I, but then f(scalar, scalar) returns a UniformScaling.

This gets even trickier when you have a function such as f(x, y) = exp(x) + y. Right now, exp(::UniformScaling) isn't defined. This is arguably a bug, but then every valid matrix function defined for scalars must also be defined for UniformScaling.
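A sketch of how such a method could look, written with a local name (`_exp`) to avoid extending Base from the example:

```julia
using LinearAlgebra

# Treat a UniformScaling like the scalar it wraps: exp(λ*I) behaves as
# exp(λ)*I, consistent with exp of the corresponding diagonal matrix.
_exp(x) = exp(x)
_exp(J::UniformScaling) = UniformScaling(exp(J.λ))

_exp(0.0I)   # == UniformScaling(1.0), i.e. behaves like exp(0.0)*I
```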

Note also that distributivity does not hold for UniformScaling: (matrix + scalar) * I != matrix * I + scalar * I.

Finally, if matrix + scalar is supported, then perhaps UniformScaling could be deprecated. 😼

mschauer commented 7 years ago

Edit: Rereading the issue, I now see with @andreasnoack that UniformScalings have the right properties, except x*I + y*I = x*y^0 + y*x^0.

StefanKarpinski commented 7 years ago

@eveydee, the direction you're moving this in is very much towards scalars simply being what UniformScaling was introduced as. We should stop and consider whether that's where we want to go. I'm not saying it's not, but we should certainly consider it. And if we do that, it should be one of the first warnings we give to people coming from other systems: in Julia, matrix + scalar adds scalar to the diagonal of matrix and only works for square matrices.

dalum commented 7 years ago

@StefanKarpinski You are right, and I am not sure it is the right way to go. The problem for me is: how do you write code that works for generic linear algebra objects? That is, how do you translate equations derived on a whiteboard into code implemented in Julia? As I see it, there are two possible routes to being able to write fully generic code with simple-to-follow rules, neither of which is currently satisfied: (1) require every scalar in generic linear-algebra code to be written with I, or (2) define matrix + scalar to mean matrix + scalar*I.

The first one is annoying for people like me, who primarily use Julia to do linear algebra (arguably a minority in the Julia community), but the inconvenience of suffixing every scalar literal constant with I is probably something that goes away once it becomes a habit, and it is infinitely preferable to manually figuring out the sizes of the identity matrices, as is the case in most other languages. The bigger problem is ensuring that it is consistent across objects. For instance, all functions defined for AbstractMatrix that you might use in an equation must also be defined for UniformScaling (exp, log, sqrt, sin, etc.). Also, functions which are rooted in linear algebra, like norm, trace, etc., should probably be changed to return a UniformScaling object. In my opinion, the complexity here gets out of hand quite quickly.

The second one has the potential to be a source of confusion and bugs for people with Matlab or Python muscle-memory who use arrays for non-linear algebra applications. However, all it requires is defining scalar + matrix so the complexity is low, and UniformScaling becomes obsolete.
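The second route is indeed only a pair of one-liners; sketched here as a hypothetical helper (`addscalar`) rather than actual methods on +:

```julia
using LinearAlgebra

# Hypothetical semantics for matrix + scalar under this route:
# the scalar acts as scalar*I, so only square matrices make sense.
addscalar(A::AbstractMatrix, s::Number) = A + s*I
addscalar(s::Number, A::AbstractMatrix) = s*I + A

addscalar([1 0; 0 1], 2)   # == [3 0; 0 3]
```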

oscardssmith commented 7 years ago

What about having `@I` or something similar that could be used like `@.` to add lots of multiplications by identity matrices to make stuff work, while preventing people from hitting it by accident?

StefanKarpinski commented 7 years ago

However, all it requires is defining scalar + matrix so the complexity is low, and UniformScaling becomes obsolete.

This is not entirely true since one can also do neat things like this with I currently:

julia> A = rand(3, 2)
3×2 Array{Float64,2}:
 0.777236  0.80784
 0.46201   0.548122
 0.447051  0.407108

julia> [A I; I A]
6×5 Array{Float64,2}:
 0.777236  0.80784   1.0  0.0       0.0
 0.46201   0.548122  0.0  1.0       0.0
 0.447051  0.407108  0.0  0.0       1.0
 1.0       0.0       0.0  0.777236  0.80784
 0.0       1.0       0.0  0.46201   0.548122
 0.0       0.0       1.0  0.447051  0.407108

julia> [A I; I A']
5×5 Array{Float64,2}:
 0.777236  0.80784   1.0       0.0       0.0
 0.46201   0.548122  0.0       1.0       0.0
 0.447051  0.407108  0.0       0.0       1.0
 1.0       0.0       0.777236  0.46201   0.447051
 0.0       1.0       0.80784   0.548122  0.407108

julia> [A 0I; 0I A]
6×5 Array{Float64,2}:
 0.777236  0.80784   0.0  0.0       0.0
 0.46201   0.548122  0.0  0.0       0.0
 0.447051  0.407108  0.0  0.0       0.0
 0.0       0.0       0.0  0.777236  0.80784
 0.0       0.0       0.0  0.46201   0.548122
 0.0       0.0       0.0  0.447051  0.407108

If we deprecated UniformScaling then either this shorthand would go away or we would write it with scalars, like [A 1; 1 A] and [A 0; 0 A]. That's pretty handy but may be a step too far, especially since [A 1; 1 A] looks like it should, if anything, fill with 1s not identity blocks.

KristofferC commented 7 years ago

I'm having trouble distilling what exactly this issue is about.

Is it to define +(x::AbstractArray, y::Number) = x .+ y .* one(x)? It seems that this will not happen. So in that case, can this be closed?

andyferris commented 7 years ago

My take is that it is to allow things like Taylor expansions, e.g. exp_taylor3(x) = 1 + x + x*x/2 + x*x*x/6, to work on square matrices as well as numbers. One can currently use exp_taylor3(x) = one(x) + x + x*x/2 + x*x*x/6.
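A quick check that the one(x) form really does cover both cases (values chosen for illustration):

```julia
using LinearAlgebra

# Third-order Taylor expansion of exp, generic over anything with one(x):
exp_taylor3(x) = one(x) + x + x*x/2 + x*x*x/6

exp_taylor3(0.1)              # ≈ exp(0.1) for small arguments
A = [0.0 0.1; -0.1 0.0]
exp_taylor3(A)                # ≈ exp(A), the matrix exponential
```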

We've already deprecated array + scalar for v0.7.

There's also the issue of the distributive law, which should read (matrix + scalar)*vector = matrix*vector + scalar*vector - which implies matrix + scalar == matrix + scalar*I. Personally, I strongly feel that for our built-in math types we should strive to support the standard and correct associative and distributive laws, in order to allow for expressive and generic code.

StefanKarpinski commented 6 years ago

It seems that the breaking part of this change has already happened. Whether to allow square matrix + scalar to mean what square matrix + scalar*I currently means in the future is not a breaking change given that we've already disallowed that operation.

andyferris commented 6 years ago

Hmm? I plan to make a v1.0 PR as soon as a v1.0 branch opens and the v0.7 deprecations are removed. (Probably not a change that would block releasing v1.0, so perhaps this makes sense).

StefanKarpinski commented 6 years ago

Honestly, I would let things settle for a while before introducing square matrix + scalar doing the same thing as square matrix + scalar*I. Sure, we might end up with largely vestigial I but there's no need to rush this as long as we've got the necessary deprecations/deletions into 0.7/1.0.

andyferris commented 6 years ago

there's no need to rush this as long as we've got the necessary deprecations/deletions into 0.7

I suppose the same argument could be said to apply to v1.1... But at what point do you draw the line?

vtjnash commented 3 years ago

As @andreasnoack mentioned, with https://github.com/JuliaLang/julia/pull/23923 merged, this seems to work pretty well now. We might add https://github.com/JuliaLang/julia/pull/29951, but there seemed to be a consensus against taking that step.