Closed dalum closed 3 years ago
So I'm not clear what problem we are solving with `+` as it currently stands, though I feel it does cause problems. What I see is that if I have `f(x,y) = x*y + x + y + 1`, then with the current behavior I would call `f(x,y)` for scalars (though `f.(x,y)` happens to work), `f.(x,y)` for vectors would do an elementwise version, and for square matrices `f(x,y)` doesn't do what a polynomial equation should do (though `f.(x,y)` gives the element-wise version).
Basically I feel that allowing `array + scalar` to do automatic broadcasting can be a bit dangerous, because it's misleading in the square matrix case (such as the OP where `E` was scalar). It also happens to be the last auto-vectorized function that I'm aware of, which makes it a rather special case. Please note that the only reason for defining `+(x::AbstractArray, y::AbstractArray)` is because it makes sense for linear algebra (`x` and `y` live in a vector space); otherwise we would have to require `x .+ y` for element-wise addition. Similarly, `*` for arrays and array-scalar combinations is motivated by linear algebra. So I'm really not sure why `matrix + scalar` would do anything inconsistent with linear algebra... and thus I'm in favor of this change (or else making `scalar + array` an error with a nice message to use `.+`).
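A small sketch of the distinction (using dot-call syntax and `I`, which both work today):

```julia
using LinearAlgebra

f(x, y) = x*y + x + y + 1    # scalar polynomial; the trailing `+ 1` is the problem for matrices
g(x, y) = x*y + x + y + I    # linear-algebra version: `I` adds one to the diagonal

A = [1 2; 3 4]
B = [5 6; 7 8]

f.(A, B)   # elementwise: [12 21; 32 45]
g(A, B)    # matrix polynomial A*B + A + B + I: [26 30; 53 63]
```

The two results differ in every entry, which is exactly why silently picking one meaning for `f(A, B)` is dangerous.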
Half of this issue is already fixed with #22932.
With https://github.com/JuliaLang/julia/pull/23923, it would again be possible to use `I` for the second part of the proposal instead of using `1`. I feel that it might be a reasonable compromise to require `I` here, since it will be generic (work for `Number`s as well), while some users might get tripped up by `A + 1` being the same as `A + I` instead of throwing.
I feel we should aim to support the distributive law `matrix * vector + scalar * vector == (matrix + scalar) * vector` in v1.0, because AFAICT a lot of the confusion around the previous behaviour relates to this.
While I agree with that in principle, I fear that the behavior of other systems (Python, R, Matlab) here will make this quite dangerous. People will write `A + 1` expecting it to do what `A .+ 1` does, and it will silently do something very different, i.e. add one to the diagonal of `A`. The safe option here is to make `matrix + scalar` an error. The distributive law should hold for uniform scaling objects, however: `matrix * vector + scalar * I * vector == (matrix + scalar * I) * vector`.
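That uniform-scaling form of the distributive law can be checked directly (a small sketch with concrete values):

```julia
using LinearAlgebra

A = [2.0 1.0; 0.0 3.0]
v = [1.0, -1.0]
c = 5.0

lhs = A*v + c*I*v       # c*I*v is just c .* v
rhs = (A + c*I)*v       # A + c*I adds c to the diagonal of A
lhs == rhs              # true: both are [6.0, -8.0]
```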
I'm sorry if I'm being too greedy, but I'm not really sure that it is solved by the reintroduction of `I + scalar`. Generic polynomials of multiple variables now require all scalars to be multiplied by `I` before being input. I.e., `f(x, y) = x + y` will not work for `f(matrix, scalar)`. It could be made to work if written as `f(x, y) = x*I + y*I`, but then `f(scalar, scalar)` returns a `UniformScaling`.
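A sketch of that wart:

```julia
using LinearAlgebra

f(x, y) = x*I + y*I

A = [1 2; 3 4]
f(A, 3)      # works for the mixed case: A + 3I == [4 2; 3 7]
f(2, 3)      # but for two scalars: UniformScaling(5), not the number 5
```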
This gets even trickier when you have a function such as `f(x, y) = exp(x) + y`. Right now, `exp(::UniformScaling)` isn't defined. This is arguably a bug, but then every valid matrix function defined for scalars must also be defined for `UniformScaling`.
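If one went down that road, at least the definitions would be mechanical, since every such function acts on `λI` by acting on `λ`. A hypothetical sketch (`my_exp` is an illustrative name, not a Base method):

```julia
using LinearAlgebra

# exp(λ*I) == exp(λ)*I, since (λI)^k == λ^k * I in the power series
my_exp(J::UniformScaling) = UniformScaling(exp(J.λ))

my_exp(0.0*I)    # UniformScaling(1.0)
```

The objection in the thread is not that any one such method is hard, but that *all* of them would be needed for consistency.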
Note also that distributivity does not hold for `UniformScaling`: `(matrix + scalar) * I != matrix * I + scalar * I`.
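Under the elementwise reading of `matrix + scalar` (written with `.+` here, since the auto-broadcasting form is deprecated), that failure is easy to see:

```julia
using LinearAlgebra

A = [1 2; 3 4]
c = 10

(A .+ c) * I        # adds c to every entry:        [11 12; 13 14]
A * I + c * I       # adds c to the diagonal only:  [11  2;  3 14]
```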
Finally, if `matrix + scalar` is supported, then perhaps `UniformScaling` could be deprecated. 😼

Edit: Rereading the issue, I now see with @andreasnoack that `UniformScaling`s have the right properties except `x*I + y*I = x*y^0 + y*x^0`.
@eveydee, the direction you're moving this is very much towards scalars simply being what `UniformScaling` was introduced as. We should stop and consider if that's where we want to go. I'm not saying it's not, but we should certainly consider it. And if we do that, it should be one of the first warnings we give to people coming from other systems: in Julia, `matrix + scalar` adds the scalar to the diagonal of the matrix and only works for square matrices.
@StefanKarpinski You are right, and I am not sure it is the right way to go. The problem for me is: how to write code that works for generic linear algebra objects. That is, how do you translate the equations you derive on a whiteboard into code implemented in Julia? As I see it, there are two possible routes to being able to write fully generic code with simple-to-follow rules, neither of which is currently satisfied:

1. `UniformScaling` is the scaling operator of the linear algebra world. In this case, the fact that `scalar * matrix` and `scalar * vector` work is just for convenience in special cases, and the rule of thumb is that for linear algebra applications all scalars should be multiplied by `I`.
2. `UniformScaling` serves no purpose and can be deprecated.

The first one is annoying for people like me, who primarily use Julia to do linear algebra (arguably a minority in the Julia community), but the inconvenience of suffixing every scalar literal constant with `I` is probably something that goes away once it becomes a habit, and it is infinitely preferable to manually figuring out the sizes of identity matrices, as is the case in most other languages. The bigger problem is ensuring that it is consistent across objects. For instance, all functions defined for `AbstractMatrix` that you might use in an equation must also be defined for `UniformScaling` (`exp`, `log`, `sqrt`, `sin`, etc.). Also, functions which are rooted in linear algebra, like `norm`, `trace`, etc., should probably be changed to return a `UniformScaling` object. In my opinion, the complexity here gets out of hand quite quickly.

The second one has the potential to be a source of confusion and bugs for people with Matlab or Python muscle memory who use arrays for non-linear-algebra applications. However, all it requires is defining `scalar + matrix`, so the complexity is low, and `UniformScaling` becomes obsolete.
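A sketch of what route 2 would amount to (hypothetical definitions, not in Base; note this is type piracy and shown for illustration only):

```julia
using LinearAlgebra

# Hypothetical route 2: a scalar added to a square matrix acts as a uniform scaling.
Base.:+(c::Number, A::AbstractMatrix) = UniformScaling(c) + A   # adds c to the diagonal
Base.:+(A::AbstractMatrix, c::Number) = A + UniformScaling(c)

A = [1 2; 3 4]
A + 10      # [11 2; 3 14]
```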
What about having `@I` or something similar that could be used like `@.` to add lots of multiplications by identity matrices to make stuff work, while preventing people from hitting it by accident?
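A rough sketch of how such a macro could work (hypothetical; it naively wraps *every* numeric literal with `*I`, so e.g. literal exponents would be mangled too, and a real version would need to be smarter):

```julia
using LinearAlgebra

# Recursively replace each numeric literal `c` in the expression tree with `c*I`.
wrap_literals(x) = x isa Number ? :($x * I) :
                   x isa Expr   ? Expr(x.head, map(wrap_literals, x.args)...) : x

macro I(ex)
    esc(wrap_literals(ex))
end

A = [1 2; 3 4]
@I(A + 1)     # expands to A + 1*I == [2 2; 3 5]
```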
> However, all it requires is defining `scalar + matrix` so the complexity is low, and `UniformScaling` becomes obsolete.

This is not entirely true, since one can also do neat things like this with `I` currently:
```julia
julia> A = rand(3, 2)
3×2 Array{Float64,2}:
 0.777236  0.80784
 0.46201   0.548122
 0.447051  0.407108

julia> [A I; I A]
6×5 Array{Float64,2}:
 0.777236  0.80784   1.0       0.0       0.0
 0.46201   0.548122  0.0       1.0       0.0
 0.447051  0.407108  0.0       0.0       1.0
 1.0       0.0       0.0       0.777236  0.80784
 0.0       1.0       0.0       0.46201   0.548122
 0.0       0.0       1.0       0.447051  0.407108

julia> [A I; I A']
5×5 Array{Float64,2}:
 0.777236  0.80784   1.0       0.0       0.0
 0.46201   0.548122  0.0       1.0       0.0
 0.447051  0.407108  0.0       0.0       1.0
 1.0       0.0       0.777236  0.46201   0.447051
 0.0       1.0       0.80784   0.548122  0.407108

julia> [A 0I; 0I A]
6×5 Array{Float64,2}:
 0.777236  0.80784   0.0  0.0       0.0
 0.46201   0.548122  0.0  0.0       0.0
 0.447051  0.407108  0.0  0.0       0.0
 0.0       0.0       0.0  0.777236  0.80784
 0.0       0.0       0.0  0.46201   0.548122
 0.0       0.0       0.0  0.447051  0.407108
```
If we deprecated `UniformScaling`, then either this shorthand would go away or we would write it with scalars, like `[A 1; 1 A]` and `[A 0; 0 A]`. That's pretty handy but may be a step too far, especially since `[A 1; 1 A]` looks like it should, if anything, fill with 1s, not identity blocks.
I'm having trouble distilling what exactly this issue is about.
Is it to define `+(x::AbstractArray, y::Number) = x .+ y .* one(x)`? It seems that this will not happen. So in that case, can this be closed?
My take is that it is to allow things like Taylor expansions, e.g. `exp_taylor3(x) = 1 + x + x*x/2 + x*x*x/6`, to work on square matrices as well as numbers. One could currently use `exp_taylor3(x) = one(x) + x + x*x/2 + x*x*x/6`.
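For reference, the `one(x)` workaround really does cover both cases:

```julia
using LinearAlgebra

exp_taylor3(x) = one(x) + x + x*x/2 + x*x*x/6

exp_taylor3(0.1)                  # scalar: third-order approximation of exp(0.1)
exp_taylor3([0.1 0.0; 0.0 0.2])   # square matrix: one(x) is the 2×2 identity matrix
```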
We've already deprecated `array + scalar` for v0.7.
There's also the issue of the distributive law, which should read `(matrix + scalar)*vector == matrix*vector + scalar*vector`, and which implies `matrix + scalar == matrix + scalar*I`. Personally, I strongly feel that for our built-in math types we should strive to support what are standard and correct associative and distributive laws, in order to allow for expressive and generic code.
It seems that the breaking part of this change has already happened. Whether to allow square `matrix + scalar` to mean what square `matrix + scalar*I` currently means in the future is not a breaking change, given that we've already disallowed that operation.
Hmm? I plan to make a v1.0 PR as soon as a v1.0 branch opens and the v0.7 deprecations are removed. (Probably not a change that would block releasing v1.0, so perhaps this makes sense).
Honestly, I would let things settle for a while before introducing square `matrix + scalar` doing the same thing as square `matrix + scalar*I`. Sure, we might end up with a largely vestigial `I`, but there's no need to rush this as long as we've got the necessary deprecations/deletions into 0.7/1.0.
> there's no need to rush this as long as we've got the necessary deprecations/deletions into 0.7

I suppose the same argument could be said to apply to v1.1... But at what point do you draw the line?
As @andreasnoack mentioned, with https://github.com/JuliaLang/julia/pull/23923 merged, this seems to work pretty well now. We might add https://github.com/JuliaLang/julia/pull/29951, but there seemed to be a consensus against taking that step.
I understand this has been discussed and experimented with before, and people found that it was really annoying when `+` didn't automatically broadcast. But @StefanKarpinski suggested I open an issue for this to describe the rationale for bringing it up again.

Currently, you can write a generic polynomial function that "almost just works" by using `I`. The "almost" here refers to the fact that this will return a `UniformScaling` if all of the inputs are scalar. With the deprecation of `+(::UniformScaling, ::Number)`, this pattern is broken: after the deprecation, you must explicitly use `one` in defining the polynomial to ensure that `x` and `y` have similar dimensionality. But this becomes really messy as soon as you have more than two variables in an expression, and it explodes as you go to higher order. The solution for this is to drop the broadcasting behaviour and define `+` between matrices and scalars directly.
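For concreteness, the progression described above looks something like this (a sketch with a two-variable polynomial; the exact original snippets are not reproduced here):

```julia
using LinearAlgebra

# The `I` pattern: "almost just works" for square matrices
f_I(x, y) = x*y + x + y + I

# The `one` workaround after the deprecation
f_one(x, y) = x*y + x + y + one(x)*one(y)

# ...which gets messy with more variables and higher order:
g(x, y, z) = x*y*z + x + y + z + one(x)*one(y)*one(z)

A = [1 2; 3 4]; B = [5 6; 7 8]
f_I(A, B) == f_one(A, B)    # true: both are A*B + A + B + I
```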
I think the argument that automatic broadcasting happens in other languages is a really good argument for keeping it as it is. But I feel that with dot-notation we can have the best of both worlds now, since you can define the polynomial once with plain operators and get the standard or broadcast behaviour by calling either `f(A, B)` or `f.(A, B)`.
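Something like the following, where the plain call does linear algebra and the dot-call broadcasts (a sketch):

```julia
using LinearAlgebra

f(x, y) = x*y + x + y + one(x)*one(y)

A = [1 2; 3 4]; B = [5 6; 7 8]

f(A, B)    # matrix polynomial A*B + A + B + I: [26 30; 53 63]
f.(A, B)   # elementwise scalar polynomial:     [12 21; 32 45]
```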
And if you are interested in a personal anecdote: two months ago I spent a whole day (possibly two) debugging my code in frustration, because I was inverting an array of the form `(H - E)^-1` and couldn't figure out why I got unphysical results. Spoiler: `E` was a scalar and I was expecting it to subtract from the diagonal.