JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.54k stars 5.47k forks source link

Array+Scalar elementwise operations #6417

Closed goszlanyi closed 10 years ago

goszlanyi commented 10 years ago

Recently (when? prerelease+2486?),
Array+Scalar and Array-Scalar operations were deprecated, and one has to use the .+ and .- operators instead.

I hate this change and see it only as visual clutter. It is also inconsistent, as one can still happily use the Array*Scalar and Array/Scalar operations.

Why is one forbidden while the other one is allowed?

julia> [1,2,3]+10
WARNING: A::Array + x::Number is deprecated, use A .+ x instead.
 in depwarn at deprecated.jl:36
3-element Array{Int32,1}:
 11
 12
 13
julia> [1,2,3]/10
3-element Array{Float64,1}:
 0.1
 0.2
 0.3
lindahua commented 10 years ago

There was a very long debate at #5807, where we got to this consensus.

The decision was based on deep consideration about the consistency in both element-wise operations and algebraic computation.

goszlanyi commented 10 years ago

I am shocked by that consensus, especially that in the discussion I have seen the natural argument:

*I guess "+, -, , / are elementwise when one argument is a scalar" is a reasonable and easy-to-remember rule.**

goszlanyi commented 10 years ago

Some users downloading only binaries will see the result of this decision only today.

blakejohnson commented 10 years ago

I would agree that this is being unnecessarily pedantic about element-wise operators.

nalimilan commented 10 years ago

More precisely, the arguments for the change were summarized here: https://github.com/JuliaLang/julia/issues/5807#issuecomment-35430698

Maybe this is too strict after all. I don't think any concrete cases where the old behavior was a problem were found.

johnmyleswhite commented 10 years ago

Let me put in a vote for loving this rule.

jiahao commented 10 years ago

This goes back to the eternal question about whether Array{T,2} is a two-dimensional container of Ts or whether it is a Matrix imbued with linear algebraic operations. As the latter, only multiplication by a scalar makes sense (and division by a scalar as multiplying by its reciprocal): addition and subtraction do not. As the former, all four elementary operations make sense if one uses broadcasting semantics to perform the appropriate scalar operation on each array element.

My understanding of the current consensus is that we decided to make clear the distinction between broadcasting operators (.+, etc.) which are short forms for scalar operations on each element, and the ordinary operators which work on the scale of the Array itself as a single algebraic entity, and thus should support the semantics of linear algebra only.

goszlanyi commented 10 years ago

@jiahao, Thank you for summarizing the arguments.

But can you give an example when the former, widespread, and no-surprise use of Array op Scalar operations caused any problem for Julia?

blakejohnson commented 10 years ago

matrix+scalar certainly doesn't have an obvious linear algebra meaning. So, the question is really are there behaviors we can adopt that increase usability? Our broadcasting behavior with matrix .+ vector also isn't a valid linear algebra operation. But the automatic broadcasting is hugely convenient and wonderful.

kmsquire commented 10 years ago

From @toivoh's comment in https://github.com/JuliaLang/julia/issues/5807#issuecomment-35430698:

The argument that finally swayed Stefan was that then * (matrix multiplication) distributes over + (non-broadcasting addition), while .* distributes over .+. This should definitely lead to fewer surprises.

This seems to be the key argument for this behavior. To be perfectly clear,

(A+B)*x == A*x + B*x

when A and B are matrices and x is a vector, whereas

(A+1)*x != A*x + 1*x

if we the 1 is broadcast and added to each element of A.

(And similarly,

(A.+1).*B == A.*B .+ 1.*B

if we explicitly specify the broadcast operator.)

Of course, for the last, you don't need to use .+ to add the matrices, because they already have the same shape, so

A.*B .+ 1.*B == A.*B + 1.*B
jiahao commented 10 years ago

The most troublesome example is a / B for scalar a and matrix B. As a broadcasting operation, a ./ B makes sense, but a / B is not well defined. Is it the broadcast operation or a * inv(B)? What if B is singular? And so on...

jiahao commented 10 years ago

Similarly it is not clear if A + b is meant to be A .+ b, which would be functionally like A + b E (E being a matrix of all ones), or A + bI (I being the identity matrix).

blakejohnson commented 10 years ago

@jihao I completely agree that division quickly leads to trouble. I disagree entirely that A+b should ever imply A+bI.

blakejohnson commented 10 years ago

Also, parenthesis in programming languages (as opposed to mathematic expression) are often not movable without consequences. Our broadcasting behavior similarly breaks with some like: (A.*v)*u != A.*(v*u) where u and v are vectors and A is a matrix.

goszlanyi commented 10 years ago

@jiahao The only problematic case is Scalar/Array, I agree. (This is also handled by MATLAB separately.) All other cases are natural as elementwise operations.

kmsquire commented 10 years ago

I disagree entirely that A+b should ever imply A+bI

I agree that it shouldn't. But if it did, then

(A+2)*x == Ax + 2I*x == Ax + 2x

which is quite nice, mathematically. Hence, the need to be clear about what A+2 actually means.

Since Julia is targeting a mathematical audience, I don't really have a problem with it being more mathematically consistent than other languages, even if that means the operators are slightly more pedantic.

I have vociferously argued the opposite view on other issues--that consistency with other languages or convenience was more important than mathematical consistency--when I first started using the language, but I've come to appreciate the subtle distinctions over time. But there is a need for a balance.

kmsquire commented 10 years ago

(A.*v)*u != A.*(v*u)

I don't think there is an expectation that this should be equivalent, since there's no mathematical relation between .* and *. But it is useful to expect that

(a+b)*c == ac + bc

for arbitrary a, b, and c.

andreasnoack commented 10 years ago

All other cases are natural as elementwise operations.

I think this is a question about habit. I you are used to write Matrix+Number then it feels natural to you, but from a math point of view the operation feels unnatural.

But there is a need to have a balance.

Here I think it should be noted the burden is just a matter of typing a . to get the operation you want. I don't think that is a big price to pay for consistency.

blakejohnson commented 10 years ago

I suppose I am fine with that so long as after matrix+scalar falls off the deprecation list, it becomes an error suggesting the .+ syntax. Otherwise, we will have a lot of confused new users.

ivarne commented 10 years ago

@blakejohnson I totally agree: see https://github.com/JuliaLang/julia/commit/d2d7c3217f52c72ac3eb06ccc7da0dfc9fd573c3

In my opinion the fundamental problem is that linear algebra called dibs on the operators without . (to be consistent with matlab), and lots of people don't think in linear algebra terms. When that choice has been made, I think we should try our best to keep users aware about the difference between the dotted operators and the undotted operators.

timholy commented 10 years ago

Overall I support the change. But it was a bit painful, in no small measure because our backtraces are often not as useful as one could hope. I suspect I am not alone in having spent a couple of hours over the last week chasing down missing .s in various test scripts.

But going forward it's no big deal.

goszlanyi commented 10 years ago

I am still chasing the missing .s, and I still think it is not worth it.

JeffBezanson commented 10 years ago

Although I find Gabor's quoting of my own argument highly incisive and compelling, I'm fine with the current behavior. Functions should mean something; they shouldn't just become elementwise because it is convenient or (in many systems) more efficient.

goszlanyi commented 10 years ago

Jeff, I really did not mean to hurt anybody. All I wanted to show that this change will come as a shock for average users, but of course those who come later will more easily adapt to this change. With all my depression, I am closing.

StefanKarpinski commented 10 years ago

I don't think @JeffBezanson was offended at all, @GaborOszlanyi. I think we're all happy to have this kind of measured discussion about issues like this – keeping in mind that one may not always win.

goszlanyi commented 10 years ago

Thanks Stefan. Now that I have found all occurrences to be changed, I am less upset. Also see #6438.

StefanKarpinski commented 10 years ago

I actually think the fact that this is a somewhat painful change is indicative that this is important. The two senses of plus are different and code that doesn't distinguish them is, in some sense, fundamentally ambiguous.

timholy commented 10 years ago

While I support the .+ issue, I'm not sure I agree with that logic. Suppose we decided to change the parser so that indexing of arrays was done with A<7>. That would be a painful change, but it would also be senseless.

johnmyleswhite commented 10 years ago

Couldn't you do A[...] -> A<...> using a simple search-and-replace?

timholy commented 10 years ago

No, because comprehensions would bite you. This issue is the same (and indeed slightly worse), because your editor can't reliably tell when a quantity is a scalar.

ivarne commented 10 years ago

@timholy With that change you could write a simple parser that automatically fixed your codebase. With + and .+, you would have to actually parse and infer the types to do a somewhat automatic conversion. That makes it more painful, but also a more useful change, because you get better better less ambiguous code.

StefanKarpinski commented 10 years ago

Changing a[k] to a<k> everywhere is only as hard as parsing. Indexing expressions and comprehensions can be distinguished purely syntactically – the parser does exactly that. The analogous change with indexing would be if we realized that there are really two very different senses of indexing into something and we really should be distinguishing them. Then you would have to pore through all uses of a[k] and decide which one was really going on – something which is inherently difficult, not just a matter of syntax – and if there were two different senses of indexing, I would argue, just as valuable.

timholy commented 10 years ago

if there were two different senses of indexing

To get away from artificial examples like A<...>, I would say there are: #5949. Perhaps this issue is an argument not to rely too heavily on multiple dispatch, and that there too we should use two different function names to distinguish slicing vs dimension-preserving indexing.

StefanKarpinski commented 10 years ago

I actually just did this by accident and the new behavior annoyed / saved me – at first I was annoyed, then I realized that what I'd written was a mistake and the warning had saved prevented me from messing up. What I did was write a polynomial expression and then apply it to matrices:

julia> A = randn(3,3)
3x3 Array{Float64,2}:
  1.04366    0.724268   0.187964
  0.347349   1.24317    0.84459
 -1.14403   -0.295596  -1.46345

julia> A^2 - 2A + 1
WARNING: A::Array + x::Number is deprecated, use A .+ x instead.
 in + at deprecated.jl:26
3x3 Array{Float64,2}:
 0.0384445  1.15218     1.15688
 0.133397   0.0610497  -0.809932
 3.66563    0.827721    5.60388

Oops. Did I want to add a 3x3 matrix of ones at the end or did I want to add a 3x3 identity matrix? Good question. And that's why it's a good thing that this behavior will be undefined and force you to choose between these very different meanings:

julia> A^2 - 2A .+ 1
3x3 Array{Float64,2}:
 0.0384445  1.15218     1.15688
 0.133397   0.0610497  -0.809932
 3.66563    0.827721    5.60388

julia> A^2 - 2A + I
3x3 Array{Float64,2}:
  0.0384445   0.152184    0.156877
 -0.866603    0.0610497  -1.80993
  2.66563    -0.172279    5.60388

One thing I think we should do is allow UniformScaling operators to work with scalars so that an expression like x^2 - 2x + I can be applied to both scalar and matrix x generically.

StefanKarpinski commented 10 years ago

See https://github.com/JuliaLang/julia/pull/6472, allowing polynomials using I for constant terms to apply to scalar and matrix arguments.