JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.75k stars 5.49k forks source link

broadcast should not drop zero-dimensional results to scalars #28866

Open ZeppLu opened 6 years ago

ZeppLu commented 6 years ago
julia> map(typeof, (zeros(), ones(), fill(3.14)))
(Array{Float64,0}, Array{Float64,0}, Array{Float64,0})

julia> typeof(ones() .* fill(3.14))
Float64

julia> typeof(zeros() + ones())  # in fact not only after broadcast
Float64

Is it a bug, or a feature?

mbauman commented 6 years ago

This is an intentional feature. Zero-dimensional arrays act like scalars in broadcast.

ZeppLu commented 6 years ago

OK seems like it's a feature.

But I still wonder why typeof(zeros() + ones()) == Float64, note there's no broadcast. Intuitively speaking, the sum of two arrays should always be an array as well, shouldn't it?

mbauman commented 6 years ago

Oh my, you're right — I missed your third example there. That indeed is a bug.

mbauman commented 6 years ago

In short: we have long depended upon broadcasting to implement a number of array functions that implicitly work elementwise. We just add additional size checks. Unfortunately we now also need to add a zero-dimensional check, too.

Just around the definition of +, I can see that this also impacts -, conj, real, and imag.

ZeppLu commented 6 years ago

IMHO, we should make things consistent here, if ones() .+ ones() returns a scalar while ones() + ones() returns an array, people will get confused.

My suggestion is, broadcast SHOULD treat 0d arrays as scalars, but SHOULD NOT implicitly cast them to scalars in the final result.

mbauman commented 6 years ago

Yes, I'm in complete agreement. That's the bug, and it needs to be fixed.

Edit (a year later): I'm confused by my comment. I'm pretty sure I didn't mean to say what this says. I just wanted to fix the + behavior but keep broadcast the same.

andyferris commented 6 years ago

The casting of 0D arrays to scalars at the end of broadcast has been bothering me too.

yurivish commented 5 years ago

I just noticed this too while working to update the tests for https://github.com/mbauman/InvertedIndices.jl to 1.0.

Specifically,

    A = fill(1)
    @test A[Not(A.==1)] == []

fails with

Test threw exception
  Expression: A[Not(A .== 1)] == []
  ArgumentError: invalid index: true of type Bool

since fill(1) .== 1 returns true on 1.0.

bzinberg commented 5 years ago

https://github.com/JuliaLang/julia/issues/28866#issuecomment-415813898

My suggestion is, broadcast SHOULD treat 0d arrays as scalars, but SHOULD NOT implicitly cast them to scalars in the final result.

IIUC, what you are calling "treating 0d arrays as scalars" is not an exception, it's just an application of the usual broadcasting rules, as () is broadcastable to any shape. If 0d arrays didn't behave this way under broadcast, that would be a special-case exception to the broadcast rules. So you definitely have my +1 on that :+1:

mbauman commented 5 years ago

Here's the crux of the problem: What should 1 .+ 2 return? Is it 3 or is it fill(3)?

As far as broadcast is concerned, it's exactly the same as fill(1) .+ 2 and fill(1) .+ fill(2); 1 and fill(1) have the same axes.

bzinberg commented 5 years ago

IMO, the following chain of equivalences should hold:

1 .+ 2 === broadcast(+, 1, 2) === broadcast(+, convert(1), convert(2)) == fill(3)

I say this because the type signature broadcast should require the arguments to be arrays, so when the user gives the shorthand of an Int, it should be WAI that the int be implicitly converted to an array.

I don't see any principled reason for 1 .+ 2 to be 3.

mbauman commented 5 years ago

I don't see any principled reason for 1 .+ 2 to be 3.

I tend to agree; I have a strong distaste for behaviors like this on principle. The reasons were entirely practical. I initially tried to make fill(1) .+ 2 == fill(3) (https://github.com/JuliaLang/julia/pull/26212) and locally even tried changing the 1 .+ 2 behavior as well but it was far more breaking than I was able to stomach. There is lots of code that intentionally allows scalar- or array-like behaviors with broadcasting. There is also lots of code that unnecessarily applies broadcasting to scalars.

This is something we could reconsider as a breaking change for 2.0, but in order for it to be feasible I think we'd also need to allow 0-dimensional arrays to participate in the linear algebra of vectors and matrices. I'm also still not convinced that it's actually all that terrible in practice, despite my own distaste for it in theory.

For more details, check out https://github.com/JuliaLang/julia/pull/17318.

andyferris commented 5 years ago

What’s the relationship between broadcast and linear algebra?

mbauman commented 5 years ago

It's just that folks would see zero-dimensional arrays much more frequently; making them more capable and allowing them to do things like fill(2) * rand(3,3) would ameliorate some of the complaints I'm quite certain we'd receive.

bzinberg commented 5 years ago

I tend to agree; I have a strong distaste for behaviors like this on principle. The reasons were entirely practical.

I think for big software projects, it's best to have the guts of the language be meticulous about degenerate cases, and to have the convenient-but-slightly-unprincipled stuff be found only very close to the surface, so as not to muddy the ability to reason consistently about the software. I'd say in this bug we are reaching precisely the point at which the motivation for that thinking becomes apparent: because Julia handled zero-dimensional broadcasting inconsistently since the beginning, it's now much easier to make a band-aid fix to the internals that special-cases zero-dimensional arrays, than to rework the internals so that the special case is not necessary.

(This is totally a hindsight observation, and I don't think anyone necessarily did anything wrong in this regard during the development trajectory of Julia, since there were trade-offs to make at every step of the way.)

bzinberg commented 5 years ago

It's just that folks would see zero-dimensional arrays much more frequently; making them more capable and allowing them to do things like fill(2) * rand(3,3) would ameliorate some of the complaints I'm quite certain we'd receive.

What is the intended meaning of fill(2) * rand(3, 3)? (I understand .*, but not *)

mbauman commented 5 years ago

Yes, history and inertia is one part of the story. The other part is that it is practically quite useful and (I think) friendlier. Arithmetic on a zero-dimensional array is quite limited, but there would be ways of defining it to be a bit friendlier (like having fill(2) * A be defined to mean the same thing as 2*A).

Now, I'm much more sympathetic to cases like fill(1) .+ fill(2) or fill(1) .== 1 because there you are already starting with an Array{T,0}, and there is some good sense in preserving that. But in the case of 1 .+ 2 I'm more strongly opposed to wrapping its result in an Array{T,0}. You are starting with something that has axes and supports getindex and you end up with the same sort of thing that has axes and supports getindex. The result is also far more useful.

bzinberg commented 5 years ago

(like having fill(2) * A be defined to mean the same thing as 2*A)

My concern there is that the *(a::Array, b::Array) already means matrix product, and so defining fill(2) * A to be fill(2) .* A rather than matmul(fill(2), A) (which only makes sense if size(A)[end-1] == 0) is a redefinition of an existing concept that makes it more complicated due to the added special case.

One thought that comes to mind is that perhaps we could have an infix operator for the tensor product. It seems to me that confusion of what 2 * A means (either "scale the matrix" or "a matrix product that has mismatched dimensions") comes from not realizing that the first one is a tensor product, not a matrix product.

But in the case of 1 .+ 2 I'm more strongly opposed to wrapping its result in an Array{T,0}. You are starting with something that has axes and supports getindex and you end up with the same sort of thing that has axes and supports getindex. The result is also far more useful.

Hm. I would say that any dotted operator should always return an array. That's a consistent and easy-to-remember rule, and I think having that consistency could save headaches like this without burdening programmers much. What it might do is be slightly surprising to people who are used to math notation which often just uses juxtaposition for all the many notions of multiplication (scalar, matrix, function composition, tensor). But I think it's good that we don't carry that notational ambiguity into this language :)

And converting an Array{T, 0} to a T is easy and IMO familiar as well: just add [] to the end. I'd call it analogous to how in Python, if L = [x], I wouldn't expect L and x to behave the same way -- I would know to write L[0], and readers would understand that.

bzinberg commented 5 years ago

Now, I'm much more sympathetic to cases like fill(1) .+ fill(2) or fill(1) .== 1 because there you are already starting with an Array{T,0}, and there is some good sense in preserving that. But in the case of 1 .+ 2 I'm more strongly opposed to wrapping its result in an Array{T,0}. You are starting with something that has axes and supports getindex and you end up with the same sort of thing that has axes and supports getindex. The result is also far more useful.

I would describe the situation as this: There is a type U and a function

f(w::W, x::U, y::U)::U

(in our case f = broadcast). There is an implicit conversion from Float64 to U. We then write z = f(w, 1.0, 2.0). I think we don't have any reason to expect z to have all the same behaviors as Float64 unless there is also an implicit conversion from U to Float64. (Also, Array{Float64,0} does support axes and getindex?)

andyferris commented 5 years ago

Honestly, I never much liked that Number has container semantics (that seemed to be a purely pragmatic decision to ease certain parts of the implementation in Base but results in silly bugs where typos like for i in N proceed without error) and broadcast is an operation on containers. Basically I feel that 1 .+ 2 is a programmer error - I wouldn't let it slide on a PR for example.

In very generic code the one thing you do have to mentally track is which variables are scalar and which are containers, so I don't see any impediment to generic code if for example it were forbidden to write broadcast operations on scalars only (IMO such a choice should basically only catch logic errors or bugs). But returning a 0d array in the default case seems fairly a fairly pragmatic choice, and has logical consistency with the way scalars are treated within a larger broadcast operation.

Also note that if we support fill(2) * matrix we still probably shouldn't support something like matrix + fill(2) in the future (where I still believe matrix + 2 makes sense as matrix + 2I).

bzinberg commented 5 years ago

@andyferris

(where I still believe matrix + 2 makes sense as matrix + 2I)

This is specific to the rank-2 case, right?

mbauman commented 5 years ago

The best way to evaluate this change would be to try it out. Personally, I don't find the status quo so abhorrent, so I won't be pushing for this change, and clearly it'd be a breaking v2.0 change, but it's a really easy change to make. Just steal the logic from #32122 — it really shouldn't be more than 10LOC between all the builtin broadcast implementations. Then the big question is how many LOC worth of tests and packages need to change. I'm up for changing my mind on the practicality here if the evidence weighs against it!

mbauman commented 5 years ago

Here's a compelling argument: https://discourse.julialang.org/t/broadcasting-and-pairs-using/24739

In short:

p = "2".=>"two"
replace.(["123", "246"], p)

vs.

replace.(["123", "246"], "2".=>"two")

The first fails, whereas the second succeeds, but the two should be equivalent in their end result. This only happens because we've lost the 0-d container.

bzinberg commented 5 years ago

@mbauman, I'd be interested in trying your suggestion in https://github.com/JuliaLang/julia/issues/28866#issuecomment-496604867 to see how much work it is. I'm new to contributing to Julia -- can you point me to what I need to know to get my local clone to the point where I can make a change to the broadcast implementation, run some compile/test commands, and see what breaks? I can do that quickly, then over the next few weeks work on it as I have time.

mbauman commented 5 years ago

Here you go: https://github.com/JuliaLang/julia/compare/mb/true28866

Just:

git clone https://github.com/JuliaLang/julia.git
cd julia
git checkout mb/true28866
make
make testall
mbauman commented 5 years ago

Just an administrative note: this issue started out describing both broadcast's design as well as the bug in #32122 — and I wrote the commit message there before we really started hashing out broadcast's design here. So we should re-open this issue if that commit message ends up auto-closing this — it only addresses the "easy" half of this issue.

cossio commented 4 years ago

I find zeros() .+ 1 == 1 very unintuitive. I had an issue with this today. The following function:

f(x::AbstractArray; dims=:) = sum(x .+ 1; dims=dims)

Then f(x) works for all arrays, except for zero-dimensional arrays because of this. In this example I now have to special case dims=:.

Maybe there should be a definition for sum(::Number; dims::Colon).

GiggleLiu commented 2 years ago

Broadcasting should return 0-dimensional array! Support you with another using case:

julia> conj(Zygote.Fill(1.0im))
0-dimensional Array{FillArrays.Fill{ComplexF64, 0, Tuple{}}, 0}:
Fill(0.0 - 1.0im)

The conj over filled array returns a 0-dimensional array with filled array inside!

This is cause by:

  1. first broadcast over Fill, returns a Fill type.
  2. then wrap a 0-dimensional array to keep the return shape the same.

https://github.com/JuliaLang/julia/blob/938da264a2e40ba00653ae6659221f5af021c85b/base/broadcast.jl#L860

Is it really safe to treat all arrays as dense array type? The above code is very bad to ecosystem.

This now causing OMEinsum AD test failure: https://github.com/under-Peter/OMEinsum.jl/pull/141

mbauman commented 2 years ago

Ironically, that's happening because FillArray's Fill defines its own broadcast implementation that preserves the 0-dimensional array as requested here. I agree this is how things should behave, but changing this quite breaking (as you can see) and will likely have to wait for v2.0.

It'd work as you want if FillArrays would match Base's current broadcasting semantics.

GiggleLiu commented 2 years ago

Ironically, that's happening because FillArray's Fill defines its own broadcast implementation that preserves the 0-dimensional array as requested here. I agree this is how things should behave, but changing this quite breaking (as you can see) and will likely have to wait for v2.0.

It'd work as you want if FillArrays would match Base's current broadcasting semantics.

It is not only the problem of the FillArrays. This implementation in Julia base (https://github.com/JuliaLang/julia/blob/938da264a2e40ba00653ae6659221f5af021c85b/base/broadcast.jl#L860) is bad:

length(axes(bc)) == 0 ? fill!(similar(bc, typeof(r)), r) : r

Two aspects can be improved

  1. it should check the type of the original array and try to return the same array type,
  2. it should be type stable.

I won't be happy if conj(::Fill) returns me a normal array. Maybe fallback conj, real function on the broadcast method is a wrong decision, because broadcast is not bijective in type (different array types can go to the same scalar type).

mbauman commented 2 years ago
  1. it should check the type of the original array and try to return the same array type,

It tries to; that's what the similar(bc, typeof(r)) part does. It's up to the broadcast implementation to specialize that.

  1. it should be type stable.

It is not a type instability because the length of the axes is encoded into the broadcast's type.

GiggleLiu commented 2 years ago
  1. it should check the type of the original array and try to return the same array type,

It tries to; that's what the similar(bc, typeof(r)) part does. It's up to the broadcast implementation to specialize that.

  1. it should be type stable.

It is not a type instability because the length of the axes is encoded into the broadcast's type.

I see what you mean. So it is a wield interaction of similar does not preserve type and broadcast does not keep array type information for 0-dimensional array. Now I agree there is no easy solution. Hope Julia v2.0 can make the broadcast more predictable.

kellertuer commented 1 year ago

Hi, I just sumbled upon this as well and just to add to the irritation here, I wrote a small discourse about my encounter https://discourse.julialang.org/t/broadcast-strange-on-0-dimensional-array/97311 - which I will resolve by not using 0-dim arrays, since for me they currently seem to not behave as I expect arrays to behave (especially in the .+ case.

sadish-d commented 1 year ago

I came to realize the broadcasting behavior on 0-dimensional arrays pretty late. This will mean a significant implementation change for what I'm working on. A minimal example of the issue I'm running into:

f(x::Array{<:Real}) = x .+ 1
g(y::Array{<:Real}) = nothing

a = [1]
b = fill(1)

(g∘f)(a) # works
(g∘f)(b) # breaks
MilesCranmer commented 4 months ago

This is especially problematic for GPU code because scalarization strips any allocation information present in a zero-dimensional array:

julia> x = MtlArray(fill(1f0))
0-dimensional MtlArray{Float32, 0, Private}:
1.0

julia> x .+ x
2.0f0

which now lives on the CPU.