Open ZeppLu opened 6 years ago
This is an intentional feature. Zero-dimensional arrays act like scalars in broadcast.
OK seems like it's a feature.
But I still wonder why typeof(zeros() + ones()) == Float64
, note there's no broadcast. Intuitively speaking, the sum of two arrays should always be an array as well, shouldn't it?
Oh my, you're right — I missed your third example there. That indeed is a bug.
In short: we have long depended upon broadcasting to implement a number of array functions that implicitly work elementwise. We just add additional size checks. Unfortunately we now also need to add a zero-dimensional check, too.
Just around the definition of +
, I can see that this also impacts -
, conj
, real
, and imag
.
IMHO, we should make things consistent here, if ones() .+ ones()
returns a scalar while ones() + ones()
returns an array, people will get confused.
My suggestion is, broadcast SHOULD treat 0d arrays as scalars, but SHOULD NOT implicitly cast them to scalars in the final result.
Yes, I'm in complete agreement. That's the bug, and it needs to be fixed.
Edit (a year later): I'm confused by my comment. I'm pretty sure I didn't mean to say what this says. I just wanted to fix the +
behavior but keep broadcast the same.
The casting of 0D arrays to scalars at the end of broadcast has been bothering me too.
I just noticed this too while working to update the tests for https://github.com/mbauman/InvertedIndices.jl to 1.0.
Specifically,
A = fill(1)
@test A[Not(A.==1)] == []
fails with
Test threw exception
Expression: A[Not(A .== 1)] == []
ArgumentError: invalid index: true of type Bool
since fill(1) .== 1
returns true
on 1.0.
https://github.com/JuliaLang/julia/issues/28866#issuecomment-415813898
My suggestion is, broadcast SHOULD treat 0d arrays as scalars, but SHOULD NOT implicitly cast them to scalars in the final result.
IIUC, what you are calling "treating 0d arrays as scalars" is not an exception, it's just an application of the usual broadcasting rules, as ()
is broadcastable to any shape. If 0d arrays didn't behave this way under broadcast, that would be a special-case exception to the broadcast rules. So you definitely have my +1 on that :+1:
Here's the crux of the problem: What should 1 .+ 2
return? Is it 3
or is it fill(3)
?
As far as broadcast is concerned, it's exactly the same as fill(1) .+ 2
and fill(1) .+ fill(2)
; 1
and fill(1)
have the same axes.
IMO, the following chain of equivalences should hold:
1 .+ 2 === broadcast(+, 1, 2) === broadcast(+, convert(1), convert(2)) == fill(3)
I say this because the type signature broadcast
should require the arguments to be arrays, so when the user gives the shorthand of an Int
, it should be WAI that the int be implicitly converted to an array.
I don't see any principled reason for 1 .+ 2
to be 3
.
I don't see any principled reason for
1 .+ 2
to be3
.
I tend to agree; I have a strong distaste for behaviors like this on principle. The reasons were entirely practical. I initially tried to make fill(1) .+ 2 == fill(3)
(https://github.com/JuliaLang/julia/pull/26212) and locally even tried changing the 1 .+ 2
behavior as well but it was far more breaking than I was able to stomach. There is lots of code that intentionally allows scalar- or array-like behaviors with broadcasting. There is also lots of code that unnecessarily applies broadcasting to scalars.
This is something we could reconsider as a breaking change for 2.0, but in order for it to be feasible I think we'd also need to allow 0-dimensional arrays to participate in the linear algebra of vectors and matrices. I'm also still not convinced that it's actually all that terrible in practice, despite my own distaste for it in theory.
For more details, check out https://github.com/JuliaLang/julia/pull/17318.
What’s the relationship between broadcast
and linear algebra?
It's just that folks would see zero-dimensional arrays much more frequently; making them more capable and allowing them to do things like fill(2) * rand(3,3)
would ameliorate some of the complaints I'm quite certain we'd receive.
I tend to agree; I have a strong distaste for behaviors like this on principle. The reasons were entirely practical.
I think for big software projects, it's best to have the guts of the language be meticulous about degenerate cases, and to have the convenient-but-slightly-unprincipled stuff be found only very close to the surface, so as not to muddy the ability to reason consistently about the software. I'd say in this bug we are reaching precisely the point at which the motivation for that thinking becomes apparent: because Julia handled zero-dimensional broadcasting inconsistently since the beginning, it's now much easier to make a band-aid fix to the internals that special-cases zero-dimensional arrays, than to rework the internals so that the special case is not necessary.
(This is totally a hindsight observation, and I don't think anyone necessarily did anything wrong in this regard during the development trajectory of Julia, since there were trade-offs to make at every step of the way.)
It's just that folks would see zero-dimensional arrays much more frequently; making them more capable and allowing them to do things like
fill(2) * rand(3,3)
would ameliorate some of the complaints I'm quite certain we'd receive.
What is the intended meaning of fill(2) * rand(3, 3)
? (I understand .*
, but not *
)
Yes, history and inertia is one part of the story. The other part is that it is practically quite useful and (I think) friendlier. Arithmetic on a zero-dimensional array is quite limited, but there would be ways of defining it to be a bit friendlier (like having fill(2) * A
be defined to mean the same thing as 2*A
).
Now, I'm much more sympathetic to cases like fill(1) .+ fill(2)
or fill(1) .== 1
because there you are already starting with an Array{T,0}
, and there is some good sense in preserving that. But in the case of 1 .+ 2
I'm more strongly opposed to wrapping its result in an Array{T,0}
. You are starting with something that has axes
and supports getindex
and you end up with the same sort of thing that has axes
and supports getindex
. The result is also far more useful.
(like having
fill(2) * A
be defined to mean the same thing as2*A
)
My concern there is that the *(a::Array, b::Array)
already means matrix product, and so defining fill(2) * A
to be fill(2) .* A
rather than matmul(fill(2), A)
(which only makes sense if size(A)[end-1] == 0
) is a redefinition of an existing concept that makes it more complicated due to the added special case.
One thought that comes to mind is that perhaps we could have an infix operator for the tensor product. It seems to me that confusion of what 2 * A
means (either "scale the matrix" or "a matrix product that has mismatched dimensions") comes from not realizing that the first one is a tensor product, not a matrix product.
But in the case of
1 .+ 2
I'm more strongly opposed to wrapping its result in anArray{T,0}
. You are starting with something that has axes and supports getindex and you end up with the same sort of thing that has axes and supports getindex. The result is also far more useful.
Hm. I would say that any dotted operator should always return an array. That's a consistent and easy-to-remember rule, and I think having that consistency could save headaches like this without burdening programmers much. What it might do is be slightly surprising to people who are used to math notation which often just uses juxtaposition for all the many notions of multiplication (scalar, matrix, function composition, tensor). But I think it's good that we don't carry that notational ambiguity into this language :)
And converting an Array{T, 0}
to a T
is easy and IMO familiar as well: just add []
to the end. I'd call it analogous to how in Python, if L = [x]
, I wouldn't expect L
and x
to behave the same way -- I would know to write L[0]
, and readers would understand that.
Now, I'm much more sympathetic to cases like
fill(1) .+ fill(2)
orfill(1) .== 1
because there you are already starting with anArray{T,0}
, and there is some good sense in preserving that. But in the case of1 .+ 2
I'm more strongly opposed to wrapping its result in anArray{T,0}
. You are starting with something that hasaxes
and supportsgetindex
and you end up with the same sort of thing that hasaxes
and supportsgetindex
. The result is also far more useful.
I would describe the situation as this: There is a type U
and a function
f(w::W, x::U, y::U)::U
(in our case f = broadcast
). There is an implicit conversion from Float64
to U
. We then write z = f(w, 1.0, 2.0)
. I think we don't have any reason to expect z
to have all the same behaviors as Float64
unless there is also an implicit conversion from U
to Float64
. (Also, Array{Float64,0}
does support axes
and getindex
?)
Honestly, I never much liked that Number
has container semantics (that seemed to be a purely pragmatic decision to ease certain parts of the implementation in Base
but results in silly bugs where typos like for i in N
proceed without error) and broadcast
is an operation on containers. Basically I feel that 1 .+ 2
is a programmer error - I wouldn't let it slide on a PR for example.
In very generic code the one thing you do have to mentally track is which variables are scalar and which are containers, so I don't see any impediment to generic code if for example it were forbidden to write broadcast
operations on scalars only (IMO such a choice should basically only catch logic errors or bugs). But returning a 0d array in the default case seems fairly a fairly pragmatic choice, and has logical consistency with the way scalars are treated within a larger broadcast
operation.
Also note that if we support fill(2) * matrix
we still probably shouldn't support something like matrix + fill(2)
in the future (where I still believe matrix + 2
makes sense as matrix + 2I
).
@andyferris
(where I still believe
matrix + 2
makes sense asmatrix + 2I
)
This is specific to the rank-2 case, right?
The best way to evaluate this change would be to try it out. Personally, I don't find the status quo so abhorrent, so I won't be pushing for this change, and clearly it'd be a breaking v2.0 change, but it's a really easy change to make. Just steal the logic from #32122 — it really shouldn't be more than 10LOC between all the builtin broadcast implementations. Then the big question is how many LOC worth of tests and packages need to change. I'm up for changing my mind on the practicality here if the evidence weighs against it!
Here's a compelling argument: https://discourse.julialang.org/t/broadcasting-and-pairs-using/24739
In short:
p = "2".=>"two"
replace.(["123", "246"], p)
vs.
replace.(["123", "246"], "2".=>"two")
The first fails, whereas the second succeeds, but the two should be equivalent in their end result. This only happens because we've lost the 0-d container.
@mbauman, I'd be interested in trying your suggestion in https://github.com/JuliaLang/julia/issues/28866#issuecomment-496604867 to see how much work it is. I'm new to contributing to Julia -- can you point me to what I need to know to get my local clone to the point where I can make a change to the broadcast implementation, run some compile/test commands, and see what breaks? I can do that quickly, then over the next few weeks work on it as I have time.
Here you go: https://github.com/JuliaLang/julia/compare/mb/true28866
Just:
git clone https://github.com/JuliaLang/julia.git
cd julia
git checkout mb/true28866
make
make testall
Just an administrative note: this issue started out describing both broadcast's design as well as the bug in #32122 — and I wrote the commit message there before we really started hashing out broadcast's design here. So we should re-open this issue if that commit message ends up auto-closing this — it only addresses the "easy" half of this issue.
I find zeros() .+ 1 == 1
very unintuitive. I had an issue with this today. The following function:
f(x::AbstractArray; dims=:) = sum(x .+ 1; dims=dims)
Then f(x)
works for all arrays, except for zero-dimensional arrays because of this. In this example I now have to special case dims=:
.
Maybe there should be a definition for sum(::Number; dims::Colon)
.
Broadcasting should return 0-dimensional array! Support you with another using case:
julia> conj(Zygote.Fill(1.0im))
0-dimensional Array{FillArrays.Fill{ComplexF64, 0, Tuple{}}, 0}:
Fill(0.0 - 1.0im)
The conj
over filled array returns a 0-dimensional array with filled array inside!
This is cause by:
Fill
, returns a Fill
type.Is it really safe to treat all arrays as dense array type? The above code is very bad to ecosystem.
This now causing OMEinsum AD test failure: https://github.com/under-Peter/OMEinsum.jl/pull/141
Ironically, that's happening because FillArray's Fill
defines its own broadcast implementation that preserves the 0-dimensional array as requested here. I agree this is how things should behave, but changing this quite breaking (as you can see) and will likely have to wait for v2.0.
It'd work as you want if FillArrays would match Base's current broadcasting semantics.
Ironically, that's happening because FillArray's
Fill
defines its own broadcast implementation that preserves the 0-dimensional array as requested here. I agree this is how things should behave, but changing this quite breaking (as you can see) and will likely have to wait for v2.0.It'd work as you want if FillArrays would match Base's current broadcasting semantics.
It is not only the problem of the FillArrays. This implementation in Julia base (https://github.com/JuliaLang/julia/blob/938da264a2e40ba00653ae6659221f5af021c85b/base/broadcast.jl#L860) is bad:
length(axes(bc)) == 0 ? fill!(similar(bc, typeof(r)), r) : r
Two aspects can be improved
I won't be happy if conj(::Fill)
returns me a normal array.
Maybe fallback conj
, real
function on the broadcast method is a wrong decision, because broadcast is not bijective in type (different array types can go to the same scalar type).
- it should check the type of the original array and try to return the same array type,
It tries to; that's what the similar(bc, typeof(r))
part does. It's up to the broadcast implementation to specialize that.
- it should be type stable.
It is not a type instability because the length of the axes is encoded into the broadcast's type.
- it should check the type of the original array and try to return the same array type,
It tries to; that's what the
similar(bc, typeof(r))
part does. It's up to the broadcast implementation to specialize that.
- it should be type stable.
It is not a type instability because the length of the axes is encoded into the broadcast's type.
I see what you mean. So it is a wield interaction of similar
does not preserve type and broadcast
does not keep array type information for 0-dimensional array. Now I agree there is no easy solution. Hope Julia v2.0 can make the broadcast more predictable.
Hi,
I just sumbled upon this as well and just to add to the irritation here, I wrote a small discourse about my encounter https://discourse.julialang.org/t/broadcast-strange-on-0-dimensional-array/97311 - which I will resolve by not using 0-dim arrays, since for me they currently seem to not behave as I expect arrays to behave (especially in the .+
case.
I came to realize the broadcasting behavior on 0-dimensional arrays pretty late. This will mean a significant implementation change for what I'm working on. A minimal example of the issue I'm running into:
f(x::Array{<:Real}) = x .+ 1
g(y::Array{<:Real}) = nothing
a = [1]
b = fill(1)
(g∘f)(a) # works
(g∘f)(b) # breaks
This is especially problematic for GPU code because scalarization strips any allocation information present in a zero-dimensional array:
julia> x = MtlArray(fill(1f0))
0-dimensional MtlArray{Float32, 0, Private}:
1.0
julia> x .+ x
2.0f0
which now lives on the CPU.
Is it a bug, or a feature?