`Broadcast.broadcast_shape` for BlockedUnitRanges fails inference

JuliaArrays / BlockArrays.jl

BlockArrays for Julia

http://juliaarrays.github.io/BlockArrays.jl/

Other

195 stars 29 forks source link

`Broadcast.broadcast_shape` for BlockedUnitRanges fails inference #310

Open charleskawczynski opened 1 year ago

charleskawczynski commented 1 year ago

MWE:

using BlockArrays
shape1 = (BlockArrays._BlockedUnitRange((2,)),);
shape2 = (BlockArrays._BlockedUnitRange((2,)),);
Base.Broadcast._bcs(shape1, shape2)
# @code_warntype Base.Broadcast._bcs1(shape1[1], shape2[1]) # lower level, but internals
@code_warntype Base.Broadcast.broadcast_shape(shape1, shape2)

jishnub commented 1 year ago

This comes from https://github.com/JuliaArrays/BlockArrays.jl/blob/ec8c7f554bf57f06b85d640c6fa7dcfd5fa9b5c3/src/blockbroadcast.jl#L37 Perhaps we may remove the special-casing, as the performance impact in the 1-term union will be minimal

charleskawczynski commented 1 year ago

Yeah, I tried fixing this. I'll open the PR for convenience (#312). First, I thought that the core issue was that union is used in sortedunion, and union does not preserve tuple types:

julia> union((1,), (2,))
2-element Vector{Int64}:
 1
 2

However, even after fixing that in the PR (which borrows some functions in TupleTools.jl and defines union on Tuples to return a Tuple), the result is still type unstable because the tuple length depends on the values in the tuple:

So, while this could fix the type preservation, it won't actually fix the type instability. If you have ideas about how to make Broadcast.broadcast_shape type stable / inferrable, that'd be great

jishnub commented 1 year ago

I think it's better for this to return a vector, as the non-trivial branch does. Since unique works on values, the length can't be known at compile time. I wonder if we may just remove the if-else here? I haven't checked if this changes the results

charleskawczynski commented 1 year ago

Yeah, that does fix it. I guess it's worth it. The general case of calling combine_blockaxes is either going to be type unstable in the case of Tuples (leading to inference triggers), or allocating heap allocating from union(a,b). Right now, it's a mixture, so it allocates and it's type-unstable.

charleskawczynski commented 1 year ago

I'll update the PR

charleskawczynski commented 1 year ago

Ok, the PR is updated.

charleskawczynski commented 1 year ago

Thanks for the tip @jishnub!

charleskawczynski commented 1 year ago

Actually, I prefer #313 if that's okay 🙂

charleskawczynski commented 1 year ago

So, it seems that length(b) == 1 ? a is needed for upstream tests to pass.

I updated https://github.com/JuliaArrays/BlockArrays.jl/pull/313, and basically specialized it so that the tests pass, but I have a feeling that I'm introducing inconsistent behavior between blocklasts()::Tuple and blocklasts()::Vector cases.

I think one potential solution would be to put the length of BlockedUnitRange in the type space:

struct BlockedUnitRange{CS,L} <: AbstractUnitRange{Int}
    first::Int
    lasts::CS
    global _BlockedUnitRange(f, cs::CS) where CS = new{CS, len(f, cs)}(f, cs)
end

len(f, l) = isempty(l) ? 0 : Integer(last(l)-f+1)
...
length(a::BlockedUnitRange{CS, L}) where {CS, L} = L

which will make the call to length (in combine_blockaxes) compile-time known, and remove the type instability. However, that may cause other issues. And I'm now seeing that things like DefaultBlockAxis may make this change more intrusive.

Thoughts? jishnub

jishnub commented 1 year ago

I wonder if a simple workaround might be to convert the return value of axistype to a Vector?

dlfivefifty commented 1 year ago

I wonder if a simple workaround might be to convert the return value of axistype to a Vector?

Absolutely not: one of the usages downstream are infinite dimensional block array, and even without that we want to support allocation-free axes when the block sizes are Fill.

Note this really is an issue inherited with Base's broadcasting: they never should have supported degenerate sizes like randn(5,1) .* randn(5,6), and restricted broadcasting to randn(5) .* randn(5,6). Similarly only support randn(5,6) .* transpose(randn(6)).

We could potentially disallow degenerate broadcasting for blocked arrays. We'd have to see what tests fail downstream.

charleskawczynski commented 1 year ago

Agreed, the main reason for fixing this issue is to avoid allocations. So ideally we have a solution that is fully inferred and stack allocated.