JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.93k stars 5.49k forks source link

What should the shape of zipped shaped iterators be? #47195

Open jakobnissen opened 2 years ago

jakobnissen commented 2 years ago

Currently, a zip of HasShape{N} iterators is itself HasShape{N}:

julia> z = zip([1 2], [1 2]');

julia> Base.IteratorSize(z)
Base.HasShape{2}()

However, the documentation of IteratorSize states that:

HasShape{N}() if there is a known length plus a notion of multidimensional shape (as for an array). In this case N should give the number of dimensions, and the axes function is valid for the iterator

Despite z being HasShape, axes is not valid, contrary to the docstring.

julia> axes(z)
ERROR: DimensionMismatch: dimensions must match: a has dims (Base.OneTo(1), Base.OneTo(2)), b has dims (Base.OneTo(2), Base.OneTo(1)), mismatch at 1

This error in the implementation means that some valid zips, like the above cannot be collected, because while it's a valid iterator, its shape is nonsensical:

julia> collect(z)
ERROR: DimensionMismatch: dimensions must match: a has dims (Base.OneTo(1), Base.OneTo(2)), b has dims (Base.OneTo(2), Base.OneTo(1)), mismatch at 1

I can't see why this shouldn't work. And if it should, I don't see a way a user could make this work, either. At this time, Base uses zip heavily internally, for example in the implementations of map and Generator, which are documented to preserve shape.

I'm not sure what is the right way to go about this, but I propose:

mcabbott commented 2 years ago

Xref #42216... edit: mostly about map, but has links to older discussions about zip.

jakobnissen commented 2 years ago

Let's leave the map discussion for another issue. Zip and map need not behave the same, so these issues are orthogonal.

lxvm commented 10 months ago

I agree that zip should have a length when the shape is inconsistent, or simply be consistent with the shape of collecting zip.

julia> z = zip(1.0, 1:3)
zip(1.0, 1:3)

julia> collect(z)
1-element Vector{Tuple{Float64, Int64}}:
 (1.0, 1)

julia> size(z)
ERROR: DimensionMismatch: dimensions must match: a has dims (3,), must have singleton at dim 1
Stacktrace:
 [1] promote_shape
   @ Base ./indices.jl:162 [inlined]
 [2] promote_shape
   @ Base ./indices.jl:153 [inlined]
 [3] _promote_tuple_shape
   @ Base.Iterators ./iterators.jl:394 [inlined]
 [4] size(z::Base.Iterators.Zip{Tuple{Float64, UnitRange{Int64}}})
   @ Base.Iterators ./iterators.jl:390
 [5] top-level scope
   @ REPL[111]:1

When the shape and axes are consistent I could see why it is useful to to preserve the shape, although that doesn't appear to be enforced for 1d iterators.

julia> z = zip(ones(2,), ones(3,))
zip([1.0, 1.0], [1.0, 1.0, 1.0])

julia> collect(z)
2-element Vector{Tuple{Float64, Float64}}:
 (1.0, 1.0)
 (1.0, 1.0)

julia> z = zip(ones(2,), ones(3,3))
zip([1.0, 1.0], [1.0 1.0 1.0; 1.0 1.0 1.0; 1.0 1.0 1.0])

julia> collect(z)
2-element Vector{Tuple{Float64, Float64}}:
 (1.0, 1.0)
 (1.0, 1.0)

julia> z = zip(ones(2,2), ones(2,2))
zip([1.0 1.0; 1.0 1.0], [1.0 1.0; 1.0 1.0])

julia> collect(z)
2×2 Matrix{Tuple{Float64, Float64}}:
 (1.0, 1.0)  (1.0, 1.0)
 (1.0, 1.0)  (1.0, 1.0)

julia> z = zip(ones(2,2), ones(3,3))
zip([1.0 1.0; 1.0 1.0], [1.0 1.0 1.0; 1.0 1.0 1.0; 1.0 1.0 1.0])

julia> collect(z)
ERROR: DimensionMismatch: dimensions must match: a has dims (Base.OneTo(2), Base.OneTo(2)), b has dims (Base.OneTo(3), Base.OneTo(3)), mismatch at 1
Stacktrace:
 [1] promote_shape
   @ ./indices.jl:178 [inlined]
 [2] _promote_tuple_shape
   @ ./iterators.jl:394 [inlined]
 [3] axes(z::Base.Iterators.Zip{Tuple{Matrix{Float64}, Matrix{Float64}}})
   @ Base.Iterators ./iterators.jl:391
 [4] _similar_shape
   @ ./array.jl:711 [inlined]
 [5] _collect(cont::UnitRange{Int64}, itr::Base.Iterators.Zip{Tuple{…}}, ::Base.HasEltype, isz::Base.HasShape{2})
   @ Base ./array.jl:765
 [6] collect(itr::Base.Iterators.Zip{Tuple{Matrix{Float64}, Matrix{Float64}}})
   @ Base ./array.jl:759
 [7] top-level scope
   @ REPL[118]:1
Some type information was truncated. Use `show(err)` to see complete types.

Since multi-collection map uses zip, it seems unhelpful to break the existing behavior, but I still think it would be nice to have size(zip(...)) == size(collect(zip(...)))