JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.88k stars 5.49k forks source link

map with different lengths throws DimensionMismatch, foreach does not #33340

Closed marekdedic closed 4 years ago

marekdedic commented 5 years ago

The behaviour of using map and foreach on arrays of different lengths is inconsistent:

julia> map((x, y) -> typeof(y), [1, 2, 3], [4, 5])
ERROR: DimensionMismatch("dimensions must match")
Stacktrace:
 [1] promote_shape at ./indices.jl:154 [inlined]
 [2] _promote_shape at ./iterators.jl:317 [inlined]
 [3] axes at ./iterators.jl:316 [inlined]
 [4] _array_for at ./array.jl:598 [inlined]
 [5] collect(::Base.Generator{Base.Iterators.Zip{Tuple{Array{Int64,1},Array{Int64,1}}},getfield(Base, Symbol("##3#4")){getfield(Main, Symbol("##13#14"))}}) at ./array.jl:611
 [6] map(::Function, ::Array{Int64,1}, ::Array{Int64,1}) at ./abstractarray.jl:2155
 [7] top-level scope at REPL[6]:1

julia> foreach((x, y) -> println(typeof(y)), [1, 2, 3], [4, 5])
Int64
Int64

As far as i can tell, foreach takes the smallest length of an array passed to it and iterates over that, whereas map just throws a DimensionMismatch.

Probably related to #29523, #13361 and #30389

tbonza commented 5 years ago

Trying to understand the problem you're having a bit more. When looking at the foreach docs, it mentions that it should be used instead of map when the results of f are not needed. Would it make sense for map to enforce correct dimensions and foreach to be more permissive by design?

 foreach(f, c...) -> Nothing

  Call function f on each element of iterable c. For multiple iterable
  arguments, f is called elementwise. foreach should be used instead of map
  when the results of f are not needed, for example in foreach(println,

Using slightly different example:

map((x,y) -> x + y, [1,2,3], [4,5,6])
3-element Array{Int64,1}:
 5
 7
 9

We can see foreach not return any output because I didn't call it explicitly.

foreach((x,y) -> x + y, [1,2,3], [4,5,6])

Would you agree that foreach is intentionally more permissive than map?

marekdedic commented 5 years ago

Would it make sense for map to enforce correct dimensions and foreach to be more permissive by design?

I don't see a reason why it would... Do you?

tbonza commented 5 years ago

One example would be if someone needs to use a jagged array (Multi-dimensional and Jagged Arrays)

marekdedic commented 5 years ago

Hmm, I still don't see a situation where you'd want map to fail but foreach to silently drop any members not in all arrays

IMHO foreach should throw a DimensionMismatch as well. If you want the current behaviour, you can fairly easily do that yourself, but more probably then not, when you're passing arrays of different length, it's by a mistake.

tbonza commented 5 years ago

Yup, the 3 from the x array in your example is getting dropped isn't it.

julia> foreach((x, y) -> println(typeof(x)), [1, 2, 3], [4, 5])
Int64
Int64

I agree that implicitly dropping values would be a frustrating debug session. Thanks for explaining your perspective :)

mschauer commented 5 years ago

Fix at https://github.com/JuliaLang/julia/pull/29927

mschauer commented 5 years ago

Explanation: Both map and foreach for abstract arrays are zip based. zip truncates to the shortest length and map and foreach should then follow that out of consistency. Just that collect for uneven zips fails because of a faulty length method, for which #29927 is the fix.

Map for iterators is Generator based and already has the expected behaviour, so you can do:

map(*,Iterators.cycle([false,true]), rand(1:2,100))
100-element Array{Int64,1}:
 0
 2
 0
 2
 ⋮
 1
 0
mschauer commented 4 years ago

Fixed by #29927:

julia> map((x, y) -> typeof(y), [1, 2, 3], [4, 5])
2-element Array{DataType,1}:
 Int64
 Int64