JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.62k stars 5.48k forks source link

`map` over several tuples demands equal length #42216

Closed mcabbott closed 1 year ago

mcabbott commented 3 years ago

Instead of stopping early like zip, this is an error:

julia> map(+, (1,2,3), (4,5))
ERROR: BoundsError: attempt to access Tuple{} at index [1]
Stacktrace:
 [1] getindex(t::Tuple, i::Int64)
   @ Base ./tuple.jl:29
 [2] map (repeats 3 times)
   @ ./tuple.jl:250 [inlined]
 [3] top-level scope
   @ REPL[32]:1

julia> map(+, (1,2,3), [4,5])  # what I expected
2-element Vector{Int64}:
 5
 7

Is this a bug? It seems to be present at least since 1.0. The docstring for map says

For multiple collection arguments, apply f elementwise, and stop when when any of them is exhausted.

but I'm not sure I can quote it as authoratative, since I think I wrote it.

Seelengrab commented 3 years ago

Copying from slack:

If it is, it could be made consistent by adding these:

julia> @eval Base map(f, ::Tuple{}, ::Tuple{Any}) = ()
map (generic function with 61 methods)                

julia> @eval Base map(f, ::Tuple{Any}, ::Tuple{}) = ()
map (generic function with 62 methods)                

julia> map(+, (1,2,3), (4,5))                         
(5, 7)                                                

though that won't fix the 3+ argument case. Might just need a any(isempty, tails) ? () : map(f, tails...) in the right place for that, unsure how much inference likes this.

timholy commented 3 years ago

I love that it throws an error, and I wish zip did too.

jakobnissen commented 3 years ago

For usability, I think it's great that it throws an error, and it would be nice if map did that on arrays as well. Better to be strict by default - and when you're mapping over N containers, it's presumably the intention that every element of every container participates in the mapping. It's much nicer to have map error because you violated an assumption about your data instead of tracking down an error caused by map silently only mapping some of the elements of the collection.

The docstring was changed after the release of 1.6, so haven't appeared in any released Julia versions, so I don't think it's breaking to change it back.

Arguably, it's the correct behaviour to error, since the canonical docstring says "[ ... ] by applying f to each element". If it only applies to the first elements, it doesn't apply each element.

mcabbott commented 3 years ago

For once I actually wanted the early-stopping behaviour on tuples, yesterday. The fact that apparently nobody else did, for years, does make it seem like this isn't widely used. (And it's obviously easy to roll your own.)

Agree that the stricter behaviour would avoid some surprises. But changing it for arrays (and iterators) sounds to me like it would be breaking. It just collects Base.Generator(+, [1,2,3], [4,5]), which contains zip; I'm not sure that the fact that nobody got around to updating the docstring until recently means all that much.

For arrays note also that map(+, ones(2,2), ones(2,1)) is an error, while map(+, ones(2,2), ones(2,1,1)) is a zip. (Same since 1.0, but also no docstring until recently.)

mcabbott commented 3 years ago

The present behaviour for map on Tuples is tested to give an error here: https://github.com/JuliaLang/julia/blob/master/test/tuple.jl#L265

I don't actually see a test for map on arrays / iterators stopping early. (Although I have not run all tests.) Would it be crazy to change that to demand they all have the same length, or same size even? That would certainly be simpler to think about.

I doubt that zip could change its default, but it could gain an option. Then something like map(f, iters...) = collect(Generator(splat(f), zip(iters...; strict=true))) would be a generic map which behaves more like the Tuple case does now.

stevengj commented 3 years ago

For zip, see the discussion at #20499

o314 commented 3 years ago

I love that it throws an error, and I wish zip did too.

python, ruby, haskell, rust ... do no error when lengths mismatch on zip call, so i think this is not a good reco.

timholy commented 3 years ago

zip definitely can't change its default, at least not until Julia 2.0. Gaining a strict=true option seems like the way to go here.

oxinabox commented 3 years ago

For map we might want to introduce a map_shortest that does the truncating behavior. I likewise wish that zip errored, and we had a zip_shortest which has the current behavior.

mcabbott commented 3 years ago

I agree that two versions of zip would be nice. Doesn't matter much what they are called.

But the question of this issue is map, whose use of zip as an implementation detail. On tuples it is strict, but on other things it stops early:

julia> map(+, 1:2, 3:5000)
2-element Vector{Int64}:
 4
 6

That difference of behaviour seems really surprising to me. Whichever you prefer, they should surely agree. To fix it, we can change map(f, Tuple...) to stop early since that's an error now. But it sounds like nobody wants that. The other possible fix is to change map(f, iters...) not to stop early. Can we do this? After running PkgEval? Or is it too obviously breaking?

Note also that map on multi-dimensional arrays doesn't stop early. More surprisingly, zip also doesn't:

julia> map(+, [1 2], [3 4; 5 6])
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(1), Base.OneTo(2)), b has dims (Base.OneTo(2), Base.OneTo(2)), mismatch at 1")

julia> collect(zip([1 2], [3 4; 5 6]))
ERROR: DimensionMismatch

If we add an option to zip like zip(...; strict=true), we could surely make a similar option work for map too. But it still shouldn't have different defaults on different argument types.

mcabbott commented 2 years ago

It seems that map on two vectors only gained this early stopping behaviour in Julia 1.5. Before that, it was an error -- although the case of two iterators still allowed it:

julia> map(+, 1:2, 3:5000)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(2),), b has dims (Base.OneTo(4998),), mismatch at 1")

julia> map(+, 1:2, (3,4,5,6,7))
2-element Array{Int64,1}:
 4
 6

julia> VERSION
v"1.4.2"

Is that evidence that we could change it to be an error?

JeffBezanson commented 2 years ago

I have always been a fan of allowing different lengths. But here's a point from triage: it might make sense to check up front and require length/shape matches for anything that has a definitive shape like an array, but for iterators allow mismatches. The reasoning is that with iterators, you have no choice but to start mapping, and only later find out whether the lengths matched, and it seems silly to "throw away" all the work.

JeffBezanson commented 2 years ago

Also from triage: since we can't make this fully consistent now in a non-breaking way, we should probably change the tuple case to match other cases of map and zip like the docs say.

StefanKarpinski commented 2 years ago

Slight correction (I think): we can make it consistent, but only in the direction of making it more permissive. We already allow mismatched lengths for arrays and iterators in both map and zip. We also allow mismatched tuple lengths in zip but require matching tuple lengths in map only, i.e. this is an error: map(tuple, (1, 2), (3, 4, 5)). And an inscrutable, probably unintentional one at that. We can make this consistent across the board by just fixing map with tuples to allow different lengths.

If we want a path to stricter length checking, we can do that by adding a strict::Bool keyword option to both map and zip that forces lengths to be the same. In Julia 1.x, the default would have to be false everywhere; in Julia 2.0, we could change the default to be true when all of the arguments have known length and false when some of the arguments had infinite or unknown length. I think that if strict is explicitly passed as true we should error if the lengths don't match even if we can only discover that after the zipping/mapping is done.