Closed mcabbott closed 1 year ago
Copying from slack:
If it is, it could be made consistent by adding these:
julia> @eval Base map(f, ::Tuple{}, ::Tuple{Any}) = ()
map (generic function with 61 methods)
julia> @eval Base map(f, ::Tuple{Any}, ::Tuple{}) = ()
map (generic function with 62 methods)
julia> map(+, (1,2,3), (4,5))
(5, 7)
though that won't fix the 3+ argument case. Might just need a any(isempty, tails) ? () : map(f, tails...)
in the right place for that, unsure how much inference likes this.
I love that it throws an error, and I wish zip
did too.
For usability, I think it's great that it throws an error, and it would be nice if map
did that on arrays as well. Better to be strict by default - and when you're mapping over N containers, it's presumably the intention that every element of every container participates in the mapping. It's much nicer to have map error because you violated an assumption about your data instead of tracking down an error caused by map silently only mapping some of the elements of the collection.
The docstring was changed after the release of 1.6, so haven't appeared in any released Julia versions, so I don't think it's breaking to change it back.
Arguably, it's the correct behaviour to error, since the canonical docstring says "[ ... ] by applying f
to each element". If it only applies to the first elements, it doesn't apply each element.
For once I actually wanted the early-stopping behaviour on tuples, yesterday. The fact that apparently nobody else did, for years, does make it seem like this isn't widely used. (And it's obviously easy to roll your own.)
Agree that the stricter behaviour would avoid some surprises. But changing it for arrays (and iterators) sounds to me like it would be breaking. It just collects Base.Generator(+, [1,2,3], [4,5])
, which contains zip
; I'm not sure that the fact that nobody got around to updating the docstring until recently means all that much.
For arrays note also that map(+, ones(2,2), ones(2,1))
is an error, while map(+, ones(2,2), ones(2,1,1))
is a zip. (Same since 1.0, but also no docstring until recently.)
The present behaviour for map
on Tuples is tested to give an error here:
https://github.com/JuliaLang/julia/blob/master/test/tuple.jl#L265
I don't actually see a test for map
on arrays / iterators stopping early. (Although I have not run all tests.) Would it be crazy to change that to demand they all have the same length, or same size even? That would certainly be simpler to think about.
I doubt that zip
could change its default, but it could gain an option. Then something like map(f, iters...) = collect(Generator(splat(f), zip(iters...; strict=true)))
would be a generic map
which behaves more like the Tuple case does now.
For zip
, see the discussion at #20499
I love that it throws an error, and I wish zip did too.
python, ruby, haskell, rust ... do no error when lengths mismatch on zip call, so i think this is not a good reco.
zip
definitely can't change its default, at least not until Julia 2.0. Gaining a strict=true
option seems like the way to go here.
For map
we might want to introduce a map_shortest
that does the truncating behavior.
I likewise wish that zip
errored, and we had a zip_shortest
which has the current behavior.
I agree that two versions of zip
would be nice. Doesn't matter much what they are called.
But the question of this issue is map
, whose use of zip
as an implementation detail. On tuples it is strict, but on other things it stops early:
julia> map(+, 1:2, 3:5000)
2-element Vector{Int64}:
4
6
That difference of behaviour seems really surprising to me. Whichever you prefer, they should surely agree. To fix it, we can change map(f, Tuple...)
to stop early since that's an error now. But it sounds like nobody wants that. The other possible fix is to change map(f, iters...)
not to stop early. Can we do this? After running PkgEval? Or is it too obviously breaking?
Note also that map
on multi-dimensional arrays doesn't stop early. More surprisingly, zip
also doesn't:
julia> map(+, [1 2], [3 4; 5 6])
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(1), Base.OneTo(2)), b has dims (Base.OneTo(2), Base.OneTo(2)), mismatch at 1")
julia> collect(zip([1 2], [3 4; 5 6]))
ERROR: DimensionMismatch
If we add an option to zip like zip(...; strict=true)
, we could surely make a similar option work for map
too. But it still shouldn't have different defaults on different argument types.
It seems that map
on two vectors only gained this early stopping behaviour in Julia 1.5. Before that, it was an error -- although the case of two iterators still allowed it:
julia> map(+, 1:2, 3:5000)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(2),), b has dims (Base.OneTo(4998),), mismatch at 1")
julia> map(+, 1:2, (3,4,5,6,7))
2-element Array{Int64,1}:
4
6
julia> VERSION
v"1.4.2"
Is that evidence that we could change it to be an error?
I have always been a fan of allowing different lengths. But here's a point from triage: it might make sense to check up front and require length/shape matches for anything that has a definitive shape like an array, but for iterators allow mismatches. The reasoning is that with iterators, you have no choice but to start mapping, and only later find out whether the lengths matched, and it seems silly to "throw away" all the work.
Also from triage: since we can't make this fully consistent now in a non-breaking way, we should probably change the tuple case to match other cases of map and zip like the docs say.
Slight correction (I think): we can make it consistent, but only in the direction of making it more permissive. We already allow mismatched lengths for arrays and iterators in both map and zip. We also allow mismatched tuple lengths in zip but require matching tuple lengths in map only, i.e. this is an error: map(tuple, (1, 2), (3, 4, 5))
. And an inscrutable, probably unintentional one at that. We can make this consistent across the board by just fixing map
with tuples to allow different lengths.
If we want a path to stricter length checking, we can do that by adding a strict::Bool
keyword option to both map
and zip
that forces lengths to be the same. In Julia 1.x, the default would have to be false everywhere; in Julia 2.0, we could change the default to be true when all of the arguments have known length and false when some of the arguments had infinite or unknown length. I think that if strict
is explicitly passed as true we should error if the lengths don't match even if we can only discover that after the zipping/mapping is done.
Instead of stopping early like
zip
, this is an error:Is this a bug? It seems to be present at least since 1.0. The docstring for
map
saysbut I'm not sure I can quote it as authoratative, since I think I wrote it.