Open CameronBieganek opened 1 year ago
We could be even more precise, by matching the definition here; in lay terms, applying a function while preserving the structure of the object we're mapping over (as well as respecting composition of mapped functions).
The only downside is that we can't use "functor" for that, because we've chosen that to mean a callable struct instead..
Well, map
doesn't actually adhere to the identity axiom:
julia> map(identity, Iterators.take(1:3, 2))
2-element Vector{Int64}:
1
2
Also, as far as I can tell, the functor rules don't actually imply anything about the order of the output, except in the identity
case.
the following expression is always true:
all(zip(map(f, xs), xs)) do (fx, x)
fx == f(x)
end
this is a bit misleading as it is not always true in general. For this to hold the collection must be non stateful, f
must be pure. And it is actually may be a non-Bool value as well even for pure functions:
julia> f(x) = [ 1, 2, 3, x, missing ]
f (generic function with 1 method)
julia> xs = [1, 2]
2-element Vector{Int64}:
1
2
julia> all(zip(map(f, xs), xs)) do (fx, x)
fx == f(x)
end
missing
And yeah, the order of zip
is actually not guaranteed either.
I thought about stateful iterators a while ago, but I finally got around to opening the issue today. So I forgot to add a copy
to handle stateful iterators. Regarding non-Bool values, I suppose the comparison should be with isequal
. So, combining those two changes, we have
all(zip(map(f, copy(xs)), xs)) do (fx, x)
isequal(fx, f(x))
end
Impure functions seem harder to handle. The following works, but it feels like it works for the wrong reasons:
julia> f(x) = push!(x, 10)
f (generic function with 1 method)
julia> xs = [[1, 2], [3, 4]]
2-element Vector{Vector{Int64}}:
[1, 2]
[3, 4]
julia> all(zip(map(f, copy(xs)), xs)) do (fx, x)
a = f(x)
@show a
@show fx
isequal(fx, a)
end
a = [1, 2, 10, 10]
fx = [1, 2, 10, 10]
a = [3, 4, 10, 10]
fx = [3, 4, 10, 10]
true
There's probably a counter-example with an impure function where the all
expression returns false
...
At any rate, the all
expression is starting to look a bit too unwieldy to add to the docstring. Perhaps there's a better way to describe the output order in prose. The best I can think of at the moment is something like this:
The order of the output matches the order of the input. For example,
the first element of the output is `f(first(x))` and the last element of the
output is `f(last(x))`.
And yeah, the order of
zip
is actually not guaranteed either.
I think that comment is referring to the fact that, for zip(itr1, itr2)
, it is undefined whether the $i$-th tuple is evaluated by first iterating itr1
and then iterating itr2
or by first iterating itr2
and then iterating itr1
. In other words, the evaluation of the $i$-th tuple could be done either like this:
a, state1 = iterate(itr1, state1)
b, state2 = iterate(itr2, state2)
return (a, b)
or like this:
b, state2 = iterate(itr2, state2)
a, state1 = iterate(itr1, state1)
return (a, b)
The order of the output tuples, on the other hand, is most definitely intended to correspond to the order of the input tuples, except for weird cases with aliased, stateful iterators (e.g. itr = Iterators.Stateful(1:5); zip(itr, itr)
). In other words, the first tuple should normally be (first(itr1), first(itr2))
.
I should reiterate the main message of my original post:
Without a guarantee in the docstring on the order of the output, no one should ever use map
unless they are going to perform a reduction (with an associative operator) on the output.
I agree with you issue, I just feel it unnecessary to come up with a "code explanation", which has many other implicit assumptions. You've explained it pretty well in one sentence "The order of the output matches the order of the input.". That should be enough?
Since you're referring to the order being undefined, do you mean that the reduction operator should be commutative?
Yeah, I guess the operator needs to be both associative and commutative to be able to arbitrarily rearrange the order of the sequence without changing the value of the reduction. But that's not part of the proposed docstring. That was just me complaining about the vagueness of the current map
docstring.
Related observation: the foreach
doc string says:
foreach
should be used instead ofmap
when the results off
are not needed, for example inforeach(println, array)
.
This kind of (but not really, perhaps) implies the expected output order of map
.
It was added all the way back in 71fc3b3ac6413d369d954e286835192ccc3784b3, together with the NEWS entry for foreach
.
Currently the
map
docstring makes no assertion about the order of the output collection frommap
. So,map(x -> 2x, 1:2) == [4, 2]
is conformant with themap
docstring. Thus, strictly speaking, no one should ever usemap
unless they are going to perform a reduction (with an associative operator) on the output. Here is a proposal for an updated docstring that defines the output order:Note that I've also switched the terminology from "collection" to "iterable", since
map
works on arbitrary iterable objects which are not necessarily collections. For example,length
throws aMethodError
when called on an iterator likeand the manual implies that
length
must be implemented for an object to be considered a collection.The docstring for
zip
is also a bit vague, so in order to adopt the above change to themap
docstring, we would also need to clarify thezip
docstring.