JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.79k stars 5.49k forks source link

Define the output order from `map` #50857

Open CameronBieganek opened 1 year ago

CameronBieganek commented 1 year ago

Currently the map docstring makes no assertion about the order of the output collection from map. So, map(x -> 2x, 1:2) == [4, 2] is conformant with the map docstring. Thus, strictly speaking, no one should ever use map unless they are going to perform a reduction (with an associative operator) on the output. Here is a proposal for an updated docstring that defines the output order:

"""
    map(f, x...)

Transform iterable `x` by applying `f` to each element. The order of the output matches the order of the input.  
In other words, the following expression is always true:

all(zip(map(f, xs), xs)) do (fx, x)
    fx == f(x)
end

For multiple collection arguments, apply `f` elementwise, and stop when when any of them is exhausted.
"""

Note that I've also switched the terminology from "collection" to "iterable", since map works on arbitrary iterable objects which are not necessarily collections. For example, length throws a MethodError when called on an iterator like

itr = (sin(x) for x in 1:10 if sin(x) > 0)

and the manual implies that length must be implemented for an object to be considered a collection.

The docstring for zip is also a bit vague, so in order to adopt the above change to the map docstring, we would also need to clarify the zip docstring.

Seelengrab commented 1 year ago

We could be even more precise, by matching the definition here; in lay terms, applying a function while preserving the structure of the object we're mapping over (as well as respecting composition of mapped functions).

The only downside is that we can't use "functor" for that, because we've chosen that to mean a callable struct instead..

CameronBieganek commented 1 year ago

Well, map doesn't actually adhere to the identity axiom:

julia> map(identity, Iterators.take(1:3, 2))
2-element Vector{Int64}:
 1
 2

Also, as far as I can tell, the functor rules don't actually imply anything about the order of the output, except in the identity case.

bvdmitri commented 1 year ago
the following expression is always true:

all(zip(map(f, xs), xs)) do (fx, x)
    fx == f(x)
end

this is a bit misleading as it is not always true in general. For this to hold the collection must be non stateful, f must be pure. And it is actually may be a non-Bool value as well even for pure functions:

julia> f(x) = [ 1, 2, 3, x, missing ]
f (generic function with 1 method)

julia> xs = [1, 2]
2-element Vector{Int64}:
 1
 2

julia> all(zip(map(f, xs), xs)) do (fx, x)
           fx == f(x)
       end
missing

And yeah, the order of zip is actually not guaranteed either.

CameronBieganek commented 1 year ago

I thought about stateful iterators a while ago, but I finally got around to opening the issue today. So I forgot to add a copy to handle stateful iterators. Regarding non-Bool values, I suppose the comparison should be with isequal. So, combining those two changes, we have

all(zip(map(f, copy(xs)), xs)) do (fx, x)
    isequal(fx, f(x))
end

Impure functions seem harder to handle. The following works, but it feels like it works for the wrong reasons:

julia> f(x) = push!(x, 10)
f (generic function with 1 method)

julia> xs = [[1, 2], [3, 4]]
2-element Vector{Vector{Int64}}:
 [1, 2]
 [3, 4]

julia> all(zip(map(f, copy(xs)), xs)) do (fx, x)
           a = f(x)
           @show a
           @show fx
           isequal(fx, a)
       end
a = [1, 2, 10, 10]
fx = [1, 2, 10, 10]
a = [3, 4, 10, 10]
fx = [3, 4, 10, 10]
true

There's probably a counter-example with an impure function where the all expression returns false...

At any rate, the all expression is starting to look a bit too unwieldy to add to the docstring. Perhaps there's a better way to describe the output order in prose. The best I can think of at the moment is something like this:

The order of the output matches the order of the input. For example,
the first element of the output is `f(first(x))` and the last element of the
output is `f(last(x))`.

And yeah, the order of zip is actually not guaranteed either.

I think that comment is referring to the fact that, for zip(itr1, itr2), it is undefined whether the $i$-th tuple is evaluated by first iterating itr1 and then iterating itr2 or by first iterating itr2 and then iterating itr1. In other words, the evaluation of the $i$-th tuple could be done either like this:

a, state1 = iterate(itr1, state1)
b, state2 = iterate(itr2, state2)
return (a, b)

or like this:

b, state2 = iterate(itr2, state2)
a, state1 = iterate(itr1, state1)
return (a, b)

The order of the output tuples, on the other hand, is most definitely intended to correspond to the order of the input tuples, except for weird cases with aliased, stateful iterators (e.g. itr = Iterators.Stateful(1:5); zip(itr, itr)). In other words, the first tuple should normally be (first(itr1), first(itr2)).

CameronBieganek commented 1 year ago

I should reiterate the main message of my original post:

Without a guarantee in the docstring on the order of the output, no one should ever use map unless they are going to perform a reduction (with an associative operator) on the output.

bvdmitri commented 1 year ago

I agree with you issue, I just feel it unnecessary to come up with a "code explanation", which has many other implicit assumptions. You've explained it pretty well in one sentence "The order of the output matches the order of the input.". That should be enough?

jishnub commented 1 year ago

Since you're referring to the order being undefined, do you mean that the reduction operator should be commutative?

CameronBieganek commented 1 year ago

Yeah, I guess the operator needs to be both associative and commutative to be able to arbitrarily rearrange the order of the sequence without changing the value of the reduction. But that's not part of the proposed docstring. That was just me complaining about the vagueness of the current map docstring.

nsajko commented 1 year ago

Related observation: the foreach doc string says:

foreach should be used instead of map when the results of f are not needed, for example in foreach(println, array).

This kind of (but not really, perhaps) implies the expected output order of map.

It was added all the way back in 71fc3b3ac6413d369d954e286835192ccc3784b3, together with the NEWS entry for foreach.