JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.48k stars 5.46k forks source link

disentangle `map` and `collect` generic functions #48510

Open goretkin opened 1 year ago

goretkin commented 1 year ago

When I originally opened https://github.com/JuliaLang/julia/issues/39628 , I realized I should have made two issues, and didn't This is the non-doc portion of that issue.

Where possible, it is desirable to have properties such as

map(identity, c) == c

See https://github.com/JuliaLang/julia/pull/38150#issuecomment-726304995 . As that thread indicates, that property doesn't currently hold, because map is conflated with collect. To relate the behavior of map and collect, I expect the following property

map(f, collect(c)) == collect(map(f, c))

One example where the property doesn't hold is with Generators:

julia> c = (x for x in 1:4)
Base.Generator{UnitRange{Int64},typeof(identity)}(identity, 1:4)

julia> map(identity, c)
4-element Array{Int64,1}:
 1
 2
 3
 4

That property could hold with the following definitions:

Base.:∘(::(typeof(identity)), f) = f
Base.map(f, g::Base.Generator) = Base.Generator(f ∘ g.f, g.iter)

This will break code that was using map for its collect behavior. Some examples (with some false positives from splatting arguments from a generator into map) can be found with the regex map\([^,]*,\s*\(.*for.*in.

@jakobnissen recently made similar comments on Slack

(For v2.0)

N5N3 commented 1 year ago

I think we can do this for Iterators.map before 2.0?

goretkin commented 1 year ago

That search link no longer works: here is an updated link: https://juliahub.com/ui/Search?q=map%5C%28%5B%5E,%5D%2a,%5Cs%2a%5C%28.%2afor.%2ain&type=code&r=true

An example true positive of the search: https://github.com/zgornel/EmbeddingsAnalysis.jl/blob/e045d5c9169a29a68d66aad15297348de1863e8e/test/vocab_reduction.jl#L5

The current behavior (illustrated below) shows why changing the behavior of map on Generator to not also collect can be problematic:

julia> gen1 = (fill(i, (3, )) for i = 1:4)
Base.Generator{UnitRange{Int64}, var"#21#22"}(var"#21#22"(), 1:4)

julia> gen2 = (sum(vec) for vec in gen1)
Base.Generator{Base.Generator{UnitRange{Int64}, var"#21#22"}, var"#23#24"}(var"#23#24"(), Base.Generator{UnitRange{Int64}, var"#21#22"}(var"#21#22"(), 1:4))

julia> arr2 = map(sum, gen1)
4-element Vector{Int64}:
  3
  6
  9
 12

julia> @show typeof(arr2) typeof(gen2)
typeof(arr2) = Vector{Int64}
typeof(gen2) = Base.Generator{Base.Generator{UnitRange{Int64}, var"#21#22"}, var"#23#24"}
Base.Generator{Base.Generator{UnitRange{Int64}, var"#21#22"}, var"#23#24"}

julia> eltype(arr2)
Int64

julia> eltype(gen2)
Any
uniment commented 1 year ago

An example true positive of the search:

It seems like that specific example could be fixed by introducing a method of permutedims that takes Generators? After that, the ./= operator works with generators.

I wonder if there's an efficient way to ask the compiler what the return type of a function will be, given certain argument types? If so, then eltype could be made to work on generators.

If those two issues could be addressed, then is there any code that this fix would break?

jariji commented 1 year ago

@uniment said

I wonder if there's an efficient way to ask the compiler what the return type of a function will be, given certain argument types?

Core.Compiler.return_type is undocumented and not totally reliable, but that's more or less what it does.

uniment commented 1 year ago

Seems to work:

julia> Base.eltype(::Type{Base.Generator{I,F}}) where {I,F} =
           Core.Compiler.return_type(Tuple{F,eltype(I)})

julia> @btime eltype(i/j for i=1:4, j=1:4)
  1.000 ns (0 allocations: 0 bytes)
Float64

julia> @btime eltype(ch+1 for ch='a':'z')
  1.000 ns (0 allocations: 0 bytes)
Char

There's a discrepancy though, which can be fixed by special-casing Tuple:

julia> eltype(x+1 for x=(2,2.0))
Any

julia> Base.eltype(::Type{Base.Generator{T,F}}) where {T<:Tuple,F} =
           typejoin(map(P->Core.Compiler.return_type(Tuple{F,P}),Tuple(T.parameters))...)

julia> @btime eltype(x+1 for x=(2,2.0))
  1.000 ns (0 allocations: 0 bytes)
Real

(I had to convert T.parameters in a Tuple because calling map on a Core.svec currently collects into a Vector.)

Dunno if a specialization would be helpful or not:

julia> Base.eltype(::Type{Base.Generator{T,F}}) where {T<:NTuple{N,E},F} where {N,E} =
           Core.Compiler.return_type(Tuple{F,E})
uniment commented 1 year ago

Making adjoint work with generators:

julia> Base.reverse(itr::Iterators.ProductIterator) = Iterators.ProductIterator(reverse(itr.iterators))

julia> Base.adjoint(g::Base.Generator{I,F}) where {I<:Iterators.ProductIterator{Tuple{Any,Any}},F} =
           Base.Generator(adjoint ∘ g.f ∘ reverse, reverse(g.iter))

julia> Base.adjoint(g::Base.Generator) = # for vectors
           Base.Generator(adjoint ∘ g.f ∘ (((_,x),)->x), Iterators.ProductIterator((1:1,g.iter)))

Test:

julia> collect((exp(i*im) for i=1:3)') == collect((exp(i*im) for i=1:3))'
true

julia> collect((exp(i+j*im) for i=1:3, j=1:4)') == collect((exp(i+j*im) for i=1:3, j=1:4))'
true
uniment commented 1 year ago

Making permutedims work with generators:

julia> Base.permutedims(g::Base.Generator{<:Iterators.ProductIterator}, perm) =
           Base.Generator(g.f ∘ (args -> map(p -> args[p], perm)), Iterators.ProductIterator(map(p -> g.iter.iterators[p], perm)))

julia> Base.permutedims(g::Base.Generator{<:Iterators.ProductIterator}) = permutedims(g, (2, 1))

julia> Base.permutedims(g::Base.Generator) = permutedims(Base.Generator(g.f ∘ ((x,_),)->x, Iterators.ProductIterator((g.iter, 1:1))))

Test:

julia> collect(permutedims(x for x=1:3)) == permutedims(collect(x for x=1:3))
true

julia> collect(permutedims(i+j for i=1:3, j=1:4)) == permutedims(collect(i+j for i=1:3, j=1:4))
true

julia> collect(permutedims((i+j+k for i=1:3, j=1:4, k=1:5), (3,1,2))) == permutedims(collect(i+j+k for i=1:3, j=1:4, k=1:5), (3,1,2))
true
goretkin commented 1 year ago

An example true positive of the search:

It seems like that specific example could be fixed by introducing a method of permutedims that takes Generators? After that, the ./= operator works with generators.

I wonder if there's an efficient way to ask the compiler what the return type of a function will be, given certain argument types? If so, then eltype could be made to work on generators.

If those two issues could be addressed, then is there any code that this fix would break?

This has been discussed before, but I'll try to summarize here. There is a considerable extent to which the functional/behavioral semantics of Julia code should not depend on type inference, which exists almost exclusively in service of generating efficient code.

There has been a deliberate choice not to rely on Core.Compiler.return_type in many situations that are tempting to use it. I believe map does use it, however.

julia> Core.Compiler.return_type(sin, Tuple{Int64})
Float64

julia> Core.Compiler.return_type(sin, Tuple{Int64})
Float64

julia> sum(Vector{Float64}())
0.0

julia> map(sin, 1:0)
Float64[]

julia> sum(Base.Generator(sin, 1:0))
ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
[...]

julia> sum(Base.Generator(identity, 1:0))
0

collect is actually fairly complicated, and I think going down the approach of aligning the interface to Base.Generator to look like the interface to the return value of collect (e.g. Vector{Any}) is more involved than it may appear.

uniment commented 1 year ago

There has been a deliberate choice not to rely on Core.Compiler.return_type in many situations that are tempting to use it. I believe map does use it, however.

An argument can be made that Generators are exceptional enough to be granted special consideration. In #34674 it is said,

generators are said to be the lazy variant of map

this is consistent with that view.

I think going down the approach of aligning the interface to Base.Generator to look like the interface to the return value of collect (e.g. Vector{Any}) is more involved than it may appear.

"The journey of a thousand miles begins with one step." - Lao Tzu