JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.63k stars 5.48k forks source link

2.0: `unique(xs; by=f)` to replace `unique(f, xs)` #50406

Open jariji opened 1 year ago

jariji commented 1 year ago

Let f(::A)::B and xs :: Vector{A}. maximum(f, xs)::B returns f's output but but unique(f, xs)::Vector{A} returns f's input. It would be better to have a single convention for what func(f, xs) does.

julia> maximum(abs, [-1, -2])
2

julia> unique(abs, [-1, -2])
2-element Vector{Int64}:
 -1
 -2

julia> sort([-1, -2]; by=abs)
2-element Vector{Int64}:
 -1
 -2

This is more like sort(xs; by=f), which I find clearer about what it is doing than the current design. So proposal:

unique(xs; by=f)
jakobjpeters commented 1 year ago

I see what you're going for here, but the existing order has a well established purpose. By making the first argument a function, one can choose to use the do-block syntax for function arguments. See also do.

aplavin commented 1 year ago

It would be better to have a single convention for what func(f, xs) does.

Yeah, also see the argmax(A)/argmax(f, A) story. The latter should've really been maximum(A, by=f)...

oscardssmith commented 1 year ago

IMO neither of these are right. The right answer is that with a lazy broadcast, you can write unique(f.(xs)) (with whatever the lazy broadcast syntax is) and get good performance.

aplavin commented 1 year ago

Yeah, if hof(f, X) means hof(map(f, X)), then the former isn't really necessary. It's already possible and efficient, requiring no new julia features: use mapview instead of map to save on allocations. Available in packages :)

julia> X = rand(10^5);

# baseline: fastest possible
julia> @btime unique($(map(x -> round(Int, x*10), X)));
  884.901 μs (10 allocations: 1.59 KiB)

# regular materializing map():
julia> @btime unique(map(x -> round(Int, x*10), $X));
  1.060 ms (12 allocations: 782.89 KiB)

julia> using FlexiMaps
# mapview:
julia> @btime unique(mapview(x -> round(Int, x*10), $X));
  1.086 ms (10 allocations: 1.59 KiB)

This doesn't contradict the OP though:

It would be better to have a single convention for what func(f, xs) does.

Maybe, this "single convention" should be "such methods aren't generally added". Still, stuff like unique(X, by=f) and maximum(X, by=f) are needed, they aren't equivalent to unique(map(f, X)).