JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.54k stars 5.47k forks source link

`argmax` behaves differently from other higher-order functions #48502

Open timholy opened 1 year ago

timholy commented 1 year ago

argmax has two meanings:

Both of these are useful and well-defined operations. But they should be considered with respect to Julia's overall consistency: it's worth asking how argmax(f, itr) compares to other such functions that accept a "mapping" function f as an input. For example, sum(f, itr) computes sum(f(x) for x in itr), maximum(f, itr) computes maximum(f(x) for x in itr), and so on. One simple consequence is that these functions have a very natural property: sum(identity, itr) == sum(itr).

So, is argmax(identity, a) == argmax(a)?

julia> argmax(identity, [10, 30, 20])
30

julia> argmax([10, 30, 20])
2

Nope. How about "is argmax(f, itr) equivalent to argmax(f(x) for x in itr)"?

julia> a = [1, 5, NaN, 3]
4-element Vector{Float64}:
   1.0
   5.0
 NaN
   3.0

julia> f(x) =  isnan(x) ? oftype(x, -Inf) : x   # use a sentinel value for NaN

julia> argmax(f, a)
5.0

Again nope: if it did, it would return the index i such that a[i] == maximum(f, a).

Personally, I think this is a footgun. If we ever decide to go to Julia 2.0, I think we should reconsider whether these two methods belong to the same function.

One potential solution is that we could have argmax(itr) become synonymous with argmax(identity, itr) and embrace the newer meaning of argmax. Then, noting that findmax(f, a) does act in a manner consistent with other mapping functions, we could just eliminate the original meaning of argmax, as argmax seems to sometimes be used as a substitute for findmax(A)[2]. But I'm less fond of having to remember whether the index is the first or second output; hence we could switch to having it return a NamedTuple (we might be able to do that even now).

oscardssmith commented 1 year ago

IMO, the correct 2.0 solution is getting rid of all the other ones like sum(f, x) which only exist because we don't currently have a good way of doing reductions generically.

timholy commented 1 year ago

AFAICT it's just a specialization bottleneck. We could fix that today, but the cost is longer compile times.

timholy commented 1 year ago

But that wouldn't change the fact that argmax means two different things. Are you saying that if we got rid of the others, then the disconnect isn't so bad? It's certainly worse because of the analogy to other higher-order functions, but I'm not sure it's great even if we do follow your proposal.

jakobnissen commented 1 year ago

In my opinion, the methods like sum(f, x) and maximum(f, x) are unnecessary. They are using valuable methods just as syntactical sugar for map+sum and map+maximum, respectively. As I see it, the mapping operation is orthogonal, and need not be extra methods of all these higher order functions.

timholy commented 1 year ago

@jakobnissen, see https://github.com/JuliaLang/julia/issues/48502#issuecomment-1414227664

BioTurboNick commented 1 year ago

Worth linking the previous related discussions?

27613

39203

Seelengrab commented 1 year ago

Also https://github.com/JuliaLang/julia/pull/41339 with some summarized discussion of the previous discussions

timholy commented 1 year ago

Given that the point made by @oscardssmith and @jakobnissen is natural but misses the main point, maybe it's worth thinking about this way: how would you express the 1-arg method in terms of the 2-arg method? The answer is

argmax(itr) = argmax(idx -> itr[idx], CartesianIndices(itr))

Note that the itr moves into the mapping function and only barely contributes to the collection being iterated over. To me this is a clear sign that these are actually two different functions.

BioTurboNick commented 1 year ago

Given that the point made by @oscardssmith and @jakobnissen is natural but misses the main point, maybe it's worth thinking about this way: how would you express the 1-arg method in terms of the 2-arg method? The answer is

argmax(itr) = argmax(idx -> itr[idx], CartesianIndices(itr))

Note that the itr moves into the mapping function and only barely contributes to the collection being iterated over. To me this is a clear sign that these are actually two different functions.

I suggested something similar, though slightly less general, in one of those posts.

aplavin commented 1 year ago

That's totally a footgun! I also tried to convince others not to introduce argmax(f, A) but failed. See a long discussion at https://discourse.julialang.org/t/findmax-and-friends-confusing-behaviour-to-be-introduced-in-1-7/61904. There's direct empirical evidence that it's a footgun: even popular packages sometimes implement argmax wrong, even after almost two years of argmax(f, A) - see the end of that discourse thread.

If starting from scratch: an intuitive, unambiguous, and consistent interface could be indmax([f], X)/keymax to return the index of the maximum, and maximum(A, [by=f]) to return the maximizing element. Leaving argmax to specialized optimization/minimization packages.

JeffBezanson commented 1 year ago

This has been discussed many times. I would summarize as

timholy commented 1 year ago

Agreed that argmax(f, s) is the better definition, even though it was added later.

MasonProtter commented 1 year ago

I'm really surprised to see people complain about sum(f, itr). I think that functions like that are fantastic because new users typically don't understand what the hell map, reduce, foldl and filter are and it takes a surprisingly long time to wrap ones head around them. Supplying sum(f, itr) is a really nice stepping stone IMO.

But even as an experienced user I find those functions really convenient and natural.

jariji commented 1 year ago

@MasonProtter If Julia had concise reduction syntax, it could be easier to use even than sum(f, itr). But I don't want to derail the thread - happy to continue elsewhere.