JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.4k stars 5.46k forks source link

Changing `argmax` and `argmin` behavior to match the mathematical definition #27639

Closed Nosferican closed 6 years ago

Nosferican commented 6 years ago

The definition of argmax from Wikipedia, Given a collection of elements (domain), which map to an ordered set (range), what is the subset containing those elements in the domain whose image is the frontier.

Currently, argmax is actually indmax which is defined for indexed collections and returns the index of the first encountered element whose image is equal to the frontier. The generalization of keymax (discussed, but not implemented) extends the supported collections from indexed to pairs-implemented collections. This is the result of renaming indmax to argmax. A current proposal is for argmax to be defined for Generator (#27613),

argmax(f(x) for x ∈ domain)

with that change, we could fix the implementation with something like

import Base: Generator, argmax
function argmax(f::Function, domain::Generator)
    T = eltype(domain)
    tested = Set{T}()
    output = Set{T}()
    frontier = nothing
    for elem ∈ domain
        if !(elem ∈ tested)
            push!(tested, elem)
            value = f(elem)
            if (frontier === nothing) || (value > frontier)
                frontier = value
                output = Set(elem)
            elseif value == frontier
                push!(output, elem)
            end
        end
    end
    return output
end

and call it with the following syntax

argmax(abs2, elem for elem ∈ -3:3) == Set([-3, 3])
JeffBezanson commented 6 years ago

That definition and implementation don't depend on Generator; any iterable would work.

I agree this is useful but I don't think argmax should return a Set by default; it's quite inefficient in the common case where the answer is e.g. a single integer. This could be a new function like argmaxall or argmaxes.

Nosferican commented 6 years ago

The answer would have form of some collection of eltype(domain). Maybe if the length of the container is one, it could return the value. I used Set in the implementation to guarantee uniqueness (no need to compute the value for an element which we already considered, but overkill since I am checking for it in the loop). Other ways could call unique on the domain as first step (but why do an extra pass?). There shouldn't be any efficiency loses from returning the full set rather than the first since the implementation only passes through each once (no ability to break early since it doesn't do two passes). output could be set to Vector{eltype(domain)} if that is more convenient (bit extra information of order which shouldn't matter, but why not). Since output is guarantee to be non-empty, people could call first on it if it is a <:AbstractVector. Any iterable should work.

Nosferican commented 6 years ago

We should probably resurrect indmax and implement

argmax(f::Function, itr::Any, unique::Bool=false)

separately.

nalimilan commented 6 years ago

See also https://github.com/JuliaLang/julia/pull/25654 and https://github.com/JuliaLang/julia/issues/24865.

StefanKarpinski commented 6 years ago

What is the unique flag meant to do?

Nosferican commented 6 years ago

The mathematical definition defines the solution as a set, but for computational reasons it might be best to provide a Vector{eltype(domain)}. In certain applications, the solution is guaranteed to be a single element which would return a Vector{eltype(domain)} of length one. For convenience we could offer the unique flag to perform a first(output) to return a eltype(domain). I am not married to it, as users could just call first on it directly (also an advantage of using <:AbstractVector over <:Dict).

@nalimilan, thanks for linking those issues (I had hunting for those in my task list for today). I would like to know your thoughts on the issue. I concur indmax was not the best name and changing it would be beneficial, but I believe argmax was not the right name as it represents a different concept/operation. My ideal solution would be to offer the current behavior of argmax renamed with something better and use argmax for the mathematical concept described here (that would also be a great to expedite and simplify the work on a PR to implement it).

nalimilan commented 6 years ago

Well, if we wanted to be fully consistent argmax would have to be renamed to findfirstmax/findmaxfirst, and then the function returning the indices of all values giving the maximum would be findallmax/findmaxall. In practice these names aren't too appealing and argmax is a well-known concept, so I'd rather keep the current behavior of argmax, but (as @JeffBezanson said) provide another function or an argument to get a vector of indices rather than just the first index.

Something I would really have liked to change is renaming findmax to something else for consistency with other find* functions (https://github.com/JuliaLang/julia/issues/24865).

Nosferican commented 6 years ago

On the decision of returning the indices or subset of domain, R uses which.min and which.max with a keyword for first or every index. Python uses values.index(max(values)) similar to findall(isequal(maximum(obj)), obj) Matlab uses max with the 2-nargout option (similar to last(findmax(obj))

The only implementation of this functionality using argmax as the name is numpy.

My number one concern is to avoid having a misnamed concept similarly to the numpy case. I believe it would be nice to have an efficient and convenient function for the argmax concept and one for the indices (especially one which doesn't require materializing the whole image), but not to have the one returning the indices named argmax.

StefanKarpinski commented 6 years ago

The mathematical meaning of argmax(f, itr) is clear: find an element of itr which maximizes f. Having argmax(v) return indices seems wrong if you think of f = identity and v = itr. However, it makes perfect sense if you think of f as v—or more precisely f = i -> v[i] and itr = keys(v). As long as we explain that, it does not seem like there's anything wrong with using the name argmax for that quite useful operation. Moreover, the alternative possible meaning of argmax(v) as argmax(identity, v) is silly because maximum(v) already does that.

The only question in my mind is whether it would be better to leave argmax(v) undefined and have a separate indmax(f, v) function which returns an index in v at which the function f is maximized and then have indmax(v) = indmax(identity, v) which is what our argmax currently does (not coincidentally, it was previously called indmax). But I can't see much need for indmax(f, v) since we can just write argmax(i->f(v[i]), keys(v)) or if some version of https://github.com/JuliaLang/julia/pull/24990 gets merged, argmax(f∘v[_], keys(v)) which doesn't seem so bad in cases that you want indices instead of values.

Bottom line: if you're interested in indices, you can always use the indices as your collection and make indexing part of the function you're maximizing. You can kind of go in the other direction since you can always use an index to extract an element back out of an indexed collection, but if we're choosing between indmax and argmax, argmax wins since it makes sense on any iterable, not just indexable collections. So having argmax without indmax is fine but having indmax without argmax is not—you still have no way of doing what argmax(f, itr) does in general. Of course, we don't currently have that method, but we should.

Nosferican commented 6 years ago

Good points. The edge cases seem to suggest the best design would be a version of argmax(f::Function, itr::Any; first::Bool = false) as it seems the most versatile and useful.

StefanKarpinski commented 6 years ago

Is that first keyword intended to control whether to get only the first maximizing value or all of the? If so, I'd make a few comments: I do not think that returning all maximizing elements is more standard or more useful so it should not be the default. Sure, the formal definition of argmax is that it returns a set but if you look at typical usages in math writing this is usually glossed over and it is assumed that argmax gives you a single maximizing value.

Nosferican commented 6 years ago

I am perfectly happy with having that option default to a single value (i.e., first::Bool = true), especially if argmax will be used for querying indices/keys. Another option would be to default to return a Vector{eltype(domain)} and then people can call first(argmax(f, domain)) as it will always be a non-empty <:AbstractVector. Either solution seems good.

StefanKarpinski commented 6 years ago

Either solution seems good.

Doing first(argmax(f, domain)) is wildly inefficient.

Nosferican commented 6 years ago

Using the implementation at the start of this of this post it wouldn't be costly at all. However, I now think having the keyword argument would be best since the implementation could be adapted to only update upon finding a new frontier as opposed to expanding the solution set when new images are at the frontier (i.e., the efficiency gains would come from knowing the desired output at the start). With a bit of guidance I could prep a PR for these changes after deciding on the specifics.

StefanKarpinski commented 6 years ago

Using the implementation at the start of this of this post it wouldn't be costly at all.

I think we have different notions of what "costly" means. Your implementation allocates two Set objects (why two? only one seems necessary). The scalar version of argmax(f, itr) shouldn't allocate anything at all.

StefanKarpinski commented 6 years ago

Here's a sketch implementation of the two-arg argmax(f, itr):

function argmax(f, itr)
    r = iterate(itr)
    r === nothing && error("empty collection")
    m, state = r
    f_m = f(m)
    while true
        r = iterate(itr, state)
        r === nothing && break
        x, state = r
        f_x = f(x)
        isless(f_m, f_x) || continue
        m, f_m = x, f_x
    end
    return m
end

The iterable itr is iterated only once and f is applied to the yielded values exactly once per item. It allocates nothing unless f or the iteration of itr does.

Nosferican commented 6 years ago

The reason for the two collections are different: (1) keeps track of whether the image is known (it skips the computation of f on elements it has tested which is best suited for costly mapping functions or a high number of duplicated elements in the domain) and (2) it would only allocate it when doing the full subset, for the unique solution one it wouldn't allocate a collection and just keep track of the frontier and best guess. I originally used Set, but since I keep track of duplicated it can just be a Vector.

StefanKarpinski commented 6 years ago

Set versus Vector doesn't much matter, they're both big, expensive data structures.

Nosferican commented 6 years ago

We could just ask the users to pass argmax(f, unique(domain)), if it is a costly mapping or many duplicates, no?

StefanKarpinski commented 6 years ago

unique is an expensive allocating operation itself, so that doesn't help. Perhaps you're missing something about what I'm saying: I'm not worried about the number of elements in in the collection that's returned, my problem is with returning a collection at all. Especially given that the scalar version of this that people generally actually want can be implemented (as above), in a completely efficient generic way that does no allocation whatsoever. For example, if I have a definition like this:

g(a, b) = argmax(x -> -abs(x-3), min(a,b):max(a,b))

then the native code for a call like g(-10, 10) is really efficient:

    cmpq    %rdi, %rsi
    movq    %rdi, %r8
    cmovleq %rsi, %r8
    cmovlq  %rdi, %rsi
    cmpq    %rsi, %r8
    jne L23
    movq    %r8, %rax
    retq
L23:
    leaq    -3(%r8), %rax
    movl    $3, %ecx
    subq    %r8, %rcx
    testq   %rax, %rax
    cmovnsq %rax, %rcx
    negq    %rcx
    movq    %r8, %rax
L48:
    movl    $2, %edi
    subq    %r8, %rdi
    leaq    -2(%r8), %r9
    leaq    1(%r8), %rdx
    testq   %r9, %r9
    cmovnsq %r9, %rdi
    negq    %rdi
    cmpq    %rdi, %rcx
    cmovlq  %rdx, %rax
    cmovlq  %rdi, %rcx
    movq    %rdx, %r8
    cmpq    %rsi, %rdx
    jne L48
    retq

It's all just simple integer instructions, comparisons and jumps. If there was any allocation whatsoever in the definition of argmax then this would be totally different and not so incredibly simple and fast.

StefanKarpinski commented 6 years ago

I think we can close this after discussion at https://github.com/JuliaLang/julia/issues/27612#issuecomment-399215782 which leaves us in the situation that the only actionable change is adding the two-argument argmax method, for which there's already a specific issue open: https://github.com/JuliaLang/julia/issues/27613.

nalimilan commented 6 years ago

The mathematical meaning of argmax(f, itr) is clear: find an element of itr which maximizes f. Having argmax(v) return indices seems wrong if you think of f = identity and v = itr. However, it makes perfect sense if you think of f as v—or more precisely f = i -> v[i] and itr = keys(v). As long as we explain that, it does not seem like there's anything wrong with using the name argmax for that quite useful operation. Moreover, the alternative possible meaning of argmax(v) as argmax(identity, v) is silly because maximum(v) already does that.

Technically there's no ambiguity, but am I the only one to find it weird that argmax(identity, x) would be completely different from argmax(x)? None of our reduction functions does that AFAICT.

I'm starting to regret having renamed indmax to argmax now that we want to add that two-argument method with an inconsistent meaning...

StefanKarpinski commented 6 years ago

It's a little unexpected but having argmax(v) mean argmax(i -> v[i], keys(v)) seems to be intuitive to many people as well. As I argued above, indmax isn't really necessary as a function at all.

martinholters commented 6 years ago

I, too, find it weird. Also, there is no f such that argmax(f, x) == argmax(x), i.e. f cannot be seen as an optional argument in non-final position here. Doesn't mean it's no-go, but it's unusual and probably would give me a pause if I came across it without being aware of the background in this issue and related PRs.

davidanthoff commented 6 years ago

I concur with @nalimilan here. I find argmax(v) as a shortcut for argmax(i -> v[i], keys(v)) really weird, it seems a pattern that I haven't seen anywhere else in base. On the flipside, the pattern foo(v)==foo(identity, v) seems really common, so that is certainly what I would have expected.

I think fundamentally things would be easier if argmax always returned an element out of the original source, and never an index. Getting and index seems kind of surprising to me, and as @nalimilan pointed out also not something often seen in other languages.

A separate issue seems to be whether it should return a single value, a list or a set. On that I concur with @StefanKarpinski: a single element seems the most common and useful case.