fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features
341 stars 20 forks source link

Add argmax,argmin equivalents to FSharp.Core #702

Open dsyme opened 5 years ago

dsyme commented 5 years ago

NumPy has argmax and argmin to return the index of the maximum and minimum values in a collection.

I propose we add something similar to F#, either with these names of indexOfMax and indexOfMin

The existing way of approaching this problem in F# is a little painful. You have to find the max and then find the specific element with that max.

module List = 
    let indexOfMax xs = 
        let mx = List.max xs
        xs |> List.findIndex (fun v -> v = mx)

Not that bad but seems reasonable to have this in the core library, optimized appropriately and inlined to avoid the generic comparisons.

The functions occurs in basic machine learning samples like https://www.tensorflow.org/tutorials/keras/basic_classification

Pros and Cons

The advantages of making this adjustment to F# are simpler code in some scenarios

The disadvantages of making this adjustment to F# are added functions in FSharp.Core.

Open questions

Extra information

Estimated cost (XS, S, M, L, XL, XXL): S

Affidavit (please submit!)

Please tick this by placing a cross in the box:

Please tick all that apply:

matthewcrews commented 5 years ago

If I could give this a hundred upvotes I would. So many of the optimization papers I read have this as a fundamental concept.

I would love to have both indexOfMax/indexOfMin and indexOfMaxBy/indexOfMinBy. In Optimization and Data Analysis the need for these types of functions comes up frequently.

I would love to work with someone on implementing this. Since I have not contributed to the F# language in the past I would need some help and guidance.

cartermp commented 5 years ago

@matthewcrews since @dsyme created the issue I believe he intends on implementing the functions, but I think it would be illustrative to give some examples of how you'd use these functions in the context of your work.

It would also be helpful if there are other things you find missing from F#, or at least feel like something that ought to be a part of FSharp.Core list/array/seq combinators when working in optimization and data analysis. Separate suggestions can be created for things that make sense. From there, RFCs and/or trial implementations could advance if approved.

matthewcrews commented 5 years ago

@cartermp Will do. Here are some common use cases for what I have to do. This is focused on working with numbers so I'm not taking being able to operate on the generic case. I'll leave that to the professionals :)

These all come from me starting to work on a Linear Programming solver. There are actually four different variations on this concept which come up often. Below I show the different cases and what my initial thoughts are on what it would be like to work with them.

let vector = [|1..10|]

// Case 1: I need the index of the Max or Min value

// a should be 0
let a = Array.indexOfMin vector

// b should be 9
let b = Array.indexOfMax vector

// Case 2: I need to perform a function on the elements and get
// the index where the result is the Max or Min

let analysisFunction x =
    -1.0 * x

// a should be 9
let a = Array.indexOfMinBy analysisFunction vector

// b should be 0
let b = Array.indexOfMaxBy analysisFunction vector

// Case 3: I need to only consider a subset of the elements
// and return the Min or Max index where the condition holds

let filterFunction x =
    x >= 5.0 && x <= 6.0

// a should be 4
let a = Array.indexOfMinWhere filterFunction vector

// b should be 5
let b = Array.indexOfMaxWhere filterFunction vector

// Case 4: I need to consider only a subset of the elements
// and then compute a value for each element which passes
// the condition. Of those, I want the index of the element
// which gave the Min or Max value for the given function.

let analysisFunction x =
    -1.0 * x

let filterFunction x =
    x >= 5.0 && x <= 6.0

// a should be 5
let a = Array.indexOfMinByWhere filterFunction analysisFunction vector

// b should be 4
let b = Array.indexOfMaxByWhere filterFunction analysisFunction vector
matthewcrews commented 5 years ago

I could also make the case that an i version of these functions would be helpful. These would be akin to the Array.mapi or Array.iteri functions. I completely understand if the sentiment is that it would crowd the module but I do have use cases where I need to know the index that I am currently iterating on.

When formulating problems or performing analysis over arrays I often have to index into another array. These are all things I'm going to have to write for myself if they are not in FSharp.Core. The examples here are silly but they show some of my common usage patterns.

// The i version of the Max and Min functions
let vector = [1..10]
let otherVector = [11..20]
let analysisFunction (index:int) x ->
    otherVector.[index] + x

// Min and Max Byi functions
let a = indexOfMinByi analysisFunction vector

let b = indexOfMaxByi analysisFunction vector

// Min and Max By i Where functions
let a = indexOfMinByiWhere filterFunction analysisFunction vector

let b = indexOfMaxByiWhere filterFunction analysisFunction vector
matthewcrews commented 5 years ago

I took a couple of minutes to write up what I think I would expect from the type signatures:

// Function signatures

// 'T [] -> int
Array.indexOfMin

// 'T [] -> int
Array.indexOfMax

// ('T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMinBy

// ('T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMaxBy

// ('T -> bool) -> 'T [] -> int
Array.indexOfMinWhere

// ('T -> bool) -> 'T [] -> int
Array.indexOfMaxWhere

// ('T1 -> 'T2) -> ('T1 -> bool) -> 'T1 [] -> int
Array.indexOfMinByWhere

// ('T1 -> 'T2) -> ('T1 -> bool) -> 'T1 [] -> int
Array.indexOfMaxByWhere

// The i version of the functions. Same as the ones above except the index 
// of the current element is passed to the function as the first argument to the input functions

// (int -> 'T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMinByi

// (int -> 'T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMaxByi

// (int -> 'T -> bool) -> 'T [] -> int
Array.indexOfMinWherei

// (int -> 'T -> bool) -> 'T [] -> int
Array.indexOfMaxWherei

// (int -> 'T1 -> 'T2) -> (int -> 'T1 -> bool) -> 'T1 [] -> int
Array.indexOfMinByWherei

// (int -> 'T1 -> 'T2) -> (int -> 'T1 -> bool) -> 'T1 [] -> int
Array.indexOfMaxByWherei
matthewcrews commented 5 years ago

This is what I ended up doing for working with Arrays: https://github.com/matthewcrews/Scratchpad/blob/master/ArrayIndexing.fsx

dsyme commented 5 years ago

Matthew please feel free to make a trial implementation.

However I think we would not include the indexed or indexOf..Where. The former because there are other functions that would get an indexed version before these would, and the latter because it is a design principle of fsharp core not to have filtering variations, eg mapWhere.

matthewcrews commented 5 years ago

Sounds good. The reason for the indexOf..Where functions was to cover cases that the R data.table library provides for that I have found useful in the past. They are advanced use cases but they have come up. I can just offer them in a separate library on Nuget.

matthewcrews commented 5 years ago

Did we want this for anything besides Array and List? I could see having this for Map as well since it is a data structure with index lookup.

dsyme commented 5 years ago

@matthewcrews Yes, that makes sense.

PatrickMcDonald commented 5 years ago

We should do Seq also for consistency.

Although longer, findIndexOfMax[By] seems a better fit given we already have other findIndex methods (findIndex and findIndexBack)

matthewcrews commented 5 years ago

I don't know if that would be a good idea. A Seq could be potentially infinite in length and therefore these functions may never return. That was why I was not planning on implementing them.

dsyme commented 5 years ago

There are many Seq functions like that, including Seq.exists for example. So we should include Seq

matthewcrews commented 5 years ago

I'm happy to add it for Seq as well. Right now my problem is that I can't find any guidance on getting testing setup and running for solution. I added functions for List and Array and unit tests for them but it says it fails because they aren't registered anywhere. Is there any documentation about how to properly add and run tests for this solution? I'd love to help by adding these functions but there's not a lot of guidance that I can find.

Richiban commented 5 years ago

@dsyme

it is a design principle of fsharp core not to have filtering variations, eg mapWhere.

Isn't that Choose?

gusty commented 5 years ago

I think choose is not exactly a filter or a Where. It's a kind of flatten for a Collection of options (or something that maps to option).

Having said that, I think the name should have been chooseBy for consistency.

Richiban commented 5 years ago

Perhaps there are different interpretations, but the way I read it is that it performs a map on a sequence to some sequence of options, filtering out the Nones and lifting the Somes. It's a map and a filter at the same time.

It's incredibly useful and one of my favourite Seq / List / Array functions, but I did always wonder why it wasn't just called filterMap or something similar.

matthewcrews commented 5 years ago

Per Don Syme, I will not be including the indexOf...Where functions. I will be adding them for myself and hope to publish it as a separate Nuget library.

Conceptually you can think of the indexOf...Where functions as a map then choose but that implementation would be terribly inefficient. My understanding was that we wanted a fast implementation of these functions which is why I use mutable variables to track the Min/Max as I iterate through the collections.

I need the indexOf...Where functions for implementing an LP Solver which does a lot of index searching over arrays.

dsyme commented 5 years ago

I think choose is not exactly a filter or a Where. It's a kind of flatten for a Collection of options (or something that maps to option).

The specific design decision I was referring to was not to have extra "where" versions that take a boolean-valued predicate. It's true that choose sneaks past this by providing an option-valued projection. So you could imagine a

findIndexOfMaxBy : ('T -> 'U option) -> int

However that would still result in quite a proliferation of functions.

So choose and pick are basically the only places we do this.

isaacabraham commented 5 years ago

You can also perform this operation with a single iteration of the source collection: Seq.indexed >> Seq.maxBy snd >> fst. However for Arrays and other eager collection types this would (I imagine) allocate an intermediate collection (plus the extra tuple etc.).

charlesroddie commented 5 years ago

ArgMax and FindIndexOfMax are good names.

Definitely good to have equivalence to the code in the OP using FindIndex, in particular returning the first index giving the max and failing if the max is not found (empty list). This will need to be in the xml annotation.

abelbraaksma commented 5 years ago

@charlesroddie, like the IndexOf methods for strings, I think it's reasonable to return -1 for empty lists, arrays, sequences.

Btw, there was a suggestion here to implement this for Map, but I don't think it makes sense, as maps are not ordered, so the returned index doesn't make sense.

charlesroddie commented 5 years ago

That's OK. Whichever decision is taken, the naming should correspond to the corresponding existing find/index method so that the behavior is expected. If we go with your suggestion that would be IndexOfMax.

nasosev commented 4 years ago

@charlesroddie, like the IndexOf methods for strings, I think it's reasonable to return -1 for empty lists, arrays, sequences.

Btw, there was a suggestion here to implement this for Map, but I don't think it makes sense, as maps are not ordered, so the returned index doesn't make sense.

Why not return the key in that case?