Open dsyme opened 5 years ago
If I could give this a hundred upvotes I would. So many of the optimization papers I read have this as a fundamental concept.
I would love to have both indexOfMax/indexOfMin and indexOfMaxBy/indexOfMinBy. In Optimization and Data Analysis the need for these types of functions comes up frequently.
I would love to work with someone on implementing this. Since I have not contributed to the F# language in the past I would need some help and guidance.
@matthewcrews since @dsyme created the issue I believe he intends on implementing the functions, but I think it would be illustrative to give some examples of how you'd use these functions in the context of your work.
It would also be helpful if there are other things you find missing from F#, or at least feel like something that ought to be a part of FSharp.Core list/array/seq combinators when working in optimization and data analysis. Separate suggestions can be created for things that make sense. From there, RFCs and/or trial implementations could advance if approved.
@cartermp Will do. Here are some common use cases for what I have to do. This is focused on working with numbers so I'm not taking being able to operate on the generic case. I'll leave that to the professionals :)
These all come from me starting to work on a Linear Programming solver. There are actually four different variations on this concept which come up often. Below I show the different cases and what my initial thoughts are on what it would be like to work with them.
let vector = [|1..10|]
// Case 1: I need the index of the Max or Min value
// a should be 0
let a = Array.indexOfMin vector
// b should be 9
let b = Array.indexOfMax vector
// Case 2: I need to perform a function on the elements and get
// the index where the result is the Max or Min
let analysisFunction x =
-1.0 * x
// a should be 9
let a = Array.indexOfMinBy analysisFunction vector
// b should be 0
let b = Array.indexOfMaxBy analysisFunction vector
// Case 3: I need to only consider a subset of the elements
// and return the Min or Max index where the condition holds
let filterFunction x =
x >= 5.0 && x <= 6.0
// a should be 4
let a = Array.indexOfMinWhere filterFunction vector
// b should be 5
let b = Array.indexOfMaxWhere filterFunction vector
// Case 4: I need to consider only a subset of the elements
// and then compute a value for each element which passes
// the condition. Of those, I want the index of the element
// which gave the Min or Max value for the given function.
let analysisFunction x =
-1.0 * x
let filterFunction x =
x >= 5.0 && x <= 6.0
// a should be 5
let a = Array.indexOfMinByWhere filterFunction analysisFunction vector
// b should be 4
let b = Array.indexOfMaxByWhere filterFunction analysisFunction vector
I could also make the case that an i
version of these functions would be helpful. These would be akin to the Array.mapi
or Array.iteri
functions. I completely understand if the sentiment is that it would crowd the module but I do have use cases where I need to know the index that I am currently iterating on.
When formulating problems or performing analysis over arrays I often have to index into another array. These are all things I'm going to have to write for myself if they are not in FSharp.Core. The examples here are silly but they show some of my common usage patterns.
// The i version of the Max and Min functions
let vector = [1..10]
let otherVector = [11..20]
let analysisFunction (index:int) x ->
otherVector.[index] + x
// Min and Max Byi functions
let a = indexOfMinByi analysisFunction vector
let b = indexOfMaxByi analysisFunction vector
// Min and Max By i Where functions
let a = indexOfMinByiWhere filterFunction analysisFunction vector
let b = indexOfMaxByiWhere filterFunction analysisFunction vector
I took a couple of minutes to write up what I think I would expect from the type signatures:
// Function signatures
// 'T [] -> int
Array.indexOfMin
// 'T [] -> int
Array.indexOfMax
// ('T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMinBy
// ('T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMaxBy
// ('T -> bool) -> 'T [] -> int
Array.indexOfMinWhere
// ('T -> bool) -> 'T [] -> int
Array.indexOfMaxWhere
// ('T1 -> 'T2) -> ('T1 -> bool) -> 'T1 [] -> int
Array.indexOfMinByWhere
// ('T1 -> 'T2) -> ('T1 -> bool) -> 'T1 [] -> int
Array.indexOfMaxByWhere
// The i version of the functions. Same as the ones above except the index
// of the current element is passed to the function as the first argument to the input functions
// (int -> 'T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMinByi
// (int -> 'T1 -> 'T2) -> 'T1 [] -> int
Array.indexOfMaxByi
// (int -> 'T -> bool) -> 'T [] -> int
Array.indexOfMinWherei
// (int -> 'T -> bool) -> 'T [] -> int
Array.indexOfMaxWherei
// (int -> 'T1 -> 'T2) -> (int -> 'T1 -> bool) -> 'T1 [] -> int
Array.indexOfMinByWherei
// (int -> 'T1 -> 'T2) -> (int -> 'T1 -> bool) -> 'T1 [] -> int
Array.indexOfMaxByWherei
This is what I ended up doing for working with Arrays: https://github.com/matthewcrews/Scratchpad/blob/master/ArrayIndexing.fsx
Matthew please feel free to make a trial implementation.
However I think we would not include the indexed or indexOf..Where. The former because there are other functions that would get an indexed version before these would, and the latter because it is a design principle of fsharp core not to have filtering variations, eg mapWhere.
Sounds good. The reason for the indexOf..Where functions was to cover cases that the R data.table
library provides for that I have found useful in the past. They are advanced use cases but they have come up. I can just offer them in a separate library on Nuget.
Did we want this for anything besides Array
and List
? I could see having this for Map
as well since it is a data structure with index lookup.
@matthewcrews Yes, that makes sense.
We should do Seq also for consistency.
Although longer, findIndexOfMax[By] seems a better fit given we already have other findIndex methods (findIndex and findIndexBack)
I don't know if that would be a good idea. A Seq could be potentially infinite in length and therefore these functions may never return. That was why I was not planning on implementing them.
There are many Seq functions like that, including Seq.exists for example. So we should include Seq
I'm happy to add it for Seq as well. Right now my problem is that I can't find any guidance on getting testing setup and running for solution. I added functions for List and Array and unit tests for them but it says it fails because they aren't registered anywhere. Is there any documentation about how to properly add and run tests for this solution? I'd love to help by adding these functions but there's not a lot of guidance that I can find.
@dsyme
it is a design principle of fsharp core not to have filtering variations, eg mapWhere.
Isn't that Choose?
I think choose
is not exactly a filter
or a Where
. It's a kind of flatten for a Collection of options (or something that maps to option).
Having said that, I think the name should have been chooseBy
for consistency.
Perhaps there are different interpretations, but the way I read it is that it performs a map on a sequence to some sequence of options, filtering out the None
s and lifting the Some
s. It's a map and a filter at the same time.
It's incredibly useful and one of my favourite Seq
/ List
/ Array
functions, but I did always wonder why it wasn't just called filterMap
or something similar.
Per Don Syme, I will not be including the indexOf...Where
functions. I will be adding them for myself and hope to publish it as a separate Nuget library.
Conceptually you can think of the indexOf...Where
functions as a map
then choose
but that implementation would be terribly inefficient. My understanding was that we wanted a fast implementation of these functions which is why I use mutable variables to track the Min/Max as I iterate through the collections.
I need the indexOf...Where
functions for implementing an LP Solver which does a lot of index searching over arrays.
I think choose is not exactly a filter or a Where. It's a kind of flatten for a Collection of options (or something that maps to option).
The specific design decision I was referring to was not to have extra "where" versions that take a boolean-valued predicate. It's true that choose
sneaks past this by providing an option-valued projection. So you could imagine a
findIndexOfMaxBy : ('T -> 'U option) -> int
However that would still result in quite a proliferation of functions.
So choose
and pick
are basically the only places we do this.
You can also perform this operation with a single iteration of the source collection: Seq.indexed >> Seq.maxBy snd >> fst
. However for Arrays and other eager collection types this would (I imagine) allocate an intermediate collection (plus the extra tuple etc.).
ArgMax
and FindIndexOfMax
are good names.
Definitely good to have equivalence to the code in the OP using FindIndex
, in particular returning the first index giving the max and failing if the max is not found (empty list). This will need to be in the xml annotation.
@charlesroddie, like the IndexOf
methods for strings, I think it's reasonable to return -1
for empty lists, arrays, sequences.
Btw, there was a suggestion here to implement this for Map
, but I don't think it makes sense, as maps are not ordered, so the returned index doesn't make sense.
That's OK. Whichever decision is taken, the naming should correspond to the corresponding existing find/index method so that the behavior is expected. If we go with your suggestion that would be IndexOfMax
.
@charlesroddie, like the
IndexOf
methods for strings, I think it's reasonable to return-1
for empty lists, arrays, sequences.Btw, there was a suggestion here to implement this for
Map
, but I don't think it makes sense, as maps are not ordered, so the returned index doesn't make sense.
Why not return the key in that case?
NumPy has
argmax
andargmin
to return the index of the maximum and minimum values in a collection.I propose we add something similar to F#, either with these names of
indexOfMax
andindexOfMin
The existing way of approaching this problem in F# is a little painful. You have to find the max and then find the specific element with that max.
Not that bad but seems reasonable to have this in the core library, optimized appropriately and inlined to avoid the generic comparisons.
The functions occurs in basic machine learning samples like https://www.tensorflow.org/tutorials/keras/basic_classification
Pros and Cons
The advantages of making this adjustment to F# are simpler code in some scenarios
The disadvantages of making this adjustment to F# are added functions in FSharp.Core.
Open questions
maxByIndex
by better thanindexOfMax
? Should it befindIndexOfMax
to fit withfindIndex
. Other suggestions?indexOfMaxBy
seems reasonable.Extra information
Estimated cost (XS, S, M, L, XL, XXL): S
Affidavit (please submit!)
Please tick this by placing a cross in the box:
Please tick all that apply: