bvenn closed this issue 2 years ago
abs (a-b) <= 0.
Since an absolute value cannot be smaller than 0, the check can be changed to abs (a-b) = 0.
https://github.com/fslaborg/FSharp.Stats/blob/46ab30aa63dd353ce1700de5d6b14f2115f29ca3/src/FSharp.Stats/Rank.fs#L23
Apparently, restarting the computer fixed issue 2. Unit tests are added and changes are on their way.
At the moment, nans and infinities are each treated as individual elements, so the ranking is as follows:
let example = [|-infinity; 1.; nan; infinity; infinity|]
rankFirst = [| 2; 3; 1; 4; 5 |]
rankMin = [| 2; 3; 1; 4; 5 |]
rankMax = [| 2; 3; 1; 4; 5 |]
rankAvg = [| 2; 3; 1; 4; 5 |]
pandas has the following options for na_option:
R has the following options for na.last:
I would recommend assigning nan ranks to nan values as the default case:
let example = [|-infinity; 1.; nan; infinity; infinity|]
rankFirst = [| 1; 2; nan; 3; 4 |]
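To make the proposed default concrete, here is a minimal sketch of a "first" ranking that leaves nan ranks at nan positions. It is not the FSharp.Stats implementation: the function name rankFirstNaNAsNaN is hypothetical, and returning a float array (so that nan can be represented) is an assumption.

```fsharp
// Hypothetical helper for illustration only, not part of FSharp.Stats.
let rankFirstNaNAsNaN (data: float[]) : float[] =
    // Start with nan everywhere; only real values are overwritten below.
    let ranks = Array.create data.Length nan
    data
    |> Array.indexed                                        // (original index, value)
    |> Array.filter (fun (_, v) -> not (System.Double.IsNaN v))
    |> Array.sortWith (fun (i, a) (j, b) ->
        // Order by value; break ties by original position ("first").
        let c = a.CompareTo(b)
        if c <> 0 then c else compare i j)
    |> Array.iteri (fun r (i, _) -> ranks.[i] <- float (r + 1))
    ranks

let example = [| -infinity; 1.; nan; infinity; infinity |]
// Ranks: 1; 2; nan; 3; 4
printfn "%A" (rankFirstNaNAsNaN example)
```

Filtering nan values out before sorting sidesteps the question of where the comparer places them, at the cost of an extra pass over the data.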
What do you think? @muehlhaus @kMutagene @ZimmerD
Yes, I think your suggestion is very good.
I've just created an update-rank branch to solve all issues. By default nan is sorted to the start of a sequence. This corrupts the loop of the implemented version. There are two possibilities to solve it:
1. Change the comparer to sort nan to the end. Afterwards the loop can be modified. Because the comparer compNaNLast has to perform two nan checks, the performance is reduced 20-fold. Therefore I added specialized functions (rankFirst -> rankFirstNaNLast).
2. Leave the sorting as it is and add a counter to the loop that is incremented whenever a nan occurs. Set the rank of that element to nan and subtract the counter value from the rank of all following real values. Thereby the nan checks are reduced by half. Because these checks are unnecessary for types that cannot hold nan, a type check at the beginning could reduce the number of nan checks further.
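The two possibilities can be sketched roughly as follows. The names compNaNLast and rankFirstNaNLast are taken from this thread, but their bodies here are my own assumptions and not the code on the branch; rankFirstNaNCounter is a hypothetical name for the second variant. The second variant relies on System.Double.CompareTo, which orders nan before every other value.

```fsharp
open System

// Possibility 1: a comparer that pushes nan to the end.
// It performs two nan checks per comparison (body is an assumption).
let compNaNLast (a: float) (b: float) =
    match Double.IsNaN a, Double.IsNaN b with
    | true, true -> 0
    | true, false -> 1
    | false, true -> -1
    | false, false -> a.CompareTo(b)

let rankFirstNaNLast (data: float[]) : float[] =
    let ranks = Array.create data.Length nan
    let sorted =
        Array.indexed data
        |> Array.sortWith (fun (i, a) (j, b) ->
            let c = compNaNLast a b
            if c <> 0 then c else compare i j)
    // nan values sit at the end, so the loop stops at the first nan.
    let mutable r = 0
    while r < sorted.Length && not (Double.IsNaN (snd sorted.[r])) do
        ranks.[fst sorted.[r]] <- float (r + 1)
        r <- r + 1
    ranks

// Possibility 2: keep the default ordering (nan first) and count nans.
let rankFirstNaNCounter (data: float[]) : float[] =
    let ranks = Array.create data.Length nan
    let sorted =
        Array.indexed data
        |> Array.sortWith (fun (i, a) (j, b) ->
            let c = a.CompareTo(b)          // nan sorts before all other values
            if c <> 0 then c else compare i j)
    let mutable nanCount = 0
    for pos in 0 .. sorted.Length - 1 do
        let (i, v) = sorted.[pos]
        if Double.IsNaN v then
            nanCount <- nanCount + 1        // rank stays nan
        else
            // Shift the rank down by the number of nans seen so far.
            ranks.[i] <- float (pos + 1 - nanCount)
    ranks

let example = [| -infinity; 1.; nan; infinity; infinity |]
// Both variants give ranks 1; 2; nan; 3; 4
printfn "%A" (rankFirstNaNLast example)
printfn "%A" (rankFirstNaNCounter example)
```

Both variants produce the same ranks; they differ only in where the nan checks happen (inside every comparison versus once per element in the ranking loop).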
There are 4 ranking methods in FSharp.Stats.Rank:
example = [5,3,3,4,2]
rankFirst: result :int [] = [5,2,3,4,1]
rankMin: result :float [] = [5,2,2,4,1]
rankMax: result :float [] = [5,3,3,4,1]
rankAvg: result :float [] = [5,2.5,2.5,4,1]
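For reference, the four tie-handling behaviours (first, min, max, average) can be sketched with one shared helper. This is only an illustration under assumptions, not the code in Rank.fs: rankBy is a hypothetical helper, the input is assumed to contain no nan, and all four functions return float arrays here.

```fsharp
// Hypothetical helper for illustration; assumes the input contains no nan.
// `tie` receives the first and last 1-based sorted position of a block of
// equal values, plus the sorted position of the current element.
let rankBy (tie: int -> int -> int -> float) (data: float[]) : float[] =
    let sorted =
        Array.indexed data
        |> Array.sortWith (fun (i, a) (j, b) ->
            let c = a.CompareTo(b)
            if c <> 0 then c else compare i j)
    let ranks = Array.create data.Length 0.0
    let mutable start = 0
    while start < sorted.Length do
        // Find the exclusive end of the block of values equal to sorted.[start].
        let mutable stop = start + 1
        while stop < sorted.Length && (snd sorted.[stop] = snd sorted.[start]) do
            stop <- stop + 1
        for pos in start .. stop - 1 do
            ranks.[fst sorted.[pos]] <- tie (start + 1) stop (pos + 1)
        start <- stop
    ranks

let rankFirst = rankBy (fun _ _ pos -> float pos)          // ties keep input order
let rankMin   = rankBy (fun lo _ _ -> float lo)            // ties share the lowest rank
let rankMax   = rankBy (fun _ hi _ -> float hi)            // ties share the highest rank
let rankAvg   = rankBy (fun lo hi _ -> float (lo + hi) / 2.0)

let example = [| 5.; 3.; 3.; 4.; 2. |]
printfn "%A" (rankFirst example)   // ranks 5; 2; 3; 4; 1
printfn "%A" (rankMin example)     // ranks 5; 2; 2; 4; 1
printfn "%A" (rankMax example)     // ranks 5; 3; 3; 4; 1
printfn "%A" (rankAvg example)     // ranks 5; 2.5; 2.5; 4; 1
```

Only the average variant can yield non-integer ranks; the others are floats purely for a uniform return type.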
Currently, all functions except rankFirst return float arrays, although rankAvg is the only function where non-integer ranks can occur. For harmonization, I would suggest that rankFirst should also return a float array, although this would be a breaking change.
There seems to be an issue where ties are not ranked correctly here:
https://github.com/fslaborg/FSharp.Stats/blob/46ab30aa63dd353ce1700de5d6b14f2115f29ca3/src/FSharp.Stats/Rank.fs#L65
It seems the +1 increment belongs to rankMax rather than rankMin.
EDIT 01/02/22: THIS WAS A TEMPORARY LOCAL ERROR!
A fix is on the way!