fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Implement dense rank a la SQL or SAS #372

Open laygr opened 6 years ago

laygr commented 6 years ago

(From this Stack Overflow question: https://stackoverflow.com/questions/46009911/how-to-perform-ranking-as-in-sas-or-a-dense-rank-as-in-sql-in-deedle)

This is my custom implementation, but I don't know whether it is the most efficient one:

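// Assumes `open System` and `open Deedle` at the top of the file.
// Assigns each row a 0-based rank in [0, groups) by splitting the rows,
// sorted by `column`, into consecutive chunks of equal size.
static member denseRank column (groups:int) rankName frame =
        let frameLength = Frame.countRows frame
        // Rows per group, rounded up so that at most `groups` chunks are produced.
        let chunkSize = (float frameLength) / (float groups) |> Math.Ceiling |> int

        let sorted =
            frame
            |> Frame.sortRows column

        let sortedKeys = Frame.getRowKeys sorted

        // The row at sorted position `index` gets rank index / chunkSize.
        let ranksArr = Array.init frameLength (fun index -> index / chunkSize)

        let ranks = Series(sortedKeys, ranksArr)
        // Add the ranks to a copy so that the input frame is left unchanged.
        let clone = frame.Clone()
        clone.AddColumn(rankName, ranks)
        clone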
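
For comparison, here is a minimal sketch on plain arrays, with no Deedle dependency, that puts the positional bucketing used above next to a dense rank in the SQL sense (equal values share a rank, and ranks are consecutive). The function names `bucketRanks` and `denseRanks` and the sample ages are hypothetical and chosen only for illustration; this is not an existing Deedle API.

```fsharp
// Hypothetical helpers for illustration only; neither is part of Deedle.

/// Splits the values, in sorted order, into consecutive chunks of equal size
/// and returns the 0-based chunk number of each value (in input order).
let bucketRanks<'T when 'T : comparison> (groups: int) (values: 'T[]) : int[] =
    let n = values.Length
    // Ceiling of n / groups in integer arithmetic; at least 1 to avoid dividing by zero.
    let chunkSize = max 1 ((n + groups - 1) / groups)
    let ranks = Array.zeroCreate<int> n
    values
    |> Array.mapi (fun i v -> v, i)   // remember each value's original position
    |> Array.sort                     // order by value, ties broken by original position
    |> Array.iteri (fun pos (_, i) -> ranks.[i] <- pos / chunkSize)
    ranks

/// Dense rank: equal values share a rank, and ranks are 1, 2, 3, ... with no gaps.
let denseRanks<'T when 'T : comparison> (values: 'T[]) : int[] =
    // Map each distinct value to its 1-based position among the sorted distinct values.
    let rankOf =
        values
        |> Array.distinct
        |> Array.sort
        |> Array.mapi (fun i v -> v, i + 1)
        |> dict
    values |> Array.map (fun v -> rankOf.[v])

let ages = [| 30; 25; 30; 30; 41; 25 |]
printfn "%A" (bucketRanks 3 ages)   // [|1; 0; 1; 2; 2; 0|]
printfn "%A" (denseRanks ages)      // [|2; 1; 2; 2; 3; 1|]
```

In this sample the three rows with age 30 end up in two different buckets under positional bucketing, whereas the dense rank gives all of them the same rank. Which of the two behaviours is wanted is worth settling before an implementation is chosen.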
laygr commented 6 years ago

To test this, I would sort the frame in ascending order by the column that I chose to create the rank from, and verify that the ranks are also increasing.

Example: Let this table be a frame: (image not shown: ktbgh)

  1. (Using denseRank) We rank the table by age into 3 groups. We obtain this: (image not shown: 6vafq)
  2. (Verification step) Then, by sorting the table by age in ascending order, we can see that the column "Age Rank" is also sorted in ascending order: (image not shown: 8tcqz)
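
The verification step could be automated along these lines. This is only a sketch on plain arrays rather than on a Deedle frame; the function name `ranksFollowOrder` and the sample data are hypothetical. It sorts the (value, rank) pairs and checks that the ranks never decrease.

```fsharp
/// Returns true when sorting by value also leaves the ranks in non-decreasing order.
/// Rows with equal values are ordered by rank, so ties that span two ranks still pass.
let ranksFollowOrder<'T when 'T : comparison> (values: 'T[]) (ranks: int[]) : bool =
    Array.zip values ranks            // pair each value with its rank
    |> Array.sort                     // sort by value, then by rank
    |> Array.map snd                  // keep only the ranks, now in value order
    |> Array.pairwise                 // adjacent pairs of ranks
    |> Array.forall (fun (a, b) -> a <= b)

let ages = [| 30; 25; 30; 30; 41; 25 |]
printfn "%b" (ranksFollowOrder ages [| 1; 0; 1; 2; 2; 0 |])   // true
printfn "%b" (ranksFollowOrder ages [| 0; 1; 1; 2; 2; 0 |])   // false: a 25 outranks a 30
```

Such a check only confirms that the ranks follow the sort order; it does not confirm that the groups have the intended sizes.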