fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
206 stars 54 forks source link

Seq.spearman uses non-standard method for deciding ties #142

Closed nhirschey closed 2 years ago

nhirschey commented 2 years ago

Describe the bug Should Seq.spearman x y return the same result as in R and Scipy?

I wanted to add tests for spearman correlation. I noticed the results were different from R. I discovered it is due to a difference in how ties are decided. R and python's Scipy use average ranks in case of ties. FSharp.Stats currently uses FSharp.Stats.Rank.rankFirst. If I change to use FSharp.Stats.Rank.rankAverage then I get the same result as R and Scipy.

Can I submit a pull request to make Seq.spearman behave the same way as in R and Scipy? I would also add tests and make it take sequences as inputs (it's currently constrained to arrays as inputs).

Example FSharp.Stats

let x, y = [|5.05;6.75;3.21;2.66|], [|1.65;2.64;2.64;6.95|]
Seq.spearman x y
// val it : float = -0.8

> let spearman array1 array2 =
-
-     let spearRank1 = FSharp.Stats.Rank.rankAverage array1
-     let spearRank2 = FSharp.Stats.Rank.rankAverage array2
-
-     pearson spearRank1 spearRank2
-
> spearman x y    ;;
val spearman : array1:float array -> array2:float array -> float
val it : float = -0.632455532

Example R

> x = c(5.05,6.75,3.21,2.66); y= c(1.65,2.64,2.64,6.95);cor(x,y,method = "spearman")
[1] -0.6324555

Example Scipy

(x, y) = ([5.05,6.75,3.21,2.66], [1.65,2.64,2.64,6.95])
spearmanr(x,y)[0]
# -0.6324555
bvenn commented 2 years ago

Thanks for reporting. I checked this issue and you're right we should change it.

TREATMENT OF TIES
Items that are tied are each allocated the average of the ranks that
they would have had had they been distinguishable.

from: Williams R.B.G. (1986) Spearman’s and Kendall’s Coefficients of Rank Correlation. 
Intermediate Statistics for Geographers and Earth Scientists, p453, https://doi.org/10.1007/978-1-349-06813-5_6

Feel free to file a PR fixing this issue together with the modifications you suggested 🚀

bvenn commented 2 years ago

closed by b91c80d