czbiohub-sf / xicor

xi correlation method adapted for python
MIT License

Anscombe's quartet data seems incorrect in conftest.py #20

Open BerryHoll opened 2 years ago

BerryHoll commented 2 years ago

I was looking at your code (and the original R code) to see if I could port xicor to Java, and while testing I thought I would reuse your nice examples based on the Anscombe's quartet data in conftest.py. However, the data for the 3rd dataset look incorrect when compared to your plots in anscombes_quartet_correlations.png and to https://en.wikipedia.org/wiki/Anscombe%27s_quartet#Data. (I assume it was a manual 'testing' edit that accidentally made it into the repository.) I just thought I'd give you a heads-up!

Also, when comparing the xi-correlation values in your anscombes_quartet_correlations.png against the R implementation, I find the same values except for the 4th dataset: 0.175 in R vs 0.1 in your plot. Perhaps this has to do with the randomisation seed used for resolving ties (the 4th dataset has ten tied x values)? In R, though, the value does not seem to change when I set different seed values:

calculateXI(c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5), c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68),simple=T)
[1] 0.275

calculateXI(c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5), c(9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74),simple=T)
[1] 0.6

calculateXI(c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5), c(7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73),simple=T)
[1] 0.725

calculateXI(c(8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8), c(6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89),simple=T)
[1] 0.175
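
To make the tie question easier to check, here is a minimal pure-Python sketch of the xi formula for data without ties in y, xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1), where r holds the y ranks after sorting by x. This is not the code of this package or of the R package; `xi_simple` and its `rng` parameter are hypothetical names made up for illustration. With `rng=None`, tied x values keep their input order; with a `random.Random` instance, tied x values are ordered at random.

```python
import random


def xi_simple(x, y, rng=None):
    """Chatterjee's xi, assuming no ties in y (hypothetical helper).

    Ties in x keep their input order when rng is None; otherwise they are
    put in random order using the given random.Random instance.
    """
    n = len(x)
    idx = list(range(n))
    if rng is not None:
        rng.shuffle(idx)              # random order first ...
    idx.sort(key=lambda i: x[i])      # ... then a stable sort by x

    # Rank of each y value, 1 = smallest (valid because y has no ties).
    rank = [0] * n
    for pos, i in enumerate(sorted(range(n), key=lambda i: y[i]), start=1):
        rank[i] = pos

    ranks_by_x = [rank[i] for i in idx]
    total = sum(abs(b - a) for a, b in zip(ranks_by_x, ranks_by_x[1:]))
    return 1 - 3 * total / (n * n - 1)


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]

# Ties in x kept in input order: prints 0.275, 0.6, 0.725, 0.175
for x, y in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]:
    print(round(xi_simple(x, y), 3))

# Dataset 4 with ties in x ordered at random, one seed per run.
seeded = sorted({round(xi_simple(x4, y4, random.Random(s)), 3) for s in range(200)})
print(seeded)
```

In this sketch, keeping the tied x values in input order reproduces all four R values above, including 0.175 for the 4th dataset, while random tie ordering gives different values for that dataset from seed to seed. That is at least consistent with the 0.1 in the plot coming from a different tie order, though it does not show how either implementation actually resolves ties.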