MastodonC / kixi.stats

A library of statistical distribution sampling and transducing functions
https://cljdoc.xyz/d/kixi/stats

Covariance calculation #3

Closed by seb231 7 years ago

seb231 commented 7 years ago

Hi Henry,

I've been trying to use the covariance function here and it's producing values I didn't expect.

If the algebra at http://www.statisticshowto.com/covariance/ is correct, then the covariance of this dataset: [{:x 1 :y 1000} {:x 3 :y 1} {:x 5 :y 2}] should be ~-998, but this covariance function produces ~-665.

I worked it out longhand in Clojure. Does this look right to you?

(/ (+ (* (- 1 3) (- 1000 334.3333)) (* (- 3 3) (- 1 334.3333)) (* (- 5 3) (- 2 334.3333))) (- 3 1))
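The same check can be written without hard-coding the means. This is only a minimal sketch in plain Clojure: `mean` and `sample-covariance` are names made up for illustration and are not part of kixi.stats.

```clojure
(def data [{:x 1 :y 1000} {:x 3 :y 1} {:x 5 :y 2}])

(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + xs) (count xs)))

(defn sample-covariance
  "Sum of the products of deviations from the means, divided by (n - 1).
  Hypothetical helper; assumes at least two rows."
  [fx fy rows]
  (let [xs (map fx rows)
        ys (map fy rows)
        mx (mean xs)
        my (mean ys)]
    (/ (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
       (dec (count rows)))))

;; With integer inputs Clojure keeps the arithmetic exact (ratios),
;; so the result is numerically equal to -998 with no rounding error.
(sample-covariance :x :y data)
```

Because the inputs are integers, the mean of y stays as the ratio 1003/3 rather than the rounded 334.3333 used in the longhand expression.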

I think the function is missing the (- n 1) in the divisor, so I'd suggest this change to line 168:

(when-not (zero? c) (/ ss (- c 1)))

henrygarner commented 7 years ago

Hi Seb,

The covariance function implements population covariance rather than sample covariance. The difference is the (- n 1) divisor, as you identified.
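To make the distinction concrete, here is a small sketch in plain Clojure. It is not the kixi.stats implementation: `deviation-products`, `covariance-p*` and `covariance-s*` are hypothetical names chosen for this example. Both variants share the same sum of products of deviations and differ only in the divisor. Returning nil when the divisor would be zero is an assumption of this sketch.

```clojure
(defn deviation-products
  "Sum of (x - mean-x) * (y - mean-y) over rows. Assumes rows is non-empty."
  [fx fy rows]
  (let [n  (count rows)
        mx (/ (reduce + (map fx rows)) n)
        my (/ (reduce + (map fy rows)) n)]
    (reduce + (map #(* (- (fx %) mx) (- (fy %) my)) rows))))

(defn covariance-p*
  "Population covariance: divide by n. Returns nil for empty input."
  [fx fy rows]
  (let [n (count rows)]
    (when (pos? n)
      (/ (deviation-products fx fy rows) n))))

(defn covariance-s*
  "Sample covariance: divide by (n - 1). Returns nil for fewer than two rows."
  [fx fy rows]
  (let [n (count rows)]
    (when (> n 1)
      (/ (deviation-products fx fy rows) (dec n)))))

(def data [{:x 1 :y 1000} {:x 3 :y 1} {:x 5 :y 2}])

(double (covariance-p* :x :y data)) ; roughly -665.33, the value reported in the issue
(double (covariance-s* :x :y data)) ; -998.0, the value expected in the issue
```

The sum of products for this dataset is -1996, so dividing by 3 gives the population figure and dividing by 2 gives the sample figure.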

I doubt you'll be the only person confused by this; other functions in core use an -s or -p suffix to distinguish between sample and population variants, and default to the sample variant if used without the suffix.

I've pushed a new version 0.3.0 to bring covariance into line. Thanks for raising the issue!