CamDavidsonPilon / Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
MIT License

Autocorrelation shown for differing lags is not statistical correlation? #494

Open lemonad opened 4 years ago

lemonad commented 4 years ago

In section 3.2.1 on autocorrelation, it is stated that

one way of thinking about autocorrelation is "If I know the position of the series at time s, can it help me know where I am at time t?" In the series x_t, the answer is no.

An autocorrelation plot of x_t is then given as evidence for this (top plot, below). However, by just adding the constant 1 to every x_t (making all x_t >= 0), we get a graph similar to the one for y_t (bottom plot, below), which gives the impression that there is some correlation when there is none.

In this sense, I think using np.correlate for autocorrelation in this kind of statistical setting is potentially confusing (at least it was for me).

[Figure: autocorrelation plots of x_t (top) and of x_t + 1 (bottom)]
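To make the effect reproducible, here is a minimal sketch. It assumes the chapter's autocorr helper is essentially np.correlate(x, x, mode="full") scaled by its maximum (autocorr_raw below is my reconstruction of it, not a copy). autocorr_centered is a hypothetical alternative that subtracts the mean first, which is closer to the usual statistical definition. The seed and variable names are arbitrary.

```python
import numpy as np


def autocorr_raw(x):
    # Assumed form of the chapter's helper: no mean subtraction,
    # scaled so that the largest value (lag 0) equals 1.
    x = np.asarray(x, dtype=float)
    result = np.correlate(x, x, mode="full")
    result = result / np.max(result)
    return result[result.size // 2:]


def autocorr_centered(x):
    # Hypothetical alternative: remove the mean before correlating,
    # then normalise by the lag-0 value.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    result = np.correlate(x, x, mode="full")[x.size - 1:]
    return result / result[0]


rng = np.random.default_rng(0)
x_t = rng.normal(0, 1, 200)   # independent draws, as in the section
shifted = x_t + 1             # same draws plus a constant

raw_x = autocorr_raw(x_t)
raw_shifted = autocorr_raw(shifted)
centered_x = autocorr_centered(x_t)
centered_shifted = autocorr_centered(shifted)
```

With the raw version, the shifted series shows clearly positive values at small lags even though the draws are independent. With the centered version, x_t and x_t + 1 give the same curve, since the constant is removed before correlating.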

Perhaps a better example is the plot of autocorr(np.ones(200)), which also shows correlation diminishing with increasing lag (see below). Given how autocorrelation is described in this section, I think the reader would probably expect a high, constant correlation in this case?

[Figure: plot of autocorr(np.ones(200))]
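The decay for the constant series can be checked directly. This sketch again assumes autocorr is np.correlate in "full" mode scaled by its maximum. Under that assumption, the value at lag k is just the number of overlapping terms, 200 - k, divided by 200, so the straight-line drop comes from the shrinking overlap and not from the data.

```python
import numpy as np

n = 200
ones = np.ones(n)

# Same steps as the assumed autocorr helper, written inline.
raw = np.correlate(ones, ones, mode="full")
raw = raw / np.max(raw)
raw = raw[raw.size // 2:]   # keep lags 0 .. n-1

# At lag k only n - k pairs overlap, each contributing 1 * 1.
expected = (n - np.arange(n)) / n
```

Note that a constant series has zero variance, so a Pearson-style correlation is not defined for it at all. Neither "high constant correlation" nor "diminishing correlation" is really the statistically meaningful answer here.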