Drifts 28/29: fixed AR(1) model for drift detection

DavorJ commented 3 years ago

This is a continuation of #64, where it was suggested to use a fixed AR(1) model for drift detection.

Top left is the best ARMA model, identical to the top-left plot in #64.
Top right shows the differences between consecutive timestamps. The Y-axis is in hours on a log scale. It shows that in some cases the variance changes significantly once the sampling rate of the barometer changes.
Middle left is the AR(1) = 0.9 model and middle right is AR(1) = 0.9 with a seasonality component.
Bottom left is the AR(1) = 0.6 model and bottom right is AR(1) = 0.6 with a seasonality component.

When we look for the best ARIMA model for air pressure, we end up with AR(1) ~ 0.85 for 12 h intervals, which means that the next point is explained by 85 % of where the previous point stands (relative to the mean). This percentage depends on the interval: the smaller the interval (e.g. a few seconds), the closer it comes to 100 % (= random walk), and the larger the interval (e.g. a few days), the less dependence there is. In the current models this percentage is fixed (i.e. it does not vary with the time interval between measurements) at 0.9 and 0.6.
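To make the interval dependence concrete, here is a minimal Python sketch. It assumes that air pressure behaves like a continuous-time AR(1) (Ornstein-Uhlenbeck) process, so that the coefficient for an interval Δt is the reference coefficient raised to the power Δt / reference interval. That assumption, and the helper name `rescale_ar1`, are illustrative only and are not part of the models discussed in this thread, which keep the coefficient fixed.

```python
def rescale_ar1(phi_ref: float, ref_hours: float, dt_hours: float) -> float:
    """AR(1) coefficient implied for a sampling interval of dt_hours.

    Assumption (not from the thread): the series is a continuous-time AR(1)
    process, so the dependence decays exponentially with the time gap.
    """
    if not 0.0 < phi_ref < 1.0:
        raise ValueError("phi_ref must lie strictly between 0 and 1")
    if ref_hours <= 0 or dt_hours <= 0:
        raise ValueError("intervals must be positive")
    return phi_ref ** (dt_hours / ref_hours)


if __name__ == "__main__":
    # Reference value from the thread: AR(1) ~ 0.85 at 12 h intervals.
    for label, hours in [("1 second", 1 / 3600), ("1 hour", 1), ("12 hours", 12), ("3 days", 72)]:
        print(f"{label:>9}: phi = {rescale_ar1(0.85, 12, hours):.4f}")
```

Under this assumption the coefficient approaches 1 for very short gaps and shrinks toward 0 for gaps of several days, which matches the qualitative behaviour described above.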

When we take the difference between one barometer and a reference barometer, this AR(1) component should remain the same (theory). Unfortunately, theory and practice do not always coincide. In practice, AR(1) is not 0.9 for all barometers, which, once the differences are taken, results in a more complex ARMA model (theory). That is why the AR(1) = 0.6 model is also included for comparison: the data suggest that this AR(1) component is reduced.
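Why unequal coefficients lead to something more complex than AR(1) can be checked with a small calculation. The sketch below is a simplification under stated assumptions: it treats the difference series as two independent stationary AR(1) components with coefficients `phi_a` and `phi_b` (hypothetical names) and uses the fact that a pure AR(1) process has autocorrelation rho(k) = phi^k, so rho(2) must equal rho(1)^2.

```python
import math


def mixture_autocorr(lag: int, phi_a: float, phi_b: float,
                     var_a: float = 1.0, var_b: float = 1.0) -> float:
    """Autocorrelation at `lag` of the sum (or difference) of two independent
    stationary AR(1) series with coefficients phi_a, phi_b and variances
    var_a, var_b. The independence of the two series is an assumption."""
    return (var_a * phi_a ** lag + var_b * phi_b ** lag) / (var_a + var_b)


def looks_like_ar1(phi_a: float, phi_b: float) -> bool:
    """True if the combined series satisfies the AR(1) identity rho(2) == rho(1)^2."""
    rho1 = mixture_autocorr(1, phi_a, phi_b)
    rho2 = mixture_autocorr(2, phi_a, phi_b)
    return math.isclose(rho2, rho1 ** 2, rel_tol=1e-9)


if __name__ == "__main__":
    print("equal coefficients (0.9, 0.9):  ", looks_like_ar1(0.9, 0.9))
    print("unequal coefficients (0.9, 0.6):", looks_like_ar1(0.9, 0.6))
```

With equal coefficients the identity holds and the difference is again AR(1); with unequal coefficients it fails, so a single AR(1) term can no longer describe the series exactly.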

So the question is, which one is best?

All models pinpoint the best location at which the drift starts (vertical blue dotted line).
All models will also give a good approximation of the drift speed, e.g. in cmH2O/day.
The main difference between these models is how certain they are that the estimated drift is real and not just due to bad luck. Due to the strong dependence between measurements, a good estimate of this certainty is difficult (if not impossible).
- The simple linear model from #63 assumes independence between measurements, and thus has very poor significance estimates: it would practically identify a few consecutive high pressure readings as drifts, simply because they do not resemble white noise.
- The best ARMA and AR(1) models from #64 allow for dependence between points up to a certain extent. But there will always remain some correlation between points.
- This analysis differs from #64 in that we set the dependence parameter to a fixed value (e.g. AR(1) = 0.9). Why? Because there is no reason why it should be different between barometers. Estimating this first on all data -- instead of on individual cases -- should improve the model and drift detection.
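The effect of ignoring dependence on significance can be illustrated with a standard large-sample approximation. For equally spaced data with AR(1) errors, the variance of a mean or trend estimate is inflated by roughly (1 + phi) / (1 - phi) compared with the independent case. The sketch below only illustrates that rule of thumb. It is not how the models in this thread compute significance, and the function names are hypothetical.

```python
import math


def variance_inflation(phi: float) -> float:
    """Approximate factor by which AR(1) dependence inflates the variance of
    a mean or trend estimate, relative to independent observations.
    Large-sample approximation for equally spaced data (an assumption)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    return (1.0 + phi) / (1.0 - phi)


def adjusted_t(naive_t: float, phi: float) -> float:
    """Shrink a t-statistic computed under independence to account for AR(1)."""
    return naive_t / math.sqrt(variance_inflation(phi))


if __name__ == "__main__":
    naive_t = 10.0  # hypothetical t-statistic from a model that assumes independence
    for phi in (0.0, 0.6, 0.9):
        print(f"phi = {phi:.1f}: inflation = {variance_inflation(phi):5.1f}, "
              f"adjusted t = {adjusted_t(naive_t, phi):.2f}")
```

The higher the fixed AR(1) value, the more a seemingly strong trend is discounted, which is consistent with the AR(1) = 0.9 model being the more conservative choice.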

Here is a full comparison (part1 and part2 -- split due to size) of #63 (top left chunk), #64 (top right chunk) and this analysis (bottom left chunk).

My current preference is the AR(1) ~ 0.9 model (= bottom left chunk, the two middle plots): it seems the most conservative. What do you think @fredericpiesschaert? If there are no comments or new ideas, I am planning to make a first implementation + diagnostic plots of _detectdrift().

fredericpiesschaert commented 3 years ago

@DavorJ I will look at this on Wednesday. @mathiaswackenier can you also evaluate this?

DavorJ commented 3 years ago

I am also thinking of working with colors based on significance (i.e. how certain the model is that the barometer is drifting). In the above case for BAOL008X, you see that the AR(1) ~ 0.9 model didn't pass the significance test of 1/10000 for a drift (= there is no vertical blue line), but the model is still quite sure that the series is drifting. (The drift is also relatively small: about 1 cmH2O over 5 years.)

fredericpiesschaert commented 3 years ago

Go for it as far as I'm concerned! The graph with differences between consecutive timestamps is very helpful, so I would certainly include it in the diagnostic plots.

DOV-Vlaanderen / groundwater-logger-validation #65