DOV-Vlaanderen / groundwater-logger-validation

Analysis on validation methods for groundwater logger data
MIT License

Drifts 28/29: fixed AR(1) model for drift detection #65

Open DavorJ opened 3 years ago

DavorJ commented 3 years ago

This is in continuation of #64 where it was suggested to use a fixed AR(1) model for drift detection.

image

When we look for the best ARIMA model for air pressure, we end up with AR(1) ~ 0.85 for 12h intervals, which means that the next point is expected to sit at 85 % of the previous point's distance from the mean. This percentage depends on the interval: the smaller the interval (e.g. a few seconds), the closer it comes to 100 % (= random walk), and the larger the interval (e.g. a few days), the less dependence there is. In the current models this percentage is fixed (i.e. it does not vary with the time interval between measurements) at 0.9 and 0.6.
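To make the interval dependence concrete, here is a minimal sketch. It assumes an idealized stationary AR(1) process, for which the lag-k autocorrelation is the coefficient raised to the power k. The function name `ar1_coefficient` is hypothetical, and the 0.85 at 12h is simply the estimate quoted above used as a reference point; this is not how the current models work, since they keep the coefficient fixed.

```python
def ar1_coefficient(phi_ref: float, ref_hours: float, target_hours: float) -> float:
    """Rescale an AR(1) coefficient from one sampling interval to another.

    Assumption: an idealized stationary AR(1) process with 0 < phi_ref < 1,
    whose autocorrelation at k reference intervals is phi_ref ** k.
    """
    if not 0.0 < phi_ref < 1.0:
        raise ValueError("phi_ref must lie strictly between 0 and 1")
    if ref_hours <= 0 or target_hours <= 0:
        raise ValueError("intervals must be positive")
    # Fractional powers cover intervals shorter than the reference interval.
    return phi_ref ** (target_hours / ref_hours)


# Reference point from the discussion: AR(1) ~ 0.85 at 12h intervals.
for hours in (1 / 3600, 1.0, 12.0, 24.0, 120.0):
    phi = ar1_coefficient(0.85, ref_hours=12.0, target_hours=hours)
    print(f"{hours:10.4f} h -> AR(1) coefficient {phi:.4f}")
```

Under this assumption the coefficient approaches 1 for intervals of a few seconds and decays towards 0 for intervals of several days, which matches the qualitative behaviour described above.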

When we take the difference between one barometer and a reference barometer, this AR(1) component should in theory remain the same. Unfortunately, theory and practice do not always coincide. In practice, AR(1) is not 0.9 for all barometers, which, once the differences are taken, results (again in theory) in a more complex ARMA model. That is why an AR(1) = 0.6 model is also included for comparison: the data suggest that this AR(1) component is reduced.
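The point about differences no longer being a plain AR(1) can be illustrated with theoretical autocorrelations. The sketch below is a simplification: it assumes the two series are independent stationary AR(1) processes, which real barometers sharing the same air pressure signal are not. The function names and the coefficients 0.9 and 0.7 are illustrative only. For a pure AR(1) process the lag-2 autocorrelation equals the squared lag-1 autocorrelation; a mismatch in coefficients breaks that identity.

```python
def ar1_autocovariance(phi: float, lag: int, sigma2: float = 1.0) -> float:
    """Theoretical autocovariance of a stationary AR(1) process at a given lag."""
    return sigma2 / (1.0 - phi ** 2) * phi ** abs(lag)


def difference_acf(phi_a: float, phi_b: float, lag: int) -> float:
    """Autocorrelation of A - B at a given lag.

    Assumption: A and B are independent AR(1) processes with unit innovation
    variance, so the autocovariances of the difference simply add up.
    """
    gamma_k = ar1_autocovariance(phi_a, lag) + ar1_autocovariance(phi_b, lag)
    gamma_0 = ar1_autocovariance(phi_a, 0) + ar1_autocovariance(phi_b, 0)
    return gamma_k / gamma_0


def ar1_deviation(phi_a: float, phi_b: float) -> float:
    """rho(2) - rho(1)**2 of the difference; zero for a pure AR(1) process."""
    rho1 = difference_acf(phi_a, phi_b, 1)
    rho2 = difference_acf(phi_a, phi_b, 2)
    return rho2 - rho1 ** 2


print("equal coefficients:    ", ar1_deviation(0.9, 0.9))
print("different coefficients:", ar1_deviation(0.9, 0.7))
```

With equal coefficients the deviation is zero up to rounding, so the difference still behaves like an AR(1). With unequal coefficients it is positive, which is the signature of the richer ARMA structure mentioned above.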

So the question is, which one is best?

Here is a full comparison (part1 and part2 -- split due to size) of #63 (top left chunk), #64 (top right chunk) and this analysis (bottom left chunk).

image

My current preference is the AR(1) ~ 0.9 model (= bottom left chunk, the two middle plots): it seems the most conservative. What do you think @fredericpiesschaert? If there are no comments or new ideas, I plan to make a first implementation + diagnostic plots of _detectdrift().

fredericpiesschaert commented 3 years ago

@DavorJ I will look at this on Wednesday. @mathiaswackenier can you also evaluate this?

DavorJ commented 3 years ago

I am also thinking of working with colors based on significance (i.e. how certain the model is that the barometer is drifting). In the above case for BAOL008X, you can see that the AR(1) ~ 0.9 model didn't pass the significance test of 1/10000 for a drift (= there is no vertical blue line), but the model is still quite sure that the series is drifting. (The drift is also relatively small: about 1 cmH2O over 5 years.)

fredericpiesschaert commented 3 years ago

Go for it as far as I'm concerned! The graph with differences between consecutive timestamps is very helpful, so I would certainly include it in the diagnostic plots.