DOV-Vlaanderen / groundwater-logger-validation

Analysis on validation methods for groundwater logger data
MIT License

Drifts 23: simple linear model for drift detection #63

Open DavorJ opened 3 years ago

DavorJ commented 3 years ago

The simplest drift-detection model I can come up with is the following (example: BAOL031X_A7250):

[Image: simple linear drift model for BAOL031X_A7250]

Let us decompose it in the following graphs:

[Image: decomposition of the model into separate graphs]
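The figures do not spell out the model. The following is a minimal sketch of what a "simple linear model" for drift could look like. It assumes the model regresses the difference between a logger and a reference series on time, d(t) = a + b·t + ε, where a is a constant offset and b is the drift rate. The helper `fit_linear_drift` and the synthetic data are hypothetical.

```python
from statistics import fmean

def fit_linear_drift(t, d):
    """Ordinary least squares fit of d = a + b*t (hypothetical helper).

    t: observation times, e.g. days since the first observation
    d: logger minus reference values at those times (assumed input)
    Returns (a, b): the constant offset and the drift rate per time unit.
    """
    if len(t) != len(d) or len(t) < 2:
        raise ValueError("need at least two paired observations")
    t_mean, d_mean = fmean(t), fmean(d)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("all observation times are identical")
    sxy = sum((ti - t_mean) * (di - d_mean) for ti, di in zip(t, d))
    b = sxy / sxx
    a = d_mean - b * t_mean
    return a, b

# Synthetic example: constant offset of 0.5 plus a drift of 0.01 per day.
days = list(range(0, 400, 10))
diff = [0.5 + 0.01 * day for day in days]
offset, drift = fit_linear_drift(days, diff)
print(round(offset, 3), round(drift, 4))  # 0.5 0.01
```

In this sketch, the offset and the drift are the two components a decomposition would separate. Everything the model does not explain ends up in the residuals.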

There are a couple of limitations with this "simple" approach.

Can these issues be addressed with a more complex model? Yes, but at this point I am not sure which way to go in terms of "time invested vs. value generated". There are a couple of modeling possibilities...

But the biggest problem is determining when the drift is significant. The blue vertical dotted line is drawn only when the drift is significant. (The significance test is currently very quick and dirty, but it at least gives an idea.) As this overview shows, many of the barometers have some drift according to this model.
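The thread does not describe the current significance test. One textbook option is a t-test on the fitted slope b of d = a + b·t, shown here as a hedged sketch. The helper `drift_significance` is hypothetical, and for long series the p-value is approximated with a normal distribution. This kind of test assumes independent residuals. Logger residuals are usually autocorrelated, so a naive test like this one is likely to flag drift more often than it should, which may be part of why so many barometers appear to drift.

```python
import math
import random
from statistics import NormalDist, fmean

def drift_significance(t, d, alpha=0.05):
    """Hypothetical t-test on the slope b of d = a + b*t.

    Returns (b, t_stat, p_value, significant). The two-sided p-value uses a
    normal approximation to the t distribution (reasonable for long series)
    and assumes independent residuals.
    """
    n = len(t)
    if n < 3:
        raise ValueError("need at least three observations")
    t_mean, d_mean = fmean(t), fmean(d)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sum((ti - t_mean) * (di - d_mean) for ti, di in zip(t, d)) / sxx
    a = d_mean - b * t_mean
    sse = sum((di - (a + b * ti)) ** 2 for ti, di in zip(t, d))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if se_b == 0:  # perfect fit: any non-zero slope is trivially significant
        return b, math.inf, 0.0, b != 0
    t_stat = b / se_b
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return b, t_stat, p_value, p_value < alpha

# Synthetic daily series over two years: drift of 0.002 per day plus noise.
rng = random.Random(42)
days = list(range(730))
diff = [0.002 * day + rng.gauss(0, 0.1) for day in days]
b, t_stat, p_value, significant = drift_significance(days, diff)
print(significant)  # True
```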

I wonder what @fredericpiesschaert, @mathiaswackenier and Piet think from a user/business perspective. Is there any value in this?

fredericpiesschaert commented 3 years ago

Taking seasonal variance into account, it doesn't seem very useful to determine drift when the time series is shorter than one or even two years. That would eliminate these 'drift cases': [Image]
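A small synthetic illustration of why short series are risky. If the linear model is fitted to a purely seasonal signal that contains no drift at all, it still reports an apparent drift, and that apparent drift shrinks as the series covers more seasonal cycles. The sine-shaped annual cycle, its amplitude and phase, and the helper names are assumptions. Real seasonal behaviour may differ, and with a different phase the artefact can be smaller or larger.

```python
import math
from statistics import fmean

def ols_slope(t, y):
    """Least-squares slope of y on t."""
    t_mean, y_mean = fmean(t), fmean(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx

def spurious_drift(years, amplitude=1.0, period=365.25):
    """Apparent total drift fitted to a series that is only an annual cycle.

    Hypothetical illustration: the series is a pure sine with no trend, so
    any fitted drift is an artefact of the seasonal cycle.
    """
    days = list(range(int(years * 365)))
    seasonal = [amplitude * math.sin(2 * math.pi * d / period) for d in days]
    # Slope per day times the length of the series gives the total drift.
    return ols_slope(days, seasonal) * (days[-1] - days[0])

for years in (1, 2, 5):
    print(years, round(spurious_drift(years), 2))
```

For a one-year series, the apparent drift in this example exceeds the amplitude of the seasonal cycle itself. That supports treating drift estimates on short series with caution, whatever length threshold is chosen.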

And, as I've said before, time series should be validated before they are passed to the drift function. We have to get rid of outliers and other anomalies, because they really blur the picture: [Image]

I have to take a closer look at the examples, but it looks promising to me.

DavorJ commented 3 years ago

@fredericpiesschaert, concentrating only on series longer than 2 years is an option, but it seems arbitrary to me. See #64: its more complex model seems to work much better for short series.

And yes, only validated data should be supplied to the function. Applying the _detectoutliers() filter first might seem like an alternative, but it could remove drift information, so it is not really an option.
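One possible middle ground, not proposed in this thread, is to keep all observations and estimate the drift with a robust slope estimator instead of removing outliers first. The sketch below uses Theil–Sen (the median of all pairwise slopes), which tolerates a minority of outliers. It compares Theil–Sen with ordinary least squares on synthetic data containing a single spike. The helper names are hypothetical. The pairwise computation is O(n²), so long high-frequency series would need subsampling, and this does not replace proper validation.

```python
from itertools import combinations
from statistics import fmean, median

def ols_slope(t, y):
    """Least-squares slope of y on t."""
    t_mean, y_mean = fmean(t), fmean(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx

def theil_sen_slope(t, y):
    """Median of the slopes over all pairs of points with distinct times."""
    slopes = [
        (yj - yi) / (tj - ti)
        for (ti, yi), (tj, yj) in combinations(zip(t, y), 2)
        if tj != ti
    ]
    return median(slopes)

days = list(range(100))
diff = [0.01 * d for d in days]  # true drift: 0.01 per day, no noise
diff[95] += 5.0                  # one spike near the end of the series

print(round(ols_slope(days, diff), 4))        # pulled upwards by the spike
print(round(theil_sen_slope(days, diff), 4))  # 0.01
```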

fredericpiesschaert commented 3 years ago

@DavorJ Two years is arbitrary indeed. I only meant that there are fewer drift cases than the model suggests, and that evaluating the model's suggestions takes some common sense from the user.

fredericpiesschaert commented 3 years ago

I find these graphs very interesting. Take a look at this one. There is no way you would suspect drift when looking at the original series, yet something seems to be going on from the very beginning. Does that mean you have to throw away the entire series? I would think not, but from what point on does drift become a problem? Not an easy one. Perhaps it will have to be a user decision. [Image]

[Image]
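On the question of when drift starts, one way to give the user a candidate onset time is a broken-stick (hinge) fit. This is an assumption, not part of the current model. The difference series is modelled as constant up to an unknown time t0 and drifting linearly afterwards: d(t) = a + b·max(0, t − t0). Scanning candidate values of t0 and keeping the one with the smallest residual sum of squares gives an estimate the user can still accept or reject. The helper names and the synthetic series are hypothetical. The fit only locates a likely onset. Whether the drift is large enough to matter remains a separate, user-defined tolerance decision.

```python
from statistics import fmean

def fit_hinge(t, d, t0):
    """OLS fit of d = a + b * max(0, t - t0); returns (a, b, sse)."""
    x = [max(0.0, ti - t0) for ti in t]
    x_mean, d_mean = fmean(x), fmean(d)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:  # t0 at or after the last observation: no drift term left
        return d_mean, 0.0, sum((di - d_mean) ** 2 for di in d)
    b = sum((xi - x_mean) * (di - d_mean) for xi, di in zip(x, d)) / sxx
    a = d_mean - b * x_mean
    sse = sum((di - (a + b * xi)) ** 2 for xi, di in zip(x, d))
    return a, b, sse

def estimate_drift_onset(t, d, candidates):
    """Return (t0, a, b) for the candidate onset with the smallest SSE."""
    best = min(candidates, key=lambda t0: fit_hinge(t, d, t0)[2])
    a, b, _ = fit_hinge(t, d, best)
    return best, a, b

# Synthetic series: flat for 200 days, then drifting 0.005 per day.
days = list(range(400))
diff = [0.2 + 0.005 * max(0, d - 200) for d in days]
onset, offset, rate = estimate_drift_onset(days, diff, candidates=range(0, 400, 10))
print(onset)  # 200
```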