DOV-Vlaanderen / groundwater-logger-validation

Analysis on validation methods for groundwater logger data
MIT License

Epsilon estimation for intervals not divisible by 5 minutes #47

Open fredericpiesschaert opened 4 years ago

fredericpiesschaert commented 4 years ago

KBRP202A is an example of a time series with tidal influence (time interval 1 hour). The package detects many level shifts that aren't really level shifts. Raw data can be found here

DavorJ commented 4 years ago

After a quick check I get this:

image

Only two outliers (gwloggeR v0.1.5, algorithm v0.06). Is it different in WATINA?

fredericpiesschaert commented 4 years ago

This is how it looks in WATINA at the moment: 211 level shifts detected.

image

fredericpiesschaert commented 4 years ago

So it seems to be a bad implementation in WATINA, is that correct?

DavorJ commented 4 years ago

It looks like there is a bug somewhere, but I don't know where.

df <- gwloggeR.data::read('KBRP202A')$df
data.table::setkey(df, TIMESTAMP_UTC) # sort df by timestamp

ls <- gwloggeR::detect_levelshifts(x = df$PRESSURE_VALUE,
                                   apriori = gwloggeR::apriori('hydrostatic pressure'),
                                   plot = TRUE, timestamps = df$TIMESTAMP_UTC)

sum(ls) # count level shifts

Result last statement:

[1] 0

So there should be 0 level shifts.

I don't see what the problem could be or what might be causing it. I have a feeling it might have to do with sorting by timestamp prior to feeding the data to gwloggeR. Have you checked with @Jo-Loos?

PS, just an idea: it might be interesting to place a link in WATINA to the gwloggeR issue tracker here, so if someone sees weird stuff like the above, they can ask about it here.
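The sorting concern raised above can be checked before calling gwloggeR. The following is only a minimal sketch in base R on made-up timestamps; `ts` stands in for `df$TIMESTAMP_UTC`, and nothing here reflects how WATINA or gwloggeR handle ordering internally.

```r
# Toy timestamps standing in for df$TIMESTAMP_UTC (values are invented).
ts <- as.POSIXct(c("2020-01-01 02:00:00", "2020-01-01 00:00:00",
                   "2020-01-01 01:00:00", "2020-01-01 01:00:00"), tz = "UTC")

# A negative difference between consecutive samples means the input
# is not in chronological order.
unsorted <- any(diff(as.numeric(ts)) < 0)

# Count timestamps that repeat an earlier timestamp.
n_dupes <- sum(duplicated(ts))

# Sorting fixes the order but leaves the duplicates in place.
ts_sorted <- sort(ts)
```

If `unsorted` is `TRUE` or `n_dupes` is larger than zero, the input differs from a clean, sorted series and is worth inspecting before blaming the detection itself.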

fredericpiesschaert commented 4 years ago

@Jo-Loos I see that the input query of the stored procedure does not filter out the measurements with status invalid and deleted (INV-DEL). This is a series with a great many duplicate measurements, of which the duplicates are set to inv/del. Could that be the cause?

SELECT
    --pp.PPNT_ID
    --, pp.PPNT_CDE
    --, ds.DRSO_ID
    --, ds.DRSO_SER_NBR
    --, dm.DRME_ID
    , CONVERT( dateTime, COALESCE (dm.DRME_OCR_UTC_DTE, dm.DRME_OCR_DTE)) as DRME_OCR_UTC_DTE
    , dm.DRME_DRU
    --, dm.DRME_TPU
    --, dm.DRME_DMTP_CDE
FROM dbo.tblDrukmeting dm
INNER JOIN dbo.relPeilpuntDruksonde pds
    on pds.PPDS_DRSO_ID = dm.DRME_DRSO_ID
    AND pds.PPDS_ID = dm.DRME_PPDS_ID
INNER JOIN dbo.tblPeilpunt pp
    ON pp.PPNT_ID = pds.PPDS_PPNT_ID
INNER JOIN dbo.tblDruksonde ds
    ON ds.DRSO_ID = pds.PPDS_DRSO_ID
WHERE pds.PPDS_ID = 68805
    AND dm.DRME_DRU IS NOT NULL
    AND COALESCE(dm.DRME_OCR_UTC_DTE, dm.DRME_OCR_DTE) IS NOT NULL
ORDER BY CONVERT( dateTime, COALESCE (dm.DRME_OCR_UTC_DTE, dm.DRME_OCR_DTE)), dm.DRME_ID
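To illustrate the effect of the missing status filter, here is a small R sketch on invented data. The `STATUS` column and the codes `VLD`, `INV` and `DEL` are assumptions for illustration only; the real column name and codes in the WATINA database are not shown in this thread.

```r
# Invented example: hourly samples where each duplicate is flagged INV or DEL.
df <- data.frame(
  TIMESTAMP_UTC  = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") +
                   3600 * c(0, 0, 1, 1, 2),
  PRESSURE_VALUE = c(1000, 1000, 1002, 1002, 1001),
  STATUS         = c("VLD", "INV", "VLD", "DEL", "VLD"),  # hypothetical column and codes
  stringsAsFactors = FALSE
)

# Keep only rows that are neither invalid nor deleted.
df_valid <- df[!(df$STATUS %in% c("INV", "DEL")), ]

# Duplicate timestamps before and after filtering.
n_dupes_before <- sum(duplicated(df$TIMESTAMP_UTC))
n_dupes_after  <- sum(duplicated(df_valid$TIMESTAMP_UTC))
```

In this toy case the filter removes every duplicate timestamp, which is the situation described for KBRP202A; whether the same holds for the real series would have to be verified on the actual data.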

fredericpiesschaert commented 4 years ago

@DavorJ I will send you the series including the inv/del status, then we can see whether the level shifts do show up.

fredericpiesschaert commented 4 years ago

The full time series is available.

DavorJ commented 4 years ago

Now there are indeed many level shifts:

image

DavorJ commented 4 years ago

A remark here: if the time intervals are shorter than one hour, as in this case, and not divisible by 5 min, then only a very small sample of 1 min interval a priori data is used for epsilon estimation in the random walk hydrostatic pressure model. That explains the many level shifts. The data with the INV/DEL samples removed has a 1 h interval, which is divisible by 5 min, so a much larger sample of geotech data is used for epsilon estimation. I'll add this as a "future work" option.
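A quick way to see which case a series falls into is to test whether all of its sampling intervals are multiples of 5 minutes. The sketch below is plain base R on invented timestamps; the helper `intervals_divisible_by_5min` is hypothetical and is not part of gwloggeR, whose internal selection of a priori data may work differently.

```r
# Hypothetical helper: TRUE if every interval between consecutive
# (sorted) timestamps is a whole multiple of 300 seconds.
intervals_divisible_by_5min <- function(timestamps) {
  secs <- round(diff(as.numeric(sort(timestamps))))  # intervals in seconds
  all(secs %% 300 == 0)
}

start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")

# Clean hourly series: every interval is 3600 s.
hourly <- start + 3600 * (0:5)

# Series with an extra sample 2 minutes after the hour:
# intervals are 3600 s, 120 s and 3480 s.
irregular <- start + c(0, 3600, 3720, 7200)

hourly_ok    <- intervals_divisible_by_5min(hourly)
irregular_ok <- intervals_divisible_by_5min(irregular)
```

Here `hourly_ok` is `TRUE` and `irregular_ok` is `FALSE`, which mirrors the difference between the filtered series and the full series including INV/DEL samples.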