DOV-Vlaanderen / groundwater-logger-validation

Analysis on validation methods for groundwater logger data
MIT License

gwloggeR v0.1.x #9

Closed by DavorJ 5 years ago

DavorJ commented 5 years ago

OK, here is the first gwloggeR package, as simple as it can get. To install it:

devtools::install_github("DOV-Vlaanderen/groundwater-logger-validation", subdir = "gwloggeR", ref = "0.1.0")

It should install without RTools: there is nothing to compile.

Although subdir is specified, the whole git repository is downloaded anyway. I don't see a quick way to circumvent this. It is not a problem right now (the git repo is 40 MB), but it will probably become one in the future.

I'll keep this issue open for comments, until some new minor version is released.

A quick example to get started:

library(gwloggeR)
?detect_outliers
?apriori

x <- c(1000:1010, 500)
detect_outliers(x, apriori = apriori(data_type = "air pressure", units = "cmH2O"))
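To give a feel for why the value 500 stands out in the example above, here is a minimal base R sketch of a robust outlier rule built on the median and the median absolute deviation. It is only an illustration under stated assumptions: it is not the gwloggeR implementation, it ignores the a-priori information, and the cutoff of 3 is a hypothetical choice.

```r
# Illustrative robust outlier rule in base R; NOT how gwloggeR works internally.
x <- c(1000:1010, 500)

# Estimate centre and spread robustly, so that the suspicious value
# itself has little influence on them.
center <- median(x)
spread <- mad(x)  # median absolute deviation, scaled to be comparable to sd

# Hypothetical cutoff: flag points more than 3 robust deviations from the median.
cutoff <- 3
is_outlier <- abs(x - center) / spread > cutoff

# Positions of the flagged values
which(is_outlier)
```

With this rule only the last value (500) is flagged; the remaining values stay well within the cutoff.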

cc @fredericpiesschaert

DavorJ commented 5 years ago

UPDATE: v0.1.1

devtools::install_github("DOV-Vlaanderen/groundwater-logger-validation", subdir = "gwloggeR", ref = "0.1.1")

Added support for "diver" data. The lower bound is determined by the "air pressure" a-priori information, and the upper bound is determined based on v0.01.

DavorJ commented 5 years ago

UPDATE: v0.1.2

devtools::install_github("DOV-Vlaanderen/groundwater-logger-validation", subdir = "gwloggeR", ref = "0.1.2")

"diver" was changed to "hydrostatic pressure", and a convenience function was added for detecting duplicate timestamps: detect_duplicates().
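For readers who want to see what checking for duplicate timestamps amounts to, here is a small base R sketch. It does not call detect_duplicates() and makes no claim about how that function is implemented or what it returns; the timestamps are made up for illustration.

```r
# Illustrative duplicate-timestamp check in base R; not the gwloggeR code.
# The timestamps below are invented for the example.
timestamps <- as.POSIXct(
  c("2019-01-01 00:00:00", "2019-01-01 00:15:00",
    "2019-01-01 00:15:00", "2019-01-01 00:30:00"),
  tz = "UTC"
)

# TRUE for the second and later occurrences of a timestamp
is_duplicate <- duplicated(timestamps)

# TRUE for every member of a duplicated group, including the first occurrence
is_duplicate_any <- is_duplicate | duplicated(timestamps, fromLast = TRUE)

timestamps[is_duplicate]
```

Whether only the repeats or all members of a duplicated group should be flagged is a design choice; the sketch computes both.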

DavorJ commented 5 years ago

UPDATE: v0.1.3

devtools::install_github("DOV-Vlaanderen/groundwater-logger-validation", subdir = "gwloggeR", ref = "0.1.3")

This version has an extra option for detect_outliers(): plot = TRUE, which prints diagnostic plots. This is what they look like for INBO data.

To know more about these diagnostic plots, consult #26.

TODO: add handling of missing timestamps in case they are provided with the future optional ts argument.

mathiaswackenier commented 5 years ago

@DavorJ I would advise you to keep the data from loggers that measure air pressure separate from the data from loggers that measure hydrostatic pressure, as the latter don't fit the same Gaussian curve and median. The frequency plots are therefore redundant. If possible, I would suggest implementing "air pressure" and "hydrostatic pressure" (as mentioned above) to keep them out of the diagnostic plots.

@fredericpiesschaert will provide you with an updated list of valid air pressure measurements. I'm curious what the diagnostic plots of these loggers will look like when you run them through your latest update. Maybe it will result in a better kernel fit.
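The point about keeping the two logger types separate can be illustrated with a short base R sketch. The data frame, its column names (type, value) and the numbers are hypothetical and only meant to show that pooling both types would mix two different medians.

```r
# Hypothetical readings; column names and values are invented for illustration.
readings <- data.frame(
  type  = c("air pressure", "air pressure", "air pressure",
            "hydrostatic pressure", "hydrostatic pressure", "hydrostatic pressure"),
  value = c(1030, 1033, 1035, 1150, 1180, 1210)
)

# Split the values per measurement type before computing any statistics
by_type <- split(readings$value, readings$type)

# One median per type, instead of a single pooled median
medians <- sapply(by_type, median)
pooled  <- median(readings$value)

medians
pooled
```

The pooled median falls between the two groups and describes neither of them, which is why each type would need its own a-priori information.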

fredericpiesschaert commented 5 years ago

@DavorJ @mathiaswackenier aPrioriBaroData have been uploaded: https://github.com/DOV-Vlaanderen/groundwater-logger-validation/upload/master/data/raw/aPrioriBaroData

DavorJ commented 5 years ago

UPDATE: v0.1.4

devtools::install_github("DOV-Vlaanderen/groundwater-logger-validation", subdir = "gwloggeR", ref = "0.1.4")

New stuff:

Here is an example of how to use the new functions. Assuming df holds the logger data:

gwloggeR::detect_outliers(x = df$PRESSURE_VALUE, 
                          timestamps = df$TIMESTAMP_UTC, 
                          plot = TRUE, 
                          apriori = gwloggeR::apriori('hydrostatic pressure'))

For help: ?gwloggeR::detect_outliers.
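For anyone without logger data at hand, the call above can be tried on a small made-up data frame. This is a sketch that assumes gwloggeR v0.1.4 is installed; the values in df are invented, only the column names are taken from the example above, and the call is skipped when the package is not available. Other versions of the package may accept different arguments.

```r
# Made-up logger data; only the column names follow the example above.
df <- data.frame(
  TIMESTAMP_UTC  = seq(as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
                       by = "15 min", length.out = 8),
  PRESSURE_VALUE = c(1150, 1152, 1151, 1153, 1500, 1152, 1150, 1151)
)

# Only run the detection when gwloggeR is installed (assumed: v0.1.4).
if (requireNamespace("gwloggeR", quietly = TRUE)) {
  result <- gwloggeR::detect_outliers(
    x = df$PRESSURE_VALUE,
    timestamps = df$TIMESTAMP_UTC,
    apriori = gwloggeR::apriori("hydrostatic pressure")
  )
  print(result)
}
```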

The functions can now be tested. For convenience, the diagnostic plots (plot = TRUE) are based on the generic plots from #35. I'll start writing the documentation and adjust these plots accordingly. Feel free to make suggestions or contact me if you have questions.

cc @fredericpiesschaert