DOV-Vlaanderen / groundwater-logger-validation

Analysis on validation methods for groundwater logger data
MIT License

add id to output #38

Closed fredericpiesschaert closed 5 years ago

fredericpiesschaert commented 5 years ago

@DavorJ IDs are currently not included in the output vector; you just get a TRUE/FALSE sequence that follows the order of the input file:

[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

This is OK when you work with input files, but when the input is generated directly from the DB, as Jo is implementing it, this becomes fishy, because records can be deleted or added in the meantime. Hence, adding the ID to the output is quite essential.

DavorJ commented 5 years ago

@fredericpiesschaert, I assume by ID you mean the key field from the database, not just the sequence ID from the input vector?

Suppose you have the following dataframe df taken from the database:

df <- data.frame(ID = letters[1:10], pressure = c(1033:1041, 900))
print(df, row.names = FALSE)
#>  ID pressure
#>   a     1033
#>   b     1034
#>   c     1035
#>   d     1036
#>   e     1037
#>   f     1038
#>   g     1039
#>   h     1040
#>   i     1041
#>   j      900

Then you can add the outlier column like this:

df$outlier <- gwloggeR::detect_outliers(df$pressure)
print(df, row.names = FALSE)
#>  ID pressure outlier
#>   a     1033   FALSE
#>   b     1034   FALSE
#>   c     1035   FALSE
#>   d     1036   FALSE
#>   e     1037   FALSE
#>   f     1038   FALSE
#>   g     1039   FALSE
#>   h     1040   FALSE
#>   i     1041   FALSE
#>   j      900    TRUE

Now you have the ID and the outlier field in one dataframe, which can be used for updating the DB. Does this solve the problem?
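The concern about records being deleted or added between reading and updating can be handled by joining on the key rather than relying on row order. The following is only a sketch in base R: the `current` data frame and its contents are hypothetical stand-ins for a later read of the same table, and the outlier flags are copied from the output above instead of being recomputed.

```r
# Snapshot that was validated, with the flags shown in the thread's output
# (hard-coded here so the example runs without gwloggeR installed).
df <- data.frame(ID = letters[1:10], pressure = c(1033:1041, 900),
                 stringsAsFactors = FALSE)
df$outlier <- c(rep(FALSE, 9), TRUE)

# Hypothetical later state of the table: record "c" was deleted, "k" was added.
current <- data.frame(ID = c("a", "b", "d", "e", "f", "g", "h", "i", "j", "k"),
                      pressure = c(1033, 1034, 1036:1041, 900, 1042),
                      stringsAsFactors = FALSE)

# Left join on ID: flags follow the key, not the row position.
updated <- merge(current, df[, c("ID", "outlier")], by = "ID", all.x = TRUE)
print(updated, row.names = FALSE)
```

In this sketch the deleted record simply drops out of the join, and the new record gets `NA` in `outlier`, which marks it as not yet validated rather than silently inheriting another row's flag.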

I could add an extra argument key to detect_outliers(), but this doesn't seem essential for the function, so I would rather not.
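If a keyed result is still wanted, it can be built outside the package with a thin wrapper. This is a minimal sketch under assumptions: `flag_by_key()` and `toy_detector()` are hypothetical names that do not exist in gwloggeR, and the toy detector (a median/MAD rule) is only a stand-in so the example runs on its own. It is not the algorithm used by `detect_outliers()`.

```r
# Hypothetical helper: run any detector on one column and return key + flag.
flag_by_key <- function(data, key, value, detector) {
  stopifnot(is.data.frame(data), key %in% names(data), value %in% names(data))
  stopifnot(!anyDuplicated(data[[key]]))   # keys must be unique
  flags <- detector(data[[value]])
  stopifnot(is.logical(flags), length(flags) == nrow(data))
  out <- data.frame(data[[key]], flags, stringsAsFactors = FALSE)
  names(out) <- c(key, "outlier")
  out
}

# Stand-in detector for illustration only (NOT gwloggeR's method):
# flag values more than 3 scaled MADs away from the median.
toy_detector <- function(x) abs(x - stats::median(x)) > 3 * stats::mad(x)

df <- data.frame(ID = letters[1:10], pressure = c(1033:1041, 900),
                 stringsAsFactors = FALSE)
res <- flag_by_key(df, key = "ID", value = "pressure", detector = toy_detector)
print(res, row.names = FALSE)
```

With the package installed, `gwloggeR::detect_outliers` could presumably be passed as `detector` in place of the toy function, which keeps the key handling out of the detection function itself.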

What do you think?

fredericpiesschaert commented 5 years ago

@Jo-Loos could this do the trick?

fredericpiesschaert commented 5 years ago

works like a charm