KWB-R / wasserportal

R Package with Functions for Scraping Data of Wasserportal Berlin (https://wasserportal.berlin.de)
https://kwb-r.github.io/wasserportal/
MIT License

GW quality data of Wasserportal exports only two significant digits #23

Open mrustl opened 2 years ago

mrustl commented 2 years ago

See, for example, station ID = 149.

Data below the detection limit are indicated by a minus sign. If the magnitude of the detection limit is smaller than 0.01, Wasserportal exports the value as -0.00, so the actual detection limit is lost.

[Screenshot] Attached file: 149_wasserqualitaet_all_11_05_2005.csv
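To make the effect concrete, here is a minimal sketch of how a fixed two-decimal format hides small detection limits. The detection limits below are invented for illustration, and the use of `sprintf("%.2f", ...)` is only an assumption about how the export behaves, not a statement about the actual Wasserportal implementation.

``` r
# Hypothetical detection limits in µg/l; the minus sign marks
# "below detection limit" (values invented for illustration)
below_limit <- c(-0.5, -0.05, -0.004, -0.001)

# Assumed export behaviour: fixed format with two decimal places
exported <- sprintf("%.2f", below_limit)
print(exported)

# Re-importing the exported text: limits that were rounded to "-0.00"
# become numerically equal to zero, so the detection limit is gone
reimported <- as.numeric(exported)
is_lost <- reimported == 0
print(is_lost)
```

In this sketch the two smallest limits cannot be told apart after the round trip, which is why counting `Messwert == 0` is a reasonable way to find the affected records.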

Based on all available data sets, I found that approximately 6.4% (i.e. 45566 data points) of all GW quality data points need to be fixed. See below for details:

``` r
library(wasserportal)
gwq_data <- jsonlite::fromJSON("https://kwb-r.github.io/wasserportal/stations_gwq_data.json")

### Number of GW quality data points available in Wasserportal
nrow(gwq_data)
#> [1] 713955

### Values exported as "-0.00" are read as 0, so count zeros (unit: µg/l) per station
gwq_data_tobefixed <- gwq_data %>%
  dplyr::filter(Messwert == 0,
                Einheit == "\u00B5g/l") %>%
  dplyr::count(.data$Messstellennummer) %>%
  dplyr::arrange(dplyr::desc(.data$n)) %>%
  dplyr::rename("n_samples_with_-0.00" = "n")

### Number of GW quality stations that need to be fixed
nrow(gwq_data_tobefixed)
#> [1] 185

### Number of data points that need to be fixed
n_tobefixed <- sum(gwq_data_tobefixed$`n_samples_with_-0.00`)

### Percent of data points that need to be fixed
100*n_tobefixed/nrow(gwq_data)
#> [1] 6.382195

knitr::kable(gwq_data_tobefixed,
             caption = paste("GW quality monitoring stations_id and number of",
             "data points that need to be fixed (i.e. increase number of digits",
             "in case of '-0.00')"))
```
Messstellennummer n_samples_with_-0.00
5042 1223
5032 1204
4612 1202
7044 1202
7045 1202
7207 1202
4611 1201
7206 1200
7209 1174
5138 1137
7292 1011
6515 1007
7108 1005
6016 979
15147 823
15001 822
149 779
5366 737
6067 342
4304 292
3354 276
6058 274
6056 273
6023 272
7098 272
7171 272
6014 271
6080 270
7180 270
7295 269
6535 267
7039 267
7229 267
5003 266
5404 266
7027 266
5150 265
6510 265
6548 265
7168 265
5008 264
5049 264
5066 264
7014 264
7172 264
15101 264
5022 263
5039 263
6069 263
5026 260
6010 260
5140 258
6020 258
7215 257
7030 256
5095 253
7109 253
6121 252
5013 251
7173 251
5090 250
6017 250
5074 249
5002 247
7111 247
6038 239
7062 239
15000 237
15156 235
6963 234
7064 234
15049 234
9401 233
5297 231
5058 230
7219 230
7258 230
7291 230
15150 230
1359 229
15065 229
15152 229
4875 228
8964 228
7063 227
7195 226
4233 225
6518 225
7255 224
7257 218
4105 217
7285 217
7290 217
15153 217
4061 216
4521 216
7042 216
7286 216
6533 214
7250 210
4846 209
7136 205
4727 204
6504 204
7259 204
7268 200
5025 195
7137 192
6522 191
6066 188
6534 170
9092 168
5306 160
5130 156
8949 155
5010 151
6084 148
7264 148
17309 132
8957 128
344 122
17306 122
645 119
17303 106
580 101
8950 100
7015 90
5207 89
6028 89
7079 89
7165 88
23750 88
7019 84
7161 84
5097 83
10421 74
17304 74
5009 43
6089 28
6097 28
499 26
5020 26
5076 26
6063 26
5155 25
6026 24
6047 24
6081 24
6113 24
7018 24
7057 24
7078 24
7081 24
7084 24
7132 24
7144 24
7176 24
7210 24
7237 24
7248 24
7298 24
7301 24
282 23
5044 20
5255 20
5005 18
5027 18
5036 18
5040 18
5060 18
5073 18
5075 18
5078 18
5086 18
5351 18
5355 18
6520 18
6511 16
8472 14
3215 13
23703 9
7072 8
7086 8
8469 8
15109 8

GW quality monitoring stations_id and number of data points that need to be fixed (i.e. increase number of digits in case of '-0.00')
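Until the export provides more digits, the affected records could be flagged on the user side. The following is only a sketch under stated assumptions: the toy data are invented, `flag_truncated()` and the columns `truncated` and `Messwert_clean` are hypothetical names that are not part of the wasserportal package, and replacing flagged values with `NA` is just one possible interim handling.

``` r
# Toy data mimicking three columns of the GW quality export (values invented)
toy <- data.frame(
  Messstellennummer = c(149, 149, 149, 5042, 5042),
  Messwert = c(0, 0.12, 0, 0, 1.5),
  Einheit = c("\u00B5g/l", "\u00B5g/l", "mg/l", "\u00B5g/l", "\u00B5g/l"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a µg/l value was exported as zero,
# i.e. the same condition as used in the filter of the reprex
flag_truncated <- function(data) {
  data$Messwert == 0 & data$Einheit == "\u00B5g/l"
}

toy$truncated <- flag_truncated(toy)

# Assumption: treat flagged values as missing so that they are not
# mistaken for true zero concentrations
toy$Messwert_clean <- ifelse(toy$truncated, NA_real_, toy$Messwert)

# Number of flagged values per station
n_flagged <- tapply(toy$truncated, toy$Messstellennummer, sum)
print(n_flagged)
```

Keeping the flag as a separate column leaves the original `Messwert` untouched, so the values can be corrected later once the true detection limits are available.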

Created on 2022-06-09 by the reprex package (v2.0.0)

Session info

``` r
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  German_Germany.1252
#>  ctype    German_Germany.1252
#>  tz       Europe/Berlin
#>  date     2022-06-09
#>  pandoc   2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#>  package      * version    date (UTC) lib source
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
#>  backports      1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
#>  cli            3.3.0      2022-04-25 [1] CRAN (R 4.1.3)
#>  crayon         1.5.1      2022-03-26 [1] CRAN (R 4.1.3)
#>  curl           4.3.2      2021-06-23 [1] CRAN (R 4.1.3)
#>  data.table     1.14.2     2021-09-27 [1] CRAN (R 4.1.3)
#>  DBI            1.1.2      2021-12-20 [1] CRAN (R 4.1.3)
#>  digest         0.6.27     2020-10-24 [1] CRAN (R 4.1.0)
#>  dplyr          1.0.9      2022-04-28 [1] CRAN (R 4.1.3)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate       0.15       2022-02-18 [1] CRAN (R 4.1.3)
#>  fansi          1.0.3      2022-03-24 [1] CRAN (R 4.1.3)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  fs             1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
#>  generics       0.1.2      2022-01-31 [1] CRAN (R 4.1.3)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.1.3)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.1.2)
#>  httr           1.4.3      2022-05-04 [1] CRAN (R 4.1.3)
#>  jsonlite       1.8.0      2022-02-22 [1] CRAN (R 4.1.3)
#>  knitr          1.39       2022-04-26 [1] CRAN (R 4.1.3)
#>  kwb.datetime   0.5.0      2022-06-01 [1] Github (kwb-r/kwb.datetime@5f2b2c4)
#>  kwb.utils      0.13.0     2022-06-08 [1] Github (kwb-r/kwb.utils@6218b79)
#>  lifecycle      1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.1.3)
#>  pillar         1.7.0      2022-02-01 [1] CRAN (R 4.1.3)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr          0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
#>  reprex         2.0.0      2021-04-02 [1] CRAN (R 4.1.0)
#>  rlang          1.0.2      2022-03-04 [1] CRAN (R 4.1.3)
#>  rmarkdown      2.14       2022-04-25 [1] CRAN (R 4.1.3)
#>  rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  rvest          1.0.2      2021-10-16 [1] CRAN (R 4.1.3)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.1.3)
#>  stringi        1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler         1.4.1      2021-03-30 [1] CRAN (R 4.1.0)
#>  tibble         3.1.7      2022-05-03 [1] CRAN (R 4.1.3)
#>  tidyr          1.2.0      2022-02-01 [1] CRAN (R 4.1.3)
#>  tidyselect     1.1.2      2022-02-21 [1] CRAN (R 4.1.3)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.1.3)
#>  vctrs          0.4.1      2022-04-13 [1] CRAN (R 4.1.3)
#>  wasserportal * 0.1.0.9000 2022-06-02 [1] local
#>  withr          2.5.0      2022-03-03 [1] CRAN (R 4.1.3)
#>  xfun           0.31       2022-05-10 [1] CRAN (R 4.1.3)
#>  xml2           1.3.3      2021-11-30 [1] CRAN (R 4.1.3)
#>  yaml           2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#>
#>  [1] C:/Users/mrustl/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
```