VPTS object has duplicate columns `dbz_all` and `DBZH`, with incorrect repeating values

peterdesmet commented 10 months ago

When reading a VPTS CSV file with `read_vpts()` and converting the result with `as.data.frame()`, the data frame contains both `dbz_all` and `DBZH`, which are alternative names for the same quantity. According to the VPTS CSV specification, `dbz_all` is the preferred column name, so `DBZH` should be removed.

```r
library(bioRad)
vpts <- read_vpts("https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/daily/dkste/2023/dkste_vpts_20230903.csv")
vpts_df <- as.data.frame(vpts)
colnames(vpts_df)
#>  [1] "radar"            "datetime"         "height"           "u"               
#>  [5] "v"                "w"                "ff"               "dd"              
#>  [9] "sd_vvp"           "gap"              "eta"              "dens"            
#> [13] "dbz"              "dbz_all"          "n"                "DBZH"            
#> [17] "n_dbz"            "n_all"            "n_dbz_all"        "vcp"             
#> [21] "rcs"              "sd_vvp_threshold" "radar_latitude"   "radar_longitude" 
#> [25] "radar_height"     "radar_wavelength" "day"              "sunrise"         
#> [29] "sunset"
```

Created on 2023-09-12 with reprex v2.0.2
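Until the duplicate is removed in bioRad itself, one possible workaround is to drop `DBZH` after conversion whenever `dbz_all` is also present. This is a minimal base R sketch, assuming a plain data frame like the one returned by `as.data.frame(vpts)`; `drop_dbz_alias()` is a hypothetical helper, not a bioRad function, and the example values are illustrative, borrowed from the reprex output. It only removes the duplicate name and does not address incorrect values.

```r
# Hypothetical helper, not part of bioRad: drop the DBZH alias when the
# preferred VPTS CSV column dbz_all is also present.
drop_dbz_alias <- function(df) {
  if (all(c("dbz_all", "DBZH") %in% names(df))) {
    # Keep every column except DBZH, preserving the data frame structure
    df <- df[, setdiff(names(df), "DBZH"), drop = FALSE]
  }
  df
}

# Illustrative input mimicking the duplicated columns in the reprex
example <- data.frame(
  height  = c(0, 200),
  dbz_all = c(-18.77684, -20.96809),
  DBZH    = c(-18.77684, -20.96809)
)
names(drop_dbz_alias(example)) # returns c("height", "dbz_all")
```

If only `DBZH` is present, the data frame is returned unchanged, so the helper never discards the only reflectivity column.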

In addition, the values in the `dbz_all` (and `DBZH`) columns are incorrect: the first value (-18.77684) is repeated in every row, whereas reading the same file directly shows values that vary with height:

```r
library(bioRad)
library(dplyr)
library(readr)
file <- "https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/daily/dkste/2023/dkste_vpts_20230903.csv"

vpts_biorad <- as.data.frame(read_vpts(file))
vpts_biorad %>%
  head(30) %>%
  select(datetime, height, dbz_all, DBZH)
#>               datetime height   dbz_all      DBZH
#> 1  2023-09-03 00:00:00      0 -18.77684 -18.77684
#> 2  2023-09-03 00:00:00    200 -18.77684 -18.77684
#> 3  2023-09-03 00:00:00    400 -18.77684 -18.77684
#> 4  2023-09-03 00:00:00    600 -18.77684 -18.77684
#> 5  2023-09-03 00:00:00    800 -18.77684 -18.77684
#> 6  2023-09-03 00:00:00   1000 -18.77684 -18.77684
#> 7  2023-09-03 00:00:00   1200 -18.77684 -18.77684
#> 8  2023-09-03 00:00:00   1400 -18.77684 -18.77684
#> 9  2023-09-03 00:00:00   1600 -18.77684 -18.77684
#> 10 2023-09-03 00:00:00   1800 -18.77684 -18.77684
#> 11 2023-09-03 00:00:00   2000 -18.77684 -18.77684
#> 12 2023-09-03 00:00:00   2200 -18.77684 -18.77684
#> 13 2023-09-03 00:00:00   2400 -18.77684 -18.77684
#> 14 2023-09-03 00:00:00   2600 -18.77684 -18.77684
#> 15 2023-09-03 00:00:00   2800 -18.77684 -18.77684
#> 16 2023-09-03 00:00:00   3000 -18.77684 -18.77684
#> 17 2023-09-03 00:00:00   3200 -18.77684 -18.77684
#> 18 2023-09-03 00:00:00   3400 -18.77684 -18.77684
#> 19 2023-09-03 00:00:00   3600 -18.77684 -18.77684
#> 20 2023-09-03 00:00:00   3800 -18.77684 -18.77684
#> 21 2023-09-03 00:00:00   4000 -18.77684 -18.77684
#> 22 2023-09-03 00:00:00   4200 -18.77684 -18.77684
#> 23 2023-09-03 00:00:00   4400 -18.77684 -18.77684
#> 24 2023-09-03 00:00:00   4600 -18.77684 -18.77684
#> 25 2023-09-03 00:00:00   4800 -18.77684 -18.77684
#> 26 2023-09-03 00:05:00      0 -18.77684 -18.77684
#> 27 2023-09-03 00:05:00    200 -18.77684 -18.77684
#> 28 2023-09-03 00:05:00    400 -18.77684 -18.77684
#> 29 2023-09-03 00:05:00    600 -18.77684 -18.77684
#> 30 2023-09-03 00:05:00    800 -18.77684 -18.77684

# Compare with directly reading the data
vpts_readr <- readr::read_csv(file, col_types = cols(.default = "c"))
vpts_readr %>%
  head(30) %>%
  select(datetime, height, dbz_all)
#> # A tibble: 30 × 3
#>    datetime             height dbz_all            
#>    <chr>                <chr>  <chr>              
#>  1 2023-09-03T00:00:00Z 0      -18.77684211730957 
#>  2 2023-09-03T00:00:00Z 200    -20.96808624267578 
#>  3 2023-09-03T00:00:00Z 400    -25.368946075439453
#>  4 2023-09-03T00:00:00Z 600    -28.84183692932129 
#>  5 2023-09-03T00:00:00Z 800    -36.81539535522461 
#>  6 2023-09-03T00:00:00Z 1000   -inf               
#>  7 2023-09-03T00:00:00Z 1200   -inf               
#>  8 2023-09-03T00:00:00Z 1400   -inf               
#>  9 2023-09-03T00:00:00Z 1600   -inf               
#> 10 2023-09-03T00:00:00Z 1800   -inf               
#> # ℹ 20 more rows
```

Created on 2023-09-12 with reprex v2.0.2
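A simple heuristic for spotting the repeated-value symptom is to test whether a column holds a single distinct value across all rows. This is a base R sketch; `is_constant_column()` is a hypothetical helper, not part of bioRad, and the inputs are illustrative, built from the values printed in the reprex. A column can legitimately be constant (for instance, if every height is `-inf`), so a `TRUE` result is a prompt to compare against the raw CSV rather than proof of the bug.

```r
# Hypothetical helper, not part of bioRad: TRUE when all non-missing values
# in column `col` are identical (the symptom reported for dbz_all and DBZH).
is_constant_column <- function(df, col) {
  x <- df[[col]]
  x <- x[!is.na(x)]
  length(unique(x)) <= 1
}

# Illustrative inputs; -Inf stands in for the -inf entries in the CSV
buggy <- data.frame(height = c(0, 200, 400), dbz_all = rep(-18.77684, 3))
raw <- data.frame(
  height  = c(0, 200, 400, 1000),
  dbz_all = c(-18.77684, -20.96809, -25.36895, -Inf)
)
is_constant_column(buggy, "dbz_all") # TRUE
is_constant_column(raw, "dbz_all")   # FALSE
```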

iskandari commented 8 months ago

Added conditional logic to `as.vpts()` that determines how the reflectivity column is named, depending on how it is called. When it is called by `read_vpts_csv()`, the `from_csv` argument is set to `TRUE`; otherwise it is `FALSE`. When `from_csv` is `FALSE`, `dbz_all` is renamed to `DBZH`.
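The naming rule described here can be illustrated with a small sketch. This is a hypothetical base R illustration, not the actual code from #640: `name_reflectivity()` is an invented helper, and it assumes the input data frame contains `dbz_all` but not yet `DBZH`.

```r
# Hypothetical sketch of the from_csv naming rule; not bioRad's internal code.
# Assumes `df` has a dbz_all column and no DBZH column yet.
name_reflectivity <- function(df, from_csv = FALSE) {
  if (!from_csv) {
    # Not called from read_vpts_csv(): store reflectivity as DBZH
    names(df)[names(df) == "dbz_all"] <- "DBZH"
  }
  # When from_csv is TRUE, the VPTS CSV name dbz_all is left unchanged
  df
}

example <- data.frame(height = c(0, 200), dbz_all = c(-18.77684, -20.96809))
names(name_reflectivity(example, from_csv = TRUE))  # "height" "dbz_all"
names(name_reflectivity(example, from_csv = FALSE)) # "height" "DBZH"
```

Because the column is renamed rather than copied, reflectivity ends up under exactly one name in either case, which is what avoids the duplicate pair reported in the issue.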

See #640

adokter commented 8 months ago

closed by #640

adokter/bioRad #634