`read_vpts()` cannot read bioRad created CSV

CeciliaNilsson709 commented 4 months ago

Related to #653 and #635, there is still a discrepancy between the VPTS CSV format and what a VPTS data frame looks like in bioRad. As a result, writing a VPTS data frame to CSV and then reading it back with read_vpts() doesn't work (the columns are not the same).

library(bioRad)
#> Welcome to bioRad version 0.7.3
#> using vol2birdR version 1.0.1 (MistNet not installed)
vpts_df <- as.data.frame(example_vpts)
write.csv(vpts_df, "vpts_df.csv")
read_vpts("vpts_df.csv")
#> New names:
#> • `` -> `...1`
#> Error: Field names in `schema` must match column names in data:
#> ℹ Field names: `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `eta`, `dens`, `dbz`, `dbz_all`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `vcp`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `source_file`
#> ℹ Column names: `...1`, `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `dbz`, `eta`, `dens`, `DBZH`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `day`, `sunrise`, `sunset`

vpts_df <- as.data.frame(example_vpts, geo = FALSE, suntime = FALSE)
write.csv(vpts_df, "vpts_df.csv")
read_vpts("vpts_df.csv")
#> New names:
#> • `` -> `...1`
#> Error: Field names in `schema` must match column names in data:
#> ℹ Field names: `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `eta`, `dens`, `dbz`, `dbz_all`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`, `vcp`, `radar_latitude`, `radar_longitude`, `radar_height`, `radar_wavelength`, `source_file`
#> ℹ Column names: `...1`, `radar`, `datetime`, `height`, `u`, `v`, `w`, `ff`, `dd`, `sd_vvp`, `gap`, `dbz`, `eta`, `dens`, `DBZH`, `n`, `n_dbz`, `n_all`, `n_dbz_all`, `rcs`, `sd_vvp_threshold`
Created on 2024-02-19 with reprex v2.1.0
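
To make the mismatch explicit, the two name lists from the second error message above can be compared with setdiff(). This is only a base R sketch that uses the names copied from that output; it does not call bioRad, and the variable names `schema_fields` and `csv_columns` are made up for the illustration.

```r
# Field names expected by the VPTS CSV schema (copied from the error message)
schema_fields <- c(
  "radar", "datetime", "height", "u", "v", "w", "ff", "dd", "sd_vvp", "gap",
  "eta", "dens", "dbz", "dbz_all", "n", "n_dbz", "n_all", "n_dbz_all", "rcs",
  "sd_vvp_threshold", "vcp", "radar_latitude", "radar_longitude",
  "radar_height", "radar_wavelength", "source_file"
)

# Column names found in the CSV written with geo = FALSE, suntime = FALSE
csv_columns <- c(
  "...1", "radar", "datetime", "height", "u", "v", "w", "ff", "dd", "sd_vvp",
  "gap", "dbz", "eta", "dens", "DBZH", "n", "n_dbz", "n_all", "n_dbz_all",
  "rcs", "sd_vvp_threshold"
)

# Fields the schema requires but the CSV does not provide
missing_from_csv <- setdiff(schema_fields, csv_columns)

# Columns the CSV provides but the schema does not know
unexpected_in_csv <- setdiff(csv_columns, schema_fields)
```

For the second reprex this leaves `...1` and `DBZH` as unexpected columns, and `dbz_all`, `vcp`, the four `radar_*` fields and `source_file` as missing. The `...1` column is the unnamed row-name column that write.csv() writes by default.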

peterdesmet commented 4 months ago

@adokter @iskandari @bart1 @CeciliaNilsson709 I think we need a roadmap to tackle this, since the VPTS format is now half supported in bioRad:

Reading from hdf5 or csv results in different vpts objects (#653). This should be patched.
The columns in a bioRad vpts object are not consistent with the VPTS CSV format. I think we should have one format throughout. This is a major change.
Ideally, the change to one format throughout is combined with making bioRad vpts objects a data.frame directly (#568). This is a major change.
as.data.frame() currently has options (suntime and geo) resulting in different flavours of data frames. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), by no longer providing those options, and/or by having vpts objects be data.frames.
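
A rough idea of what such a write_vpts() could do, as a base R sketch only. The function name `write_vpts_sketch()` is hypothetical and not part of bioRad, and the default list of columns to drop (`day`, `sunrise`, `sunset`) is an assumption inferred from comparing the two error messages above.

```r
# Hypothetical helper: NOT an existing bioRad function.
# The default `drop` columns are an assumption based on the output in this issue.
write_vpts_sketch <- function(df, file, drop = c("day", "sunrise", "sunset")) {
  stopifnot(is.data.frame(df))
  # Keep only the columns that are not in the drop list
  keep <- setdiff(names(df), drop)
  out <- df[, keep, drop = FALSE]
  # row.names = FALSE avoids the unnamed first column that is read back as `...1`
  utils::write.csv(out, file, row.names = FALSE)
  invisible(file)
}

# Toy data frame standing in for as.data.frame(example_vpts)
toy <- data.frame(
  radar = "bejab",
  height = c(0, 200),
  dens = c(1.5, 2.5),
  sunrise = c("06:00", "06:00"),
  stringsAsFactors = FALSE
)
csv_file <- tempfile(fileext = ".csv")
write_vpts_sketch(toy, csv_file)

# Column names as they come back from the written file
written_names <- names(utils::read.csv(csv_file))
```

This only removes the extra columns and the row names. It does not address columns such as `DBZH` or the fields that the schema expects but the data frame lacks, so on its own it would not yet produce a valid VPTS CSV.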

My 2 cents is that we should move towards this situation:

functions

iskandari commented 4 months ago

> as.data.frame() currently has options (suntime and geo) resulting in different flavours of data frames. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), no longer providing those options and/or having vpts objects be data.frames.

Agreed that it would be simpler to have vpts objects as data frames, but then a side effect is that we lose metadata relevant to vp file creation. From the example in #653:

vpts_hdf5 <- bind_into_vpts(read_vpfiles(c(hdf5_1, hdf5_2)))
vpts_hdf5$attributes$how$task_args
[1] "azimMax=360.000000,azimMin=0.000000,layerThickness=200.000000,nLayers=25,rangeMax=35000.000000,rangeMin=5000.000000,elevMax=90.000000,elevMin=0.000000,radarWavelength=5.300000,useClutterMap=0,clutterMap=,fitVrad=1,exportBirdProfileAsJSONVar=0,minNyquist=5.000000,maxNyquistDealias=25.000000,birdRadarCrossSection=11.000000,stdDevMinBird=2.000000,cellEtaMin=11500.000000,etaMax=36000.000000,dbzType=DBZH,requireVrad=0,dealiasVrad=1,dealiasRecycle=1,dualPol=0,singlePol=1,rhohvThresMin=0.950000,resample=0,resampleRscale=500.000000,resampleNbins=100,resampleNrays=360,mistNetNElevs=5,mistNetElevsOnly=1,useMistNet=0,mistNetPath=/MistNet/mistnet_nexrad.pt,areaCellMin=0.500000,cellClutterFractionMax=0.500000,chisqMin=0.000010,clutterValueMin=0.100000,dbzThresMin=0.000000,fringeDist=5000.000000,nBinsGap=8,nPointsIncludedMin=25,nNeighborsMin=5,nObsGapMin=5,nAzimNeighborhood=3,nRangNeighborhood=3,nCountMin=4,refracIndex=0.964000,cellStdDevMax=5.000000,absVDifMax=10.000000,vradMin=1.000000"

This irreversibility should be communicated clearly. Maybe vpts data frames should be able to retain other information, in which case we would implement a write_vpts() that is called, for example, by extending write.csv(format = 'vpts').

peterdesmet commented 4 months ago

You can add custom attributes to data.frames, which we could use for the vpts metadata. These attributes are not lost when using other functions, like dplyr's filter(). Note: I'm not very familiar with attributes, but it seems useful here.

library(dplyr)

# A dataframe has default attributes
df <- iris
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150

# One can add custom attributes (here the list "metadata")
metadata <- list(radar = "bejab", regular = FALSE)
attr(df, "metadata") <- metadata
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150
#> 
#> $metadata
#> $metadata$radar
#> [1] "bejab"
#> 
#> $metadata$regular
#> [1] FALSE

# If the data frame is handled by other functions, the attributes are retained
df %>%
  filter(Species == "virginica") %>%
  attr("metadata")
#> $radar
#> [1] "bejab"
#> 
#> $regular
#> [1] FALSE

Created on 2024-02-20 with reprex v2.1.0

In addition, we should add a vpts class to the data.frame, so it is easy to recognize:

library(dplyr)
df <- iris
class(df) <- c("vpts", class(df))
class(df)
#> [1] "vpts"       "data.frame"

# Class is retained by dplyr
df %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts"       "data.frame"

# Class can also be added to a tibble
df <- iris
dft <- as_tibble(df)
class(dft) <- c("vpts", class(dft))
class(dft)
#> [1] "vpts"       "tbl_df"     "tbl"        "data.frame"

# Class is retained by dplyr
dft %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts"       "tbl_df"     "tbl"        "data.frame"

Created on 2024-02-20 with reprex v2.1.0

bart1 commented 4 months ago

A few remarks from my side. For move2 I do pretty much everything the way @peterdesmet describes (based on sf). Some extra properties are retained as attributes, which works quite well, and you can update these when required using custom methods. For that, it is indeed important to add the vpts class on top of a data frame. For sf/move2 it does not matter whether the underlying data.frame is a tbl or a real data.frame. This helps, as tbls are sometimes considerably faster, for example when reading with readr or vroom.

This is also quite interesting as a read for restoring objects after dplyr operations: https://dplyr.tidyverse.org/reference/dplyr_extending.html

Maybe, if it requires too many breaking changes, you could also call it vpts_df.
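
Combining the points above (a class on top of a data frame, metadata kept as an attribute), a minimal base R sketch could look like this. Both `new_vpts_df()` and `vpts_metadata()` are hypothetical names used for illustration; neither they nor a `vpts_df` class exist in bioRad.

```r
# Hypothetical constructor: new_vpts_df() and the "vpts_df" class are
# illustrations only, not part of bioRad.
new_vpts_df <- function(df, metadata = list()) {
  stopifnot(is.data.frame(df), is.list(metadata))
  # Store the vp/vpts metadata as a custom attribute
  attr(df, "metadata") <- metadata
  # Put the new class in front of the existing ones (data.frame or tbl_df)
  class(df) <- c("vpts_df", setdiff(class(df), "vpts_df"))
  df
}

# Hypothetical accessor for the stored metadata
vpts_metadata <- function(x) {
  stopifnot(inherits(x, "vpts_df"))
  attr(x, "metadata")
}

x <- new_vpts_df(
  data.frame(height = c(0, 200), dens = c(1.5, 2.5)),
  metadata = list(radar = "bejab", regular = FALSE)
)

x_class <- class(x)
x_radar <- vpts_metadata(x)$radar
```

Whether the class and the attribute survive a given operation depends on the methods involved, so a real implementation would probably need the hooks described on the dplyr page linked above.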

adokter / bioRad

`read_vpts()` cannot read bioRad created CSV #654