CeciliaNilsson709 opened this issue 4 months ago.
@adokter @iskandari @bart1 @CeciliaNilsson709 I think we need a roadmap to tackle this, since the VPTS format is now half supported in bioRad:
- as.data.frame() currently has options (suntime and geo) resulting in different flavours of data frames. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), by no longer providing those options and/or by having vpts objects be data.frames.

My 2 cents is that we should move towards this situation:
> as.data.frame() currently has options (suntime and geo) resulting in different flavours of data frames. As described above, when using write.csv() you no longer end up with a VPTS CSV. This could be fixed with a write_vpts() function (removing those columns), no longer providing those options and/or having vpts objects be data.frames.
Agreed that it would be simpler to have vpts objects as data frames, but a side effect is that we would lose metadata describing how the vp files were created. From the example in #653:
```r
vpts_hdf5 <- bind_into_vpts(read_vpfiles(c(hdf5_1, hdf5_2)))
vpts_hdf5$attributes$how$task_args
#> [1] "azimMax=360.000000,azimMin=0.000000,layerThickness=200.000000,nLayers=25,rangeMax=35000.000000,rangeMin=5000.000000,elevMax=90.000000,elevMin=0.000000,radarWavelength=5.300000,useClutterMap=0,clutterMap=,fitVrad=1,exportBirdProfileAsJSONVar=0,minNyquist=5.000000,maxNyquistDealias=25.000000,birdRadarCrossSection=11.000000,stdDevMinBird=2.000000,cellEtaMin=11500.000000,etaMax=36000.000000,dbzType=DBZH,requireVrad=0,dealiasVrad=1,dealiasRecycle=1,dualPol=0,singlePol=1,rhohvThresMin=0.950000,resample=0,resampleRscale=500.000000,resampleNbins=100,resampleNrays=360,mistNetNElevs=5,mistNetElevsOnly=1,useMistNet=0,mistNetPath=/MistNet/mistnet_nexrad.pt,areaCellMin=0.500000,cellClutterFractionMax=0.500000,chisqMin=0.000010,clutterValueMin=0.100000,dbzThresMin=0.000000,fringeDist=5000.000000,nBinsGap=8,nPointsIncludedMin=25,nNeighborsMin=5,nObsGapMin=5,nAzimNeighborhood=3,nRangNeighborhood=3,nCountMin=4,refracIndex=0.964000,cellStdDevMax=5.000000,absVDifMax=10.000000,vradMin=1.000000"
```
This irreversibility should be communicated clearly. Maybe vpts data frames should be able to retain other information, but then we need to implement a write_vpts() that writes only the VPTS CSV fields, called for example through an extended write.csv(format = 'vpts').
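A minimal base R sketch of such a writer, assuming the extra columns can simply be dropped before writing. write_vpts_csv() is a hypothetical name, not an existing bioRad function, and the default extra_cols (day, sunrise, sunset, which we assume as.data.frame(suntime = TRUE) adds) should be checked against bioRad's actual output:

```r
# Hypothetical helper, not part of bioRad: write a vpts data frame to CSV,
# dropping columns that are not part of the VPTS CSV format.
# ASSUMPTION: the default extra_cols are the columns added by
# as.data.frame(suntime = TRUE); adjust them to match the real output.
write_vpts_csv <- function(df, file, extra_cols = c("day", "sunrise", "sunset")) {
  stopifnot(is.data.frame(df))
  # Keep every column except the known extras, preserving column order
  keep <- setdiff(names(df), extra_cols)
  # Convert to a plain data.frame first, dropping any vpts or tibble class
  out <- as.data.frame(df)[, keep, drop = FALSE]
  utils::write.csv(out, file, row.names = FALSE)
  invisible(out)
}

# Usage with a toy data frame (made-up values)
df <- data.frame(
  radar = "bejab",
  datetime = c("2024-02-20T00:00:00Z", "2024-02-20T00:05:00Z"),
  height = c(0, 200),
  dens = c(1.5, 2.0),
  day = c(FALSE, FALSE),
  sunrise = "2024-02-20T07:20:00Z",
  sunset = "2024-02-20T18:05:00Z"
)
csv_file <- tempfile(fileext = ".csv")
write_vpts_csv(df, csv_file)
# Should list radar, datetime, height and dens only
names(read.csv(csv_file))
```

Dropping columns by name keeps the writer independent of which as.data.frame() options were used, at the cost of maintaining the list of extra columns.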
You can add custom attributes to data.frames, which we could use for the vpts metadata. These attributes survive some other operations, such as dplyr's filter(). Note: I'm not very familiar with attributes, but they seem useful here.
```r
library(dplyr)
# A dataframe has default attributes
df <- iris
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150
# One can add custom attributes (here the list "metadata")
metadata <- list(radar = "bejab", regular = FALSE)
attr(df, "metadata") <- metadata
attributes(df)
#> $names
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
#> [145] 145 146 147 148 149 150
#>
#> $metadata
#> $metadata$radar
#> [1] "bejab"
#>
#> $metadata$regular
#> [1] FALSE
# If the data frame is handled by other functions, the attributes are retained
df %>%
  filter(Species == "virginica") %>%
  attr("metadata")
#> $radar
#> [1] "bejab"
#>
#> $regular
#> [1] FALSE
```
Created on 2024-02-20 with reprex v2.1.0
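Applied to a vpts object itself, the same attribute idea could keep metadata such as how$task_args when flattening to a data frame. This is a minimal base R sketch; the list layout (radar, datetime, height, a data list of height-by-datetime matrices, and attributes) only loosely mirrors bioRad's vpts objects and is an assumption, as is the helper name vpts_to_df():

```r
# Hypothetical sketch: flatten a vpts-like list into a long data frame and
# keep the list-level metadata as a "metadata" attribute.
# ASSUMPTION: x$data holds matrices with one row per height and one column
# per datetime, loosely mirroring bioRad's vpts objects.
vpts_to_df <- function(x) {
  n_height <- length(x$height)
  n_time <- length(x$datetime)
  df <- data.frame(
    radar = x$radar,
    # Heights vary fastest, matching the column-major order of the matrices
    datetime = rep(x$datetime, each = n_height),
    height = rep(x$height, times = n_time)
  )
  # One column per quantity (dens, u, v, ...), flattened column by column
  for (quantity in names(x$data)) {
    df[[quantity]] <- as.vector(x$data[[quantity]])
  }
  # Carry the metadata over instead of discarding it
  attr(df, "metadata") <- x$attributes
  df
}

# Usage with a toy vpts-like list (made-up values)
x <- list(
  radar = "bejab",
  datetime = as.POSIXct(c("2024-02-20 00:00", "2024-02-20 00:05"), tz = "UTC"),
  height = c(0, 200, 400),
  data = list(dens = matrix(1:6, nrow = 3)),
  attributes = list(how = list(task_args = "layerThickness=200.000000,nLayers=25"))
)
df <- vpts_to_df(x)
attr(df, "metadata")$how$task_args
```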
In addition, we should add a vpts class to the data.frame, so it is easy to recognize:
```r
library(dplyr)
df <- iris
class(df) <- c("vpts", class(df))
class(df)
#> [1] "vpts" "data.frame"
# Class is retained by dplyr
df %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts" "data.frame"
# Class can also be added to a tibble
df <- iris
dft <- as_tibble(df)
class(dft) <- c("vpts", class(dft))
class(dft)
#> [1] "vpts" "tbl_df" "tbl" "data.frame"
# Class is retained by dplyr
dft %>%
  filter(Species == "virginica") %>%
  class()
#> [1] "vpts" "tbl_df" "tbl" "data.frame"
```
Created on 2024-02-20 with reprex v2.1.0
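Combining both ideas, a constructor could prepend the class and attach the metadata in one step, with a basic column check. This is a hypothetical sketch: new_vpts() and is_vpts() are not bioRad functions, and the minimal set of required columns is an assumption:

```r
# Hypothetical constructor, not an existing bioRad function: prepend the
# "vpts" class to any data frame (plain data.frame or tibble) and store
# metadata as an attribute.
# ASSUMPTION: radar, datetime and height are the minimum required columns.
new_vpts <- function(df, metadata = list(),
                     required = c("radar", "datetime", "height")) {
  stopifnot(is.data.frame(df), is.list(metadata))
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  attr(df, "metadata") <- metadata
  # Avoid adding the class twice if df already is a vpts data frame
  class(df) <- unique(c("vpts", class(df)))
  df
}

# Hypothetical helper to recognize vpts data frames
is_vpts <- function(x) inherits(x, "vpts")

# Usage with a toy data frame (made-up values)
df <- data.frame(
  radar = "bejab",
  datetime = as.POSIXct("2024-02-20 00:00", tz = "UTC"),
  height = c(0, 200)
)
vpts_df <- new_vpts(df, metadata = list(radar = "bejab", regular = FALSE))
class(vpts_df)
is_vpts(vpts_df)
```

Because the class is prepended rather than replacing the existing one, the same function should also work on a tibble, in line with the class(dft) example above.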
A few remarks from my side. For move2 I do pretty much everything the way @peterdesmet describes (based on sf). Some extra properties are retained as attributes, which works quite well, and you can update these when required using custom methods. For that it is indeed important to add the vpts class on top of a data frame. For sf/move2 it does not matter whether the underlying data.frame is a tbl or a real data.frame. This helps, as tbls are sometimes considerably faster, for example when reading with readr or vroom.
This is also an interesting read on restoring objects after dplyr operations: https://dplyr.tidyverse.org/reference/dplyr_extending.html
If this requires too many breaking changes, you could also call it vpts_df.
Related to #653 and #635, there is still a discrepancy between the VPTS CSV format and what a VPTS data frame looks like in bioRad. As a result, writing a VPTS data frame to CSV and then reading it with read_vpts() doesn't work (the columns are not the same).
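A quick way to pinpoint such a mismatch is to compare the column names of the data frame with the fields the reader expects. This is a hypothetical base R sketch; compare_vpts_columns() is not a bioRad function, and the expected vector below is a placeholder subset, not the actual VPTS CSV field list, which should be taken from the specification:

```r
# Hypothetical diagnostic: compare the columns of a data frame with the
# columns a VPTS CSV reader expects.
# ASSUMPTION: `expected` is a placeholder; use the field list from the
# VPTS CSV specification instead.
compare_vpts_columns <- function(df, expected) {
  list(
    # Fields the reader needs but the data frame lacks
    missing = setdiff(expected, names(df)),
    # Columns in the data frame that are not part of the format
    extra = setdiff(names(df), expected)
  )
}

# Usage with a toy data frame (made-up values)
expected <- c("radar", "datetime", "height", "dens")  # placeholder subset
df <- data.frame(
  radar = "bejab",
  datetime = "2024-02-20T00:00:00Z",
  height = 0,
  dens = 1.5,
  sunrise = "2024-02-20T07:20:00Z"
)
compare_vpts_columns(df, expected)
```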