adokter / bioRad

R package for analysis and visualisation of biological signals in weather radar data
http://adokter.github.io/bioRad
Other
29 stars 16 forks source link

Consider using stars for the array/gridded data #255

Open bart1 opened 5 years ago

bart1 commented 5 years ago

As the bioRad package uses a lot of gridded data that mostly has a spatial component to it, it might be worth considering to use the stars package.

The biggest challenge would be presenting the polar volume data that is produced by a radar. This data from radar scans has rather different dimensions from most spatial data. Each polar volume consist of three dimensions, the distance from the radar (range gate), the direction from the radar and the elevation angle.

The question is how this is best presented, easiest would to set the dimensions as follows:

require(bioRad)
#> Welcome to bioRad version 0.4.0.9243
require(stars)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
require(ggplot2)
#> Loading required package: ggplot2
pvol <- system.file("extdata", "volume.h5", package = "bioRad")
a<-read_pvolfile(pvol)
require(units)
#> Loading required package: units
#> udunits system database from /usr/share/xml/udunits
pvl<-c(read_stars(pvol, sub=1:5), 
       read_stars(pvol, sub=6:10),
       read_stars(pvol, sub=11:15), 
       along=list(elevationAngle=as_units(unlist(lapply(a$scans, function(x)x$attributes$where$elangle)), 'degrees')))
#> //dataset1/data1/data, //dataset1/data2/data, //dataset1/data3/data, //dataset1/data4/data, //dataset1/data5/data, 
#> //dataset2/data1/data, //dataset2/data2/data, //dataset2/data3/data, //dataset2/data4/data, //dataset2/data5/data, 
#> //dataset3/data1/data, //dataset3/data2/data, //dataset3/data3/data, //dataset3/data4/data, //dataset3/data5/data,
pvl<-st_set_dimensions(pvl,'x', values = as_units(c(1:480 *500), 'm'), names = 'distance')
pvl<-st_set_dimensions(pvl,'y', values = as_units(c(1:360), 'degrees'), names = 'direction')
pvl<-st_redimension(pvl, st_dimensions(pvl)[1:3])
names(pvl)<- names(a$scans[[1]]$params)
pvl[['DBZH']]<-as_units(pvl$DBZH,'db')
pvl[['VRADH']]<-as_units(pvl$VRADH,'m/s')
pvl
#> stars object with 3 dimensions and 5 attributes
#> attribute(s):
#>    DBZH [db]       VRADH [m/s]        RHOHV           ZDR       
#>  Min.   : 0.000   Min.   :  0.0   Min.   :   0   Min.   :  1.0  
#>  1st Qu.: 0.000   1st Qu.:  0.0   1st Qu.:2383   1st Qu.:157.0  
#>  Median : 0.000   Median :  0.0   Median :2981   Median :188.0  
#>  Mean   : 6.663   Mean   : 10.1   Mean   :3401   Mean   :189.3  
#>  3rd Qu.: 0.000   3rd Qu.:  0.0   3rd Qu.:4039   3rd Qu.:228.0  
#>  Max.   :94.000   Max.   :254.0   Max.   :9999   Max.   :254.0  
#>      PHIDP      
#>  Min.   :  1.0  
#>  1st Qu.: 86.0  
#>  Median :105.0  
#>  Mean   :107.4  
#>  3rd Qu.:125.0  
#>  Max.   :254.0  
#> dimension(s):
#>                from  to  offset   delta  refsys point values
#> distance          1 480 500 [m] 500 [m] udunits    NA   NULL
#> direction         1 360   1 [°]   1 [°] udunits    NA   NULL
#> elevationAngle    1   3 0.5 [°]   1 [°] udunits    NA   NULL

The disadvantage of this would be that no spatial operations can be done on the data since the stars object does not have spatial dimensions. Alternatively one could use a curvilinear grid, e.g.:

plot(project_as_ppi(a$scans[[1]]),param = 'DBZH')

# note the next step is a very rough proxy and not actually the right calculation
pts<-geosphere::destPoint(c(a$geo$lon, a$geo$lat), rep(1:360, 480),rep(1:480 *500, each=360))
curv<-list(x=matrix(pts[,1], ncol=360, byrow=T), y=matrix(pts[,2], ncol=360, byrow = T))
aa<-read_stars(pvol,sub=1:5,  curvilinear = curv )
#> //dataset1/data1/data, //dataset1/data2/data, //dataset1/data3/data, //dataset1/data4/data, //dataset1/data5/data,
plot(aa[,1:200], as_points=F, pch=19, cex=.1 , axes=T,  border=NA, reset = F)
maps::map('world', add = TRUE, col = 'red')

The problem with this approach is that each elevation angle/scan needs a separate stars object (or does something like a 3d curvilinear grid exist?). Because each scan has a different curvilinear grid as different angle results in different distances. This would however also directly "solve" the problem that some radars have different settings for different scans (different distance categories, different number of distance bins), otherwise this directly produces reading problems.

A third option would be to see if a different projection would be able to describe the data better, maybe there is some kind of polar projection that is based on distances and the two angles. In this case the center of the projection would be set at the radar. I do not have directly an example at hand but I think this route might exist.

bart1 commented 5 years ago

@edzer do you have any insights in which route might be preferable? Thanks in advance

edzer commented 5 years ago

Thanks for the great example, and for using stars for this - I hadn't given this type of data much thought when I started the project!

The first representation is indeed the way the data are collected: as an array along distance, direction and elevationAngle dimensions, relative to the center of observation. The second uses the location of the radar to compute coordinates that can be related to Earth coordinates. I agree that doing this for ground surface (and not for the middle of the first angle, which seems to be 1 degree) "invalidates" this for larger elevationAngle "layers", but I wonder how large the error is since the angles are so small. If the purpose is plotting, would you be able to see the differences?

So I think you succeeded on both accounts: getting the raw data into a stars object, which then is of course useless without the radar location (except for plotting it around (0,0)), and creating objects relative to a known CRS. The question is what you'd like to do with these data, and how you'd like to reach your goals using stars.

Based on your sketches above, it would be fairly straightforward to write an st_as_stars method, creating either the polar or the grid form.

edzer commented 5 years ago

@mdsumner

edzer commented 5 years ago

This commit allows it to go to/from the radial curvilinear grid and the regular (square) grid, using stars:

require(bioRad)
#> Welcome to bioRad version 0.4.0.9243
require(stars)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
require(ggplot2)
#> Loading required package: ggplot2
pvol <- system.file("extdata", "volume.h5", package = "bioRad")
a<-read_pvolfile(pvol)
require(units)
#> Loading required package: units
#> udunits system database from /usr/share/xml/udunits
pvl<-c(read_stars(pvol, sub=1:5), 
       read_stars(pvol, sub=6:10),
       read_stars(pvol, sub=11:15), 
       along=list(elevationAngle=as_units(unlist(lapply(a$scans, function(x)x$attributes$where$elangle)), 'degrees')))
#> //dataset1/data1/data, //dataset1/data2/data, //dataset1/data3/data, //dataset1/data4/data, //dataset1/data5/data, 
#> //dataset2/data1/data, //dataset2/data2/data, //dataset2/data3/data, //dataset2/data4/data, //dataset2/data5/data, 
#> //dataset3/data1/data, //dataset3/data2/data, //dataset3/data3/data, //dataset3/data4/data, //dataset3/data5/data,
pvl<-st_set_dimensions(pvl,'x', values = as_units(c(1:480 *500), 'm'), names = 'distance')
pvl<-st_set_dimensions(pvl,'y', values = as_units(c(1:360), 'degrees'), names = 'direction')
pvl<-st_redimension(pvl, st_dimensions(pvl)[1:3])
names(pvl)<- names(a$scans[[1]]$params)
pvl[['DBZH']]<-as_units(pvl$DBZH,'db')
pvl[['VRADH']]<-as_units(pvl$VRADH,'m/s')
pvl

# note the next step is a very rough proxy and not actually the right calculation
pts<-geosphere::destPoint(c(a$geo$lon, a$geo$lat), rep(1:360, 480),rep(1:480 *500, each=360))
curv<-list(x=matrix(pts[,1], ncol=360, byrow=T), y=matrix(pts[,2], ncol=360, byrow = T))
aa<-read_stars(pvol,sub=1:5,  curvilinear = curv )
#> //dataset1/data1/data, //dataset1/data2/data, //dataset1/data3/data, //dataset1/data4/data, //dataset1/data5/data,
#plot(aa[,1:200], as_points=F, pch=19, cex=.1 , axes=T,  border=NA, reset = F)
#maps::map('world', add = TRUE, col = 'red')

aa.sf = st_as_sf(aa[,1:200], as_points=FALSE)
# plot(aa.sf[1], border = NA)
dim(aa.sf)
dim(aa)
aa.sf$ID = seq_len(prod(dim(aa[,1:200])))
aa.sf
x = st_as_stars(st_bbox(aa[,1:210]))
# x = st_as_stars(st_bbox(aa[,1:200]))
x$values = array(1:prod(dim(x)), dim = dim(x))
j = st_join(x, aa.sf, what = "left1", as_points = TRUE)
plot(j[2], border = NA)
k = st_join(x, aa.sf, what = "left1", as_points = FALSE)
plot(k[2], border = NA)
l = st_join(x, aa.sf, what = "inner", as_points = TRUE)
plot(l["values"], border = NA)
m = st_join(x, aa.sf, what = "inner", as_points = FALSE)
plot(m["values"], border = NA)
adokter commented 5 years ago

Thanks very much @edzer, we'll be looking into this!

bart1 commented 1 year ago

Also see #545