S-ENDA / DMH

Data management handbook for S-ENDA partners
https://s-enda.github.io/DMH/

Access to data (without downloading it) #69

Closed frafra closed 6 months ago

frafra commented 8 months ago

GDAL supports accessing data remotely, without having to download it in advance.

Should we mention it and provide some code snippets on how to do it in Python/R?

Here is an example of using the GDAL virtual file system (/vsicurl/) to access a remote NetCDF file without downloading it:

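library(sf)
library(stars)
library(dplyr)  # provides the %>% pipe

# NETCDF: selects the GDAL netCDF driver; /vsicurl/ reads the file over HTTP
url <- 'NETCDF:/vsicurl/https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2006.nc'
# proxy=TRUE keeps a reference to the remote file instead of loading all values
data2006 <- stars::read_stars(url, sub='rr', proxy=TRUE)
[...]
st_crs(data2006)
# reg1 is a data frame of points with longitude, latitude and date columns
reg1_st <- reg1 %>% st_as_sf(.,
                             coords = c("longitude", "latitude"),
                             crs=4326)

# Reproject the points to the CRS of the NetCDF file before extracting values
reg1_st <- st_transform(reg1_st, crs=st_crs(data2006))
reg1_st$rr <- st_extract(data2006, reg1_st, time_column="date")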

This snippet uses the R stars package.

The same URL works with gdalinfo on the command line and with the wide variety of software built on GDAL.
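For the Python side of the question, here is a minimal sketch of how such a GDAL dataset name could be put together and opened. The helper names gdal_netcdf_path and open_remote are made up for illustration. The sketch assumes the GDAL Python bindings (osgeo) are installed and that the local GDAL build can read NetCDF through /vsicurl/, which may depend on the platform and GDAL version.

```python
from typing import Optional

# URL taken from the R example in this thread.
SENORGE_2006 = (
    "https://thredds.met.no/thredds/fileServer/senorge/"
    "seNorge_2018/Archive/seNorge2018_2006.nc"
)


def gdal_netcdf_path(url: str, variable: Optional[str] = None) -> str:
    """Build a GDAL dataset name for a remote NetCDF file (hypothetical helper).

    /vsicurl/ asks GDAL to read the file over HTTP instead of from disk, and
    the NETCDF: prefix selects the netCDF driver. When a variable is given,
    the name uses GDAL's NETCDF:"<file>":<variable> subdataset form.
    """
    path = "/vsicurl/" + url
    if variable is None:
        return "NETCDF:" + path
    return 'NETCDF:"{}":{}'.format(path, variable)


def open_remote(url: str, variable: Optional[str] = None):
    """Open the remote dataset with GDAL (hypothetical helper).

    The import lives inside the function so that gdal_netcdf_path can be
    used even where the GDAL bindings are not installed.
    """
    from osgeo import gdal

    gdal.UseExceptions()  # raise Python exceptions instead of returning None
    return gdal.Open(gdal_netcdf_path(url, variable))
```

Calling open_remote(SENORGE_2006, "rr") should return a GDAL dataset whose RasterXSize, RasterYSize and RasterCount can be inspected before any pixel values are requested. The string returned by gdal_netcdf_path(SENORGE_2006) is the same one passed to read_stars in the R snippet.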

Maybe Benjamin Cretois can help with that as well?

BenCretois commented 7 months ago

A more reproducible version of the code is below:

library(sf)
library(stars)
library(dplyr)  # provides the %>% pipe

# Create a dataset - assuming the coordinates are in CRS=4326
data <- data.frame(
  longitude = c(11.95501, 11.95501, 11.95498, 11.95493, 11.95487, 11.95497),
  latitude = c(65.67812, 65.67814, 65.67821, 65.67808, 65.67809, 65.67810),
  date = as.Date(rep("2006-03-26", 6))
)

# Fetch the NetCDF file for a specific year (2006 here, to match the dates above)
year <- 2006
url <- paste0('NETCDF:/vsicurl/https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_', year, '.nc')

# Read the NetCDF file from the URL as a proxy object, without downloading it
netcdf_file <- stars::read_stars(url, sub='rr', proxy=TRUE)

# Transform the dataset into an sf object
data_st <- data %>% st_as_sf(.,
                             coords = c("longitude", "latitude"),
                             crs=4326)

# Reproject the points to match the CRS of the NetCDF file
data_st <- st_transform(data_st, crs=st_crs(netcdf_file))

# Extract 'rr' at each point, matching the "date" column to the time dimension
data_st$rr <- st_extract(netcdf_file, data_st, time_column="date")