NIEHS / beethoven

BEETHOVEN is: Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
https://niehs.github.io/beethoven/
Other
2 stars 0 forks source link

Cropping MODIS HDF files for tests #248

Closed sigmafelix closed 1 month ago

sigmafelix commented 6 months ago

MODIS HDF files comprise subdatasets or layers, where cells are stored in different spatial resolutions. This is not working well with standard raster cropping methods as subdatasets are required to be processed separately, unlike cropping an ordinary multilayer/multiband rasters. If we use the raw HDF files for tests, dataset sizes would matter as some images exceed 100 MB that are blocked by GitHub. I will attempt to subset converted tiff/nc files with the very dataset from which we want to calculate covariates and share findings here.

May be related to #233

kyle-messier commented 6 months ago

@sigmafelix @eva0marques @mitchellmanware

Some potential options to deal with MODIS data in R:

https://docs.ropensci.org/MODIStsp/

I found this blog, a bit outdated, but still potentially useful.

https://www.r-bloggers.com/2012/11/modis-r-package-tutorial/

kyle-messier commented 6 months ago

It would be interesting to see how our download_modis.R compares to https://github.com/ropensci/MODIStsp/blob/HEAD/R/MODIStsp_download.R

sigmafelix commented 6 months ago

@Spatiotemporal-Exposures-and-Toxicology @eva0marques @mitchellmanware Thank you for the information. MODIStsp seems like not maintained now. When I tried one in GitHub, the list of products was very limited.

An alternative approach is luna which is a companion package of terra (what a clever naming!). It worked so great that we could set a custom area of interest from SpatExtent in terra. It does not post-process hdf files to make a spatial subset, but download the hdf files with given arguments like below:

# luna is only available in GitHub for now
remotes::install_github("rspatial/luna")

library(terra)
library(luna)

# Assume that we already installed sf in the system
ncpath <- system.file("shape/nc.shp", package = "sf")
nc <- terra::vect(ncpath)
nc <- terra::project(nc, "EPSG:4326")
dw <- nc[grep("^(Dur|Wake|Orange)", nc$NAME), ]
dwext <- terra::ext(dw) + 0.05

# MOD06 product lists. Version and server names
# should be used accordingly
luna::getProducts("MOD06")
# Options for this product:
#         provider             concept_id short_name version
# 25624 LANCEMODIS C1219032640-LANCEMODIS   MOD06_L2    6NRT
# 38071 LANCEMODIS C1426500206-LANCEMODIS   MOD06_L2  6.1NRT
# 39963      LAADS      C1443535037-LAADS   MOD06_L2     6.1
# 55193      LAADS       C203234497-LAADS   MOD06_L2       6

luna::getNASA(
  product="MOD06_L2",
  start_date="2021-03-21",
  end_date="2021-03-22",
  aoi = dwext,
  version="6.1",
  download=TRUE,
  path="tests/testdata",
  username="sigmafelix",
  password=readLines("~/.edtoken")[1],
  server="LAADS",
  limit=100000,
  overwrite=TRUE
)

#  |======================================================================| 100%
#  |======================================================================| 100%

# [1] "tests/testdata/MOD06_L2.A2021080.0250.061.2021268135055.hdf"
# [2] "tests/testdata/MOD06_L2.A2021080.1630.061.2021268140321.hdf"
sigmafelix commented 6 months ago

Keeping MODIS HDF metadata that are used in calculation functions would be a solution since unused metadata elements have nothing to do with calculation functions. In my recent push in isong_calc_covars branch, I heavily refactored calc_modis and auxiliary functions to improve flexibility, such that we could only retain key elements in subset files.

sigmafelix commented 6 months ago

As a side note, luna appears not to support VNP46A2 (gap-filled night light). MOD and MCD products are supported.

sigmafelix commented 6 months ago

OPeNDAP supports downloading subsets, but seems to malfunction to miss DIMVAR, which is seen to assign unsorted data elements to actual spatial dimensions. Curvilinear grids are still tricky to make subsets as they should be warped / rectified before use. GDAL CLI (i.e., gdal_translate) keeps no subdataset structures in converted files and HDF4/5 drivers are usually not included in standard distributions, which means we should compile our own GDAL.

I will look at assigning DIMVAR at NetCDF level later, but it is in low priority.

kyle-messier commented 1 month ago

@sigmafelix please close if addressed in updates to amadeus. 👍