NIEHS / beethoven

BEETHOVEN is: Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
https://niehs.github.io/beethoven/
Other
4 stars 0 forks source link

Simple script to read in MERRA2 data #3

Open kyle-messier opened 1 year ago

sigmafelix commented 1 year ago

Following today's meeting, I suggest running sf::st_drivers(), which returns a data.frame of GDAL drivers installed in your machine. In my case (R 4.3.1, Ubuntu 22.04),

name long_name write copy is_raster is_vector vsi
ESRIC Esri Compact Cache FALSE FALSE TRUE TRUE TRUE
FITS Flexible Image Transport System TRUE FALSE TRUE TRUE FALSE
PCIDSK PCIDSK Database File TRUE FALSE TRUE TRUE TRUE
netCDF Network Common Data Format TRUE TRUE TRUE TRUE TRUE
PDS4 NASA Planetary Data System 4 TRUE TRUE TRUE TRUE TRUE
VICAR MIPL VICAR file TRUE TRUE TRUE TRUE TRUE
JP2OpenJPEG JPEG-2000 driver based on OpenJPEG library FALSE TRUE TRUE TRUE TRUE
PDF Geospatial PDF TRUE TRUE TRUE TRUE TRUE
MBTiles MBTiles TRUE TRUE TRUE TRUE TRUE
BAG Bathymetry Attributed Grid TRUE TRUE TRUE TRUE TRUE
EEDA Earth Engine Data API FALSE FALSE FALSE TRUE FALSE
OGCAPI OGCAPI FALSE FALSE TRUE TRUE TRUE
ESRI Shapefile ESRI Shapefile TRUE FALSE FALSE TRUE TRUE
MapInfo File MapInfo File TRUE FALSE FALSE TRUE TRUE
UK .NTF UK .NTF FALSE FALSE FALSE TRUE TRUE
LVBAG Kadaster LV BAG Extract 2.0 FALSE FALSE FALSE TRUE TRUE
OGR_SDTS SDTS FALSE FALSE FALSE TRUE TRUE
S57 IHO S-57 (ENC) TRUE FALSE FALSE TRUE TRUE
DGN Microstation DGN TRUE FALSE FALSE TRUE TRUE
OGR_VRT VRT - Virtual Datasource FALSE FALSE FALSE TRUE TRUE
REC EPIInfo .REC FALSE FALSE FALSE TRUE FALSE
Memory Memory TRUE FALSE FALSE TRUE FALSE
CSV Comma Separated Value (.csv) TRUE FALSE FALSE TRUE TRUE
NAS NAS - ALKIS FALSE FALSE FALSE TRUE TRUE
GML Geography Markup Language (GML) TRUE FALSE FALSE TRUE TRUE
GPX GPX TRUE FALSE FALSE TRUE TRUE
LIBKML Keyhole Markup Language (LIBKML) TRUE FALSE FALSE TRUE TRUE
KML Keyhole Markup Language (KML) TRUE FALSE FALSE TRUE TRUE
GeoJSON GeoJSON TRUE FALSE FALSE TRUE TRUE
GeoJSONSeq GeoJSON Sequence TRUE FALSE FALSE TRUE TRUE
ESRIJSON ESRIJSON FALSE FALSE FALSE TRUE TRUE
TopoJSON TopoJSON FALSE FALSE FALSE TRUE TRUE
Interlis 1 Interlis 1 TRUE FALSE FALSE TRUE TRUE
Interlis 2 Interlis 2 TRUE FALSE FALSE TRUE TRUE
OGR_GMT GMT ASCII Vectors (.gmt) TRUE FALSE FALSE TRUE TRUE
GPKG GeoPackage TRUE TRUE TRUE TRUE TRUE
SQLite SQLite / Spatialite TRUE FALSE FALSE TRUE TRUE
ODBC FALSE FALSE FALSE TRUE FALSE
WAsP WAsP .map format TRUE FALSE FALSE TRUE TRUE
PGeo ESRI Personal GeoDatabase FALSE FALSE FALSE TRUE FALSE
MSSQLSpatial Microsoft SQL Server Spatial Database TRUE FALSE FALSE TRUE FALSE
OGR_OGDI OGDI Vectors (VPF, VMAP, DCW) FALSE FALSE FALSE TRUE FALSE
PostgreSQL PostgreSQL/PostGIS TRUE FALSE FALSE TRUE FALSE
MySQL MySQL TRUE FALSE FALSE TRUE FALSE
OpenFileGDB ESRI FileGDB FALSE FALSE FALSE TRUE TRUE
DXF AutoCAD DXF TRUE FALSE FALSE TRUE TRUE
CAD AutoCAD Driver FALSE FALSE TRUE TRUE TRUE
FlatGeobuf FlatGeobuf TRUE FALSE FALSE TRUE TRUE
Geoconcept Geoconcept TRUE FALSE FALSE TRUE TRUE
GeoRSS GeoRSS TRUE FALSE FALSE TRUE TRUE
GPSTrackMaker GPSTrackMaker TRUE FALSE FALSE TRUE TRUE
VFK Czech Cadastral Exchange Data Format FALSE FALSE FALSE TRUE FALSE
PGDUMP PostgreSQL SQL dump TRUE FALSE FALSE TRUE TRUE
OSM OpenStreetMap XML and PBF FALSE FALSE FALSE TRUE TRUE
GPSBabel GPSBabel TRUE FALSE FALSE TRUE FALSE
OGR_PDS Planetary Data Systems TABLE FALSE FALSE FALSE TRUE TRUE
WFS OGC WFS (Web Feature Service) FALSE FALSE FALSE TRUE TRUE
OAPIF OGC API - Features FALSE FALSE FALSE TRUE FALSE
SOSI Norwegian SOSI Standard FALSE FALSE FALSE TRUE FALSE
Geomedia Geomedia .mdb FALSE FALSE FALSE TRUE FALSE
EDIGEO French EDIGEO exchange format FALSE FALSE FALSE TRUE TRUE
SVG Scalable Vector Graphics FALSE FALSE FALSE TRUE TRUE
CouchDB CouchDB / GeoCouch TRUE FALSE FALSE TRUE FALSE
Cloudant Cloudant / CouchDB TRUE FALSE FALSE TRUE FALSE
Idrisi Idrisi Vector (.vct) FALSE FALSE FALSE TRUE TRUE
ARCGEN Arc/Info Generate FALSE FALSE FALSE TRUE TRUE
XLS MS Excel format FALSE FALSE FALSE TRUE FALSE
ODS Open Document/ LibreOffice / OpenOffice Spreadsheet TRUE FALSE FALSE TRUE TRUE
XLSX MS Office Open XML spreadsheet TRUE FALSE FALSE TRUE TRUE
Elasticsearch Elastic Search TRUE FALSE FALSE TRUE FALSE
Walk FALSE FALSE FALSE TRUE FALSE
Carto Carto TRUE FALSE FALSE TRUE FALSE
AmigoCloud AmigoCloud TRUE FALSE FALSE TRUE FALSE
SXF Storage and eXchange Format FALSE FALSE FALSE TRUE TRUE
Selafin Selafin TRUE FALSE FALSE TRUE TRUE
JML OpenJUMP JML TRUE FALSE FALSE TRUE TRUE
PLSCENES Planet Labs Scenes API FALSE FALSE TRUE TRUE FALSE
CSW OGC CSW (Catalog Service for the Web) FALSE FALSE FALSE TRUE FALSE
VDV VDV-451/VDV-452/INTREST Data Format TRUE FALSE FALSE TRUE TRUE
GMLAS Geography Markup Language (GML) driven by application schemas FALSE TRUE FALSE TRUE TRUE
MVT Mapbox Vector Tiles TRUE FALSE FALSE TRUE TRUE
NGW NextGIS Web TRUE TRUE TRUE TRUE FALSE
MapML MapML TRUE FALSE FALSE TRUE TRUE
TIGER U.S. Census TIGER/Line TRUE FALSE FALSE TRUE TRUE
AVCBin Arc/Info Binary Coverage FALSE FALSE FALSE TRUE TRUE
AVCE00 Arc/Info E00 (ASCII) Coverage FALSE FALSE FALSE TRUE TRUE
HTTP HTTP Fetching Wrapper FALSE FALSE TRUE TRUE FALSE

NetCDF has all TRUEs across all logical columns, and I did not see errors when I open a MERRA2 file using

merra2_test = terra::rast("./MERRA2_400.inst3_3d_asm_Np.20230627.nc4")
> merra2_test
# class       : SpatRaster 
# dimensions  : 361, 576, 3720  (nrow, ncol, nlyr)
# resolution  : 0.625, 0.5  (x, y)
# extent      : -180.3125, 179.6875, -90.25, 90.25  (xmin, xmax, ymin, ymax)
# coord. ref. :  
# sources     : MERRA2_400.inst3_3d_asm_Np.20230627.nc4:EPV  (336 layers) 
#               MERRA2_400.inst3_3d_asm_Np.20230627.nc4:H  (336 layers) 
#               MERRA2_400.inst3_3d_asm_Np.20230627.nc4:O3  (336 layers) 
#               ... and 11 more source(s)
# varnames    : EPV (ertels_potential_vorticity) 
#               H (edge_heights) 
#               O3 (ozone_mass_mixing_ratio) 
#               ...
# names       :    EPV_l~000_1,    EPV_l~975_1,    EPV_l~950_1,    EPV_l~925_1,    EPV_l~900_1,    EPV_l~875_1, ... 
# unit        : K m+2 kg-1 s-1, K m+2 kg-1 s-1, K m+2 kg-1 s-1, K m+2 kg-1 s-1, K m+2 kg-1 s-1, K m+2 kg-1 s-1, ... 
# time (raw)  : 0 to 1260 

Maybe the issue is related to GDAL for MacOS.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] terra_1.7-39

loaded via a namespace (and not attached):
[1] compiler_4.3.1   tools_4.3.1      Rcpp_1.0.11      codetools_0.2-19
mitchellmanware commented 1 year ago

That is a good point. I checked and netCDF has FALSE under "vsi", so I will investigate further. Whats more confusing is that reading the .nc file with terra::rast(), raster::brick(), and ncdf4::nc_open() all return a similar error, even though the same three functions work fine with a local data file.

After seeing this I am not so sure its an issue with terra or something else.

Image

sigmafelix commented 1 year ago

If the code block below returns zero-rows data.frame, the GDAL in your local might miss netCDF driver.

d = terra::gdal(drivers=TRUE)
d[d$name=="netCDF", ]

How about installing a development version of terra from GitHub?

install.packages('terra', repos='https://rspatial.r-universe.dev’)

Source: https://stackoverflow.com/questions/73599324/r-terra-gdal-version-incorrect-cannot-read-nc-gdal-error-4

mitchellmanware commented 1 year ago

I am still unsure what the problem was, but after re-running the 'wget ...' command I am able to load the data with terra::rast(). Maybe we will be able to use terra after all 🤷🏽‍♂️

mitchellmanware commented 1 year ago

Thank you for all the help, Insang. It is much appreciated

sigmafelix commented 1 year ago

Probably there was a transmission issues when downloading the data using wget. I am glad to hear that the issue is resolved!

sigmafelix commented 1 year ago

To everyone: For making a fail-safe data acquisition workflow, I suggest using checksum to validate the downloaded file is the same as the original on the external server. For example, in MERRA2 data, adding .xml to the address of .nc4 file will bring me the metadata prepared by NASA. The number in Files>Checksum>CRC32 is a reference to check whether the transmitted data was corrupted in Linux (or Unix-like) systems using cksum $FILENAME. It took a few seconds to see the checksum result of a MERRA2 file (size=1.1 GB). Comparing checksums would be included in our workflow in the reanalysis phase. A possible issue is that the checksum would not be provided in the data source.