NIEHS / beethoven

BEETHOVEN is: Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
https://niehs.github.io/beethoven/
Other
4 stars 0 forks source link

Download and process GEOS-CF data #198

Open mitchellmanware opened 10 months ago

kyle-messier commented 10 months ago

1) Unique coordinate reference system 2) @mitchellmanware will extract from end product 3) @sigmafelix pause on MERRA2 #210

mitchellmanware commented 10 months ago

Data processing disrupted by technical difficulties with scientific storage (see OSC email from December 22, 2023).

sigmafelix commented 9 months ago

@Spatiotemporal-Exposures-and-Toxicology

GEOS data includes coordinates, but no information on geographic coordinate systems in a recommended format according to CF convention. We could assume that the coordinates are assigned based on spherical grids as GEOS-CF products have vertical layers. GEOS-CF data conforms to the CF-1 convention (I assume that this basically means the product should be compatible with all conventions after CF-1.0, which was announced in 2003), and there is no spheroidal earth information inside:

[songi2@cn040601/triton ~]$ gdalinfo rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
Driver: netCDF/Network Common Data Format
Files: rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
Size is 512, 512
Metadata:
  NC_GLOBAL#Comment=GMAO filename: GEOS-CF_NRT.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
  NC_GLOBAL#Contact=http://gmao.gsfc.nasa.gov
  NC_GLOBAL#Conventions=CF-1
  NC_GLOBAL#DataResolution=0.25 x 0.25
  NC_GLOBAL#EasternmostLongitude=179.75
  NC_GLOBAL#Filename=GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
  NC_GLOBAL#Format=NetCDF-4/HDF-5
  NC_GLOBAL#GranuleID=GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
  NC_GLOBAL#History=Original file generated: Sun Jun  5 21:15:43 2022 GMT
  NC_GLOBAL#Institution=NASA Global Modeling and Assimilation Office
  NC_GLOBAL#LatitudeResolution=0.25
  NC_GLOBAL#LongitudeResolution=0.25
  NC_GLOBAL#LongName=GEOS CF 2d time-averaged air quality concentrations diagnostics
  NC_GLOBAL#NorthernmostLatitude=90.0
  NC_GLOBAL#ProductionDateTime=Original file generated: Sun Jun  5 21:15:43 2022 GMT
  NC_GLOBAL#RangeBeginningDate=2022-06-05
  NC_GLOBAL#RangeBeginningTime=04:00:00.000000
  NC_GLOBAL#RangeEndingDate=2022-06-05
  NC_GLOBAL#RangeEndingTime=05:00:00.000000
  NC_GLOBAL#References=http://gmao.gsfc.nasa.gov
  NC_GLOBAL#ShortName=CF01Raqc_1hrT_g1440x721_V1
  NC_GLOBAL#Source=cak_Icarus-1_0_GCCv12-00-01_v2_004_SLES12 experiment_id: GEOS-CF_NRT
  NC_GLOBAL#SouthernmostLatitude=-90.0
  NC_GLOBAL#SpatialCoverage=global
  NC_GLOBAL#TemporalRange=2018-01-01 - ongoing
  NC_GLOBAL#Title=GEOS CF (Composition Forecast)
  NC_GLOBAL#VersionID=1
  NC_GLOBAL#WesternmostLongitude=-180.0
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":CO
  SUBDATASET_1_DESC=[1x1x721x1440] Carbon monoxide (CO, MW = 28.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
  SUBDATASET_2_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":NO2
  SUBDATASET_2_DESC=[1x1x721x1440] Nitrogen dioxide (NO2, MW = 46.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
  SUBDATASET_3_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":O3
  SUBDATASET_3_DESC=[1x1x721x1440] Ozone (O3, MW = 48.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
  SUBDATASET_4_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":PM25_RH35_GCC
  SUBDATASET_4_DESC=[1x1x721x1440] Particulate_matter_with_diameter_below_2.5_um_RH_35 (32-bit floating-point)
  SUBDATASET_5_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":SO2
  SUBDATASET_5_DESC=[1x1x721x1440] Sulfur dioxide (SO2, MW = 64.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

ncdump command (not via R) returned no spheroid/ellipsoid information:

[songi2@cn040601/triton ~]$ ncdump -ct rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
netcdf GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z {
dimensions:
    lon = 1440 ;
    lat = 721 ;
    lev = 1 ;
    time = UNLIMITED ; // (1 currently)
variables:
    double lon(lon) ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
    double lat(lat) ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
    double lev(lev) ;
        lev:long_name = "vertical level" ;
        lev:units = "layer" ;
        lev:positive = "down" ;
        lev:coordinate = "eta" ;
        lev:standard_name = "model_layers" ;
    int time(time) ;
        time:long_name = "time" ;
        time:units = "minutes since 2022-06-05 04:30:00" ;
        time:time_increment = 10000 ; // "2022-06-12 03:10:0.000000"
        time:begin_date = 20220605 ; // "2060-11-14 06:34:60.000000"
        time:begin_time = 43000 ; // "2022-07-05 01:10:0.000000"
    float CO(time, lev, lat, lon) ;
        CO:long_name = "Carbon monoxide (CO, MW = 28.00 g mol-1) volume mixing ratio dry air" ;
        CO:units = "mol mol-1" ;
        CO:_FillValue = 1.e+15f ;
        CO:missing_value = 1.e+15f ;
        CO:fmissing_value = 1.e+15f ;
        CO:scale_factor = 1.f ;
        CO:add_offset = 0.f ;
        CO:standard_name = "Carbon monoxide (CO, MW = 28.00 g mol-1) volume mixing ratio dry air" ;
        CO:vmin = -1.e+15f ;
        CO:vmax = 1.e+15f ;
        CO:valid_range = -1.e+15f, 1.e+15f ;
    float NO2(time, lev, lat, lon) ;
        NO2:long_name = "Nitrogen dioxide (NO2, MW = 46.00 g mol-1) volume mixing ratio dry air" ;
        NO2:units = "mol mol-1" ;
        NO2:_FillValue = 1.e+15f ;
        NO2:missing_value = 1.e+15f ;
        NO2:fmissing_value = 1.e+15f ;
        NO2:scale_factor = 1.f ;
        NO2:add_offset = 0.f ;
        NO2:standard_name = "Nitrogen dioxide (NO2, MW = 46.00 g mol-1) volume mixing ratio dry air" ;
        NO2:vmin = -1.e+15f ;
        NO2:vmax = 1.e+15f ;
        NO2:valid_range = -1.e+15f, 1.e+15f ;
    float O3(time, lev, lat, lon) ;
        O3:long_name = "Ozone (O3, MW = 48.00 g mol-1) volume mixing ratio dry air" ;
        O3:units = "mol mol-1" ;
        O3:_FillValue = 1.e+15f ;
        O3:missing_value = 1.e+15f ;
        O3:fmissing_value = 1.e+15f ;
        O3:scale_factor = 1.f ;
        O3:add_offset = 0.f ;
        O3:standard_name = "Ozone (O3, MW = 48.00 g mol-1) volume mixing ratio dry air" ;
        O3:vmin = -1.e+15f ;
        O3:vmax = 1.e+15f ;
        O3:valid_range = -1.e+15f, 1.e+15f ;
    float PM25_RH35_GCC(time, lev, lat, lon) ;
        PM25_RH35_GCC:long_name = "Particulate_matter_with_diameter_below_2.5_um_RH_35" ;
        PM25_RH35_GCC:units = "ug m-3" ;
        PM25_RH35_GCC:_FillValue = 1.e+15f ;
        PM25_RH35_GCC:missing_value = 1.e+15f ;
        PM25_RH35_GCC:fmissing_value = 1.e+15f ;
        PM25_RH35_GCC:scale_factor = 1.f ;
        PM25_RH35_GCC:add_offset = 0.f ;
        PM25_RH35_GCC:standard_name = "Particulate_matter_with_diameter_below_2.5_um_RH_35" ;
        PM25_RH35_GCC:vmin = -1.e+15f ;
        PM25_RH35_GCC:vmax = 1.e+15f ;
        PM25_RH35_GCC:valid_range = -1.e+15f, 1.e+15f ;
    float SO2(time, lev, lat, lon) ;
        SO2:long_name = "Sulfur dioxide (SO2, MW = 64.00 g mol-1) volume mixing ratio dry air" ;
        SO2:units = "mol mol-1" ;
        SO2:_FillValue = 1.e+15f ;
        SO2:missing_value = 1.e+15f ;
        SO2:fmissing_value = 1.e+15f ;
        SO2:scale_factor = 1.f ;
        SO2:add_offset = 0.f ;
        SO2:standard_name = "Sulfur dioxide (SO2, MW = 64.00 g mol-1) volume mixing ratio dry air" ;
        SO2:vmin = -1.e+15f ;
        SO2:vmax = 1.e+15f ;
        SO2:valid_range = -1.e+15f, 1.e+15f ;

// global attributes:
        :Contact = "http://gmao.gsfc.nasa.gov" ;
        :History = "Original file generated: Sun Jun  5 21:15:43 2022 GMT" ;
        :Comment = "GMAO filename: GEOS-CF_NRT.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4" ;
        :Filename = "GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4" ;
        :Source = "cak_Icarus-1_0_GCCv12-00-01_v2_004_SLES12 experiment_id: GEOS-CF_NRT" ;
        :Conventions = "CF-1" ;
        :Title = "GEOS CF (Composition Forecast)" ;
        :Institution = "NASA Global Modeling and Assimilation Office" ;
        :References = "http://gmao.gsfc.nasa.gov" ;
        :Format = "NetCDF-4/HDF-5" ;
        :SpatialCoverage = "global" ;
        :VersionID = "1" ;
        :TemporalRange = "2018-01-01 - ongoing" ;
        :ShortName = "CF01Raqc_1hrT_g1440x721_V1" ;
        :RangeBeginningDate = "2022-06-05" ;
        :RangeBeginningTime = "04:00:00.000000" ;
        :RangeEndingDate = "2022-06-05" ;
        :RangeEndingTime = "05:00:00.000000" ;
        :GranuleID = "GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4" ;
        :ProductionDateTime = "Original file generated: Sun Jun  5 21:15:43 2022 GMT" ;
        :LongName = "GEOS CF 2d time-averaged air quality concentrations diagnostics" ;
        :SouthernmostLatitude = "-90.0" ;
        :NorthernmostLatitude = "90.0" ;
        :WesternmostLongitude = "-180.0" ;
        :EasternmostLongitude = "179.75" ;
        :LatitudeResolution = "0.25" ;
        :LongitudeResolution = "0.25" ;
        :DataResolution = "0.25 x 0.25" ;

cf. Cubed-sphere grid information from NASA : link

sigmafelix commented 9 months ago

The c360 cubed-sphere grid system used in GEOS models seems to be based on the radius of 6371007.2 meters according to GEOS-Chem gcpy package constants:

https://github.com/geoschem/gcpy/blob/a056352a4b9db86cec8a77e7a5634f50bf6e1c3c/gcpy/constants.py#L19-L22

Despite being an old system, proj.4 definition of GEOS spheroid would be

+proj=latlon +R=6371007.2

terra accepts proj.4 definition, thus we could use that string in terra::crs() function.

library(terra)
terra::crs("+proj=latlon +R=6371007.2")

# [1] "GEOGCRS[\"unknown\",\n    DATUM[\"unknown\",\n        ELLIPSOID[\"unknown\",6371007.2,0,\n            # LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]],\n    PRIMEM[\"Greenwich\",0,\n        # ANGLEUNIT[\"degree\",0.0174532925199433],\n        ID[\"EPSG\",8901]],\n    CS[ellipsoidal,2],\n        # AXIS[\"longitude\",east,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                # ID[\"EPSG\",9122]]],\n        AXIS[\"latitude\",north,\n            ORDER[2],\n            # ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]]]"

Could you try this the proj.4 string then inspect differences in extracted values from this definition and using EPSG:4326 @mitchellmanware ? Thank you.

mitchellmanware commented 9 months ago

@sigmafelix

Will do. Thank you for identifying this.

mitchellmanware commented 9 months ago

Extracted values using EPSG:4326 are identical to those using +proj=latlon +R=6371007.2.

Example used variables CO_lev=72, NO2_lev=72, O3_lev=72, PM25_RH35_GCC_lev=72, and SO2_lev=72 from GEOS-CF aqc_tavg_1hr_g1440x721_v1 collection at US state centroids.

> # GEOS-CF coordinate reference system-specifc extracted values exploratory
> library(terra); library(tigris)
> geos <- rast("GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20180101_0030z.nc4")
> locations <- centroids(vect(states()))
Retrieving data for the year 2021
> crs_WGS84 <- "EPSG:4326"
> crs_R6371007 <- "+proj=latlon +R=6371007.2"
> extracted_values <- NULL
> for (g in seq_len(nlyr(geos))) {
+   # EPSG 4326
+   layer_WGS84 <- geos[[g]]
+   crs(layer_WGS84) <- crs_WGS84
+   layer_WGS84 <- project(layer_WGS84, crs_WGS84)
+   locations_WGS84 <- project(locations, crs_WGS84)
+   extracted_WGS84 <- extract(layer_WGS84,
+                              locations_WGS84,
+                              ID = FALSE,
+                              bind = FALSE)
+   
+   # +proj=latlon +R=6371007.2
+   layer_R6371007 <- geos[[g]]
+   crs(layer_R6371007) <- crs_R6371007
+   layer_R6371007 <- project(layer_R6371007, crs_R6371007)
+   locations_R6371007 <- project(locations, crs_R6371007)
+   extracted_R6371007 <- extract(layer_R6371007,
+                                 locations_R6371007,
+                                 ID = FALSE,
+                                 bind = FALSE)
+   
+   # combine
+   extracted_values_layer <- cbind(paste0(names(geos[[g]])),
+                                   extracted_WGS84,
+                                   extracted_R6371007)
+   colnames(extracted_values_layer) <- c("layer", "WGS84", "R6371007")
+   extracted_values_layer$difference <- 
+     extracted_values_layer$WGS84 - extracted_values_layer$R6371007
+   extracted_values <- rbind(extracted_values,
+                             extracted_values_layer)
+ }
> head(extracted_values)
      layer        WGS84     R6371007 difference
1 CO_lev=72 1.512271e-07 1.512271e-07          0
2 CO_lev=72 2.184570e-07 2.184570e-07          0
3 CO_lev=72 1.461631e-07 1.461631e-07          0
4 CO_lev=72 1.364715e-07 1.364715e-07          0
5 CO_lev=72 2.156630e-07 2.156630e-07          0
6 CO_lev=72 1.445623e-07 1.445623e-07          0
> tail(extracted_values)
         layer        WGS84     R6371007 difference
275 SO2_lev=72 7.171366e-10 7.171366e-10          0
276 SO2_lev=72 6.939445e-10 6.939445e-10          0
277 SO2_lev=72 1.833087e-09 1.833087e-09          0
278 SO2_lev=72 4.602043e-09 4.602043e-09          0
279 SO2_lev=72 8.002417e-10 8.002417e-10          0
280 SO2_lev=72 3.118430e-10 3.118430e-10          0
> summary(extracted_values)
    layer               WGS84          R6371007       difference
 Length:280         Min.   : 0.00   Min.   : 0.00   Min.   :0   
 Class :character   1st Qu.: 0.00   1st Qu.: 0.00   1st Qu.:0   
 Mode  :character   Median : 0.00   Median : 0.00   Median :0   
                    Mean   : 1.78   Mean   : 1.78   Mean   :0   
                    3rd Qu.: 0.00   3rd Qu.: 0.00   3rd Qu.:0   
                    Max.   :36.40   Max.   :36.40   Max.   :0   
> identical(extracted_values$WGS84, extracted_values$R6371007)
[1] TRUE
sigmafelix commented 9 months ago

@mitchellmanware Thank you for checking it. How about using circular buffers? 0.25x0.25 degrees are quite coarse, so point extracts would be the same.

mitchellmanware commented 9 months ago

Results remain identical when extracting at 25, 50, 75, and 100 km buffers

> buffers <- c(25000, 50000, 75000, 100000)
> extracted_values <- NULL
> for (g in seq_len(nlyr(geos))) {
+   for (b in seq_along(buffers)) {
+     # EPSG 4326
+     layer_WGS84 <- geos[[g]]
+     crs(layer_WGS84) <- crs_WGS84
+     layer_WGS84 <- project(layer_WGS84, crs_WGS84)
+     locations_WGS84 <- project(locations, crs_WGS84)
+     extracted_WGS84 <- extract(layer_WGS84,
+                                buffer(locations_WGS84,
+                                       buffers[b]),
+                                fun = "mean",
+                                ID = FALSE,
+                                bind = FALSE)
+     
+     # +proj=latlon +R=6371007.2
+     layer_R6371007 <- geos[[g]]
+     crs(layer_R6371007) <- crs_R6371007
+     layer_R6371007 <- project(layer_R6371007, crs_R6371007)
+     locations_R6371007 <- project(locations, crs_R6371007)
+     extracted_R6371007 <- extract(layer_R6371007,
+                                   buffer(locations_R6371007,
+                                          buffers[b]),
+                                   fun = "mean",
+                                   ID = FALSE,
+                                   bind = FALSE)
+     
+     # combine
+     extracted_values_layer <- cbind(paste0(names(geos[[g]])),
+                                     paste0(buffers[b], "m buffer"),
+                                     extracted_WGS84,
+                                     extracted_R6371007)
+     colnames(extracted_values_layer) <- c("layer",
+                                           "buffer (m)",
+                                           "WGS84",
+                                           "R6371007")
+     extracted_values_layer$difference <- 
+       extracted_values_layer$WGS84 - extracted_values_layer$R6371007
+     extracted_values <- rbind(extracted_values,
+                               extracted_values_layer)
+   }
+ }
> head(extracted_values)
      layer    buffer (m)        WGS84     R6371007 difference
1 CO_lev=72 25000m buffer 1.514406e-07 1.514406e-07          0
2 CO_lev=72 25000m buffer 2.189420e-07 2.189420e-07          0
3 CO_lev=72 25000m buffer 1.463765e-07 1.463765e-07          0
4 CO_lev=72 25000m buffer 1.354965e-07 1.354965e-07          0
5 CO_lev=72 25000m buffer 2.225509e-07 2.225509e-07          0
6 CO_lev=72 25000m buffer 1.450280e-07 1.450280e-07          0
> tail(extracted_values)
          layer    buffer (m)        WGS84     R6371007 difference
1115 SO2_lev=72 1e+05m buffer 5.648399e-10 5.648399e-10          0
1116 SO2_lev=72 1e+05m buffer 9.110295e-10 9.110295e-10          0
1117 SO2_lev=72 1e+05m buffer 1.511641e-09 1.511641e-09          0
1118 SO2_lev=72 1e+05m buffer 2.437729e-09 2.437729e-09          0
1119 SO2_lev=72 1e+05m buffer 4.415059e-10 4.415059e-10          0
1120 SO2_lev=72 1e+05m buffer 3.108651e-10 3.108651e-10          0
> summary(extracted_values)
    layer            buffer (m)            WGS84           R6371007        difference
 Length:1120        Length:1120        Min.   : 0.000   Min.   : 0.000   Min.   :0   
 Class :character   Class :character   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:0   
 Mode  :character   Mode  :character   Median : 0.000   Median : 0.000   Median :0   
                                       Mean   : 1.777   Mean   : 1.777   Mean   :0   
                                       3rd Qu.: 0.000   3rd Qu.: 0.000   3rd Qu.:0   
                                       Max.   :36.476   Max.   :36.476   Max.   :0   
> identical(extracted_values$WGS84, extracted_values$R6371007)
[1] TRUE
sigmafelix commented 9 months ago

@mitchellmanware Nice! We could use EPSG:4326 for GEOS-CF covariate calculation from now on.

mitchellmanware commented 9 months ago

Great, thank you again for such a deep dive into the GEOS-CF crs details.

The GEOS-CF extraction function has been created so I will submit jobs on geo cluster once I receive new GFE.

kyle-messier commented 9 months ago

@sigmafelix @mitchellmanware It looks like y'all resolved the CRS issues for the GEOS-CF data. Thanks!

mitchellmanware commented 9 months ago

GEOS-CF covariates calculated for all AQS monitoring sites. Data available in /ddn/.../NRT-AP-Model/output/NRTAP_Covar_GEOS.rds.

mitchellmanware commented 9 months ago

Raw data copied to /ddn/...NRT-AP-Model/input/geos/.