Open mitchellmanware opened 10 months ago
Data processing disrupted by technical difficulties with scientific storage (see OSC email from December 22, 2023).
@Spatiotemporal-Exposures-and-Toxicology
GEOS data includes coordinates, but no information on geographic coordinate systems in a recommended format according to CF convention. We could assume that the coordinates are assigned based on spherical grids as GEOS-CF products have vertical layers. GEOS-CF data conforms to the CF-1 convention (I assume that this basically means the product should be compatible with all conventions after CF-1.0, which was announced in 2003), and there is no spheroidal earth information inside:
[songi2@cn040601/triton ~]$ gdalinfo rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
Driver: netCDF/Network Common Data Format
Files: rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
Size is 512, 512
Metadata:
NC_GLOBAL#Comment=GMAO filename: GEOS-CF_NRT.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
NC_GLOBAL#Contact=http://gmao.gsfc.nasa.gov
NC_GLOBAL#Conventions=CF-1
NC_GLOBAL#DataResolution=0.25 x 0.25
NC_GLOBAL#EasternmostLongitude=179.75
NC_GLOBAL#Filename=GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
NC_GLOBAL#Format=NetCDF-4/HDF-5
NC_GLOBAL#GranuleID=GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
NC_GLOBAL#History=Original file generated: Sun Jun 5 21:15:43 2022 GMT
NC_GLOBAL#Institution=NASA Global Modeling and Assimilation Office
NC_GLOBAL#LatitudeResolution=0.25
NC_GLOBAL#LongitudeResolution=0.25
NC_GLOBAL#LongName=GEOS CF 2d time-averaged air quality concentrations diagnostics
NC_GLOBAL#NorthernmostLatitude=90.0
NC_GLOBAL#ProductionDateTime=Original file generated: Sun Jun 5 21:15:43 2022 GMT
NC_GLOBAL#RangeBeginningDate=2022-06-05
NC_GLOBAL#RangeBeginningTime=04:00:00.000000
NC_GLOBAL#RangeEndingDate=2022-06-05
NC_GLOBAL#RangeEndingTime=05:00:00.000000
NC_GLOBAL#References=http://gmao.gsfc.nasa.gov
NC_GLOBAL#ShortName=CF01Raqc_1hrT_g1440x721_V1
NC_GLOBAL#Source=cak_Icarus-1_0_GCCv12-00-01_v2_004_SLES12 experiment_id: GEOS-CF_NRT
NC_GLOBAL#SouthernmostLatitude=-90.0
NC_GLOBAL#SpatialCoverage=global
NC_GLOBAL#TemporalRange=2018-01-01 - ongoing
NC_GLOBAL#Title=GEOS CF (Composition Forecast)
NC_GLOBAL#VersionID=1
NC_GLOBAL#WesternmostLongitude=-180.0
Subdatasets:
SUBDATASET_1_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":CO
SUBDATASET_1_DESC=[1x1x721x1440] Carbon monoxide (CO, MW = 28.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
SUBDATASET_2_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":NO2
SUBDATASET_2_DESC=[1x1x721x1440] Nitrogen dioxide (NO2, MW = 46.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
SUBDATASET_3_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":O3
SUBDATASET_3_DESC=[1x1x721x1440] Ozone (O3, MW = 48.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
SUBDATASET_4_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":PM25_RH35_GCC
SUBDATASET_4_DESC=[1x1x721x1440] Particulate_matter_with_diameter_below_2.5_um_RH_35 (32-bit floating-point)
SUBDATASET_5_NAME=NETCDF:"rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4":SO2
SUBDATASET_5_DESC=[1x1x721x1440] Sulfur dioxide (SO2, MW = 64.00 g mol-1) volume mixing ratio dry air (32-bit floating-point)
Corner Coordinates:
Upper Left ( 0.0, 0.0)
Lower Left ( 0.0, 512.0)
Upper Right ( 512.0, 0.0)
Lower Right ( 512.0, 512.0)
Center ( 256.0, 256.0)
ncdump
command (not via R) returned no spheroid/ellipsoid information:
[songi2@cn040601/triton ~]$ ncdump -ct rtest/GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4
netcdf GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z {
dimensions:
lon = 1440 ;
lat = 721 ;
lev = 1 ;
time = UNLIMITED ; // (1 currently)
variables:
double lon(lon) ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
double lat(lat) ;
lat:long_name = "latitude" ;
lat:units = "degrees_north" ;
double lev(lev) ;
lev:long_name = "vertical level" ;
lev:units = "layer" ;
lev:positive = "down" ;
lev:coordinate = "eta" ;
lev:standard_name = "model_layers" ;
int time(time) ;
time:long_name = "time" ;
time:units = "minutes since 2022-06-05 04:30:00" ;
time:time_increment = 10000 ; // "2022-06-12 03:10:0.000000"
time:begin_date = 20220605 ; // "2060-11-14 06:34:60.000000"
time:begin_time = 43000 ; // "2022-07-05 01:10:0.000000"
float CO(time, lev, lat, lon) ;
CO:long_name = "Carbon monoxide (CO, MW = 28.00 g mol-1) volume mixing ratio dry air" ;
CO:units = "mol mol-1" ;
CO:_FillValue = 1.e+15f ;
CO:missing_value = 1.e+15f ;
CO:fmissing_value = 1.e+15f ;
CO:scale_factor = 1.f ;
CO:add_offset = 0.f ;
CO:standard_name = "Carbon monoxide (CO, MW = 28.00 g mol-1) volume mixing ratio dry air" ;
CO:vmin = -1.e+15f ;
CO:vmax = 1.e+15f ;
CO:valid_range = -1.e+15f, 1.e+15f ;
float NO2(time, lev, lat, lon) ;
NO2:long_name = "Nitrogen dioxide (NO2, MW = 46.00 g mol-1) volume mixing ratio dry air" ;
NO2:units = "mol mol-1" ;
NO2:_FillValue = 1.e+15f ;
NO2:missing_value = 1.e+15f ;
NO2:fmissing_value = 1.e+15f ;
NO2:scale_factor = 1.f ;
NO2:add_offset = 0.f ;
NO2:standard_name = "Nitrogen dioxide (NO2, MW = 46.00 g mol-1) volume mixing ratio dry air" ;
NO2:vmin = -1.e+15f ;
NO2:vmax = 1.e+15f ;
NO2:valid_range = -1.e+15f, 1.e+15f ;
float O3(time, lev, lat, lon) ;
O3:long_name = "Ozone (O3, MW = 48.00 g mol-1) volume mixing ratio dry air" ;
O3:units = "mol mol-1" ;
O3:_FillValue = 1.e+15f ;
O3:missing_value = 1.e+15f ;
O3:fmissing_value = 1.e+15f ;
O3:scale_factor = 1.f ;
O3:add_offset = 0.f ;
O3:standard_name = "Ozone (O3, MW = 48.00 g mol-1) volume mixing ratio dry air" ;
O3:vmin = -1.e+15f ;
O3:vmax = 1.e+15f ;
O3:valid_range = -1.e+15f, 1.e+15f ;
float PM25_RH35_GCC(time, lev, lat, lon) ;
PM25_RH35_GCC:long_name = "Particulate_matter_with_diameter_below_2.5_um_RH_35" ;
PM25_RH35_GCC:units = "ug m-3" ;
PM25_RH35_GCC:_FillValue = 1.e+15f ;
PM25_RH35_GCC:missing_value = 1.e+15f ;
PM25_RH35_GCC:fmissing_value = 1.e+15f ;
PM25_RH35_GCC:scale_factor = 1.f ;
PM25_RH35_GCC:add_offset = 0.f ;
PM25_RH35_GCC:standard_name = "Particulate_matter_with_diameter_below_2.5_um_RH_35" ;
PM25_RH35_GCC:vmin = -1.e+15f ;
PM25_RH35_GCC:vmax = 1.e+15f ;
PM25_RH35_GCC:valid_range = -1.e+15f, 1.e+15f ;
float SO2(time, lev, lat, lon) ;
SO2:long_name = "Sulfur dioxide (SO2, MW = 64.00 g mol-1) volume mixing ratio dry air" ;
SO2:units = "mol mol-1" ;
SO2:_FillValue = 1.e+15f ;
SO2:missing_value = 1.e+15f ;
SO2:fmissing_value = 1.e+15f ;
SO2:scale_factor = 1.f ;
SO2:add_offset = 0.f ;
SO2:standard_name = "Sulfur dioxide (SO2, MW = 64.00 g mol-1) volume mixing ratio dry air" ;
SO2:vmin = -1.e+15f ;
SO2:vmax = 1.e+15f ;
SO2:valid_range = -1.e+15f, 1.e+15f ;
// global attributes:
:Contact = "http://gmao.gsfc.nasa.gov" ;
:History = "Original file generated: Sun Jun 5 21:15:43 2022 GMT" ;
:Comment = "GMAO filename: GEOS-CF_NRT.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4" ;
:Filename = "GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4" ;
:Source = "cak_Icarus-1_0_GCCv12-00-01_v2_004_SLES12 experiment_id: GEOS-CF_NRT" ;
:Conventions = "CF-1" ;
:Title = "GEOS CF (Composition Forecast)" ;
:Institution = "NASA Global Modeling and Assimilation Office" ;
:References = "http://gmao.gsfc.nasa.gov" ;
:Format = "NetCDF-4/HDF-5" ;
:SpatialCoverage = "global" ;
:VersionID = "1" ;
:TemporalRange = "2018-01-01 - ongoing" ;
:ShortName = "CF01Raqc_1hrT_g1440x721_V1" ;
:RangeBeginningDate = "2022-06-05" ;
:RangeBeginningTime = "04:00:00.000000" ;
:RangeEndingDate = "2022-06-05" ;
:RangeEndingTime = "05:00:00.000000" ;
:GranuleID = "GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20220605_0430z.nc4" ;
:ProductionDateTime = "Original file generated: Sun Jun 5 21:15:43 2022 GMT" ;
:LongName = "GEOS CF 2d time-averaged air quality concentrations diagnostics" ;
:SouthernmostLatitude = "-90.0" ;
:NorthernmostLatitude = "90.0" ;
:WesternmostLongitude = "-180.0" ;
:EasternmostLongitude = "179.75" ;
:LatitudeResolution = "0.25" ;
:LongitudeResolution = "0.25" ;
:DataResolution = "0.25 x 0.25" ;
cf. Cubed-sphere grid information from NASA : link
The c360 cubed-sphere grid system used in GEOS models seems to be based on the radius of 6371007.2 meters according to GEOS-Chem gcpy
package constants:
Despite being an old system, proj.4 definition of GEOS spheroid would be
+proj=latlon +R=6371007.2
terra
accepts proj.4 definition, thus we could use that string in terra::crs()
function.
library(terra)
terra::crs("+proj=latlon +R=6371007.2")
# [1] "GEOGCRS[\"unknown\",\n DATUM[\"unknown\",\n ELLIPSOID[\"unknown\",6371007.2,0,\n # LENGTHUNIT[\"metre\",1,\n ID[\"EPSG\",9001]]]],\n PRIMEM[\"Greenwich\",0,\n # ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8901]],\n CS[ellipsoidal,2],\n # AXIS[\"longitude\",east,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433,\n # ID[\"EPSG\",9122]]],\n AXIS[\"latitude\",north,\n ORDER[2],\n # ANGLEUNIT[\"degree\",0.0174532925199433,\n ID[\"EPSG\",9122]]]]"
Could you try this the proj.4 string then inspect differences in extracted values from this definition and using EPSG:4326
@mitchellmanware ? Thank you.
@sigmafelix
Will do. Thank you for identifying this.
Extracted values using EPSG:4326
are identical to those using +proj=latlon +R=6371007.2
.
Example used variables CO_lev=72, NO2_lev=72, O3_lev=72, PM25_RH35_GCC_lev=72, and SO2_lev=72 from GEOS-CF aqc_tavg_1hr_g1440x721_v1 collection at US state centroids.
> # GEOS-CF coordinate reference system-specifc extracted values exploratory
> library(terra); library(tigris)
> geos <- rast("GEOS-CF.v01.rpl.aqc_tavg_1hr_g1440x721_v1.20180101_0030z.nc4")
> locations <- centroids(vect(states()))
Retrieving data for the year 2021
> crs_WGS84 <- "EPSG:4326"
> crs_R6371007 <- "+proj=latlon +R=6371007.2"
> extracted_values <- NULL
> for (g in seq_len(nlyr(geos))) {
+ # EPSG 4326
+ layer_WGS84 <- geos[[g]]
+ crs(layer_WGS84) <- crs_WGS84
+ layer_WGS84 <- project(layer_WGS84, crs_WGS84)
+ locations_WGS84 <- project(locations, crs_WGS84)
+ extracted_WGS84 <- extract(layer_WGS84,
+ locations_WGS84,
+ ID = FALSE,
+ bind = FALSE)
+
+ # +proj=latlon +R=6371007.2
+ layer_R6371007 <- geos[[g]]
+ crs(layer_R6371007) <- crs_R6371007
+ layer_R6371007 <- project(layer_R6371007, crs_R6371007)
+ locations_R6371007 <- project(locations, crs_R6371007)
+ extracted_R6371007 <- extract(layer_R6371007,
+ locations_R6371007,
+ ID = FALSE,
+ bind = FALSE)
+
+ # combine
+ extracted_values_layer <- cbind(paste0(names(geos[[g]])),
+ extracted_WGS84,
+ extracted_R6371007)
+ colnames(extracted_values_layer) <- c("layer", "WGS84", "R6371007")
+ extracted_values_layer$difference <-
+ extracted_values_layer$WGS84 - extracted_values_layer$R6371007
+ extracted_values <- rbind(extracted_values,
+ extracted_values_layer)
+ }
> head(extracted_values)
layer WGS84 R6371007 difference
1 CO_lev=72 1.512271e-07 1.512271e-07 0
2 CO_lev=72 2.184570e-07 2.184570e-07 0
3 CO_lev=72 1.461631e-07 1.461631e-07 0
4 CO_lev=72 1.364715e-07 1.364715e-07 0
5 CO_lev=72 2.156630e-07 2.156630e-07 0
6 CO_lev=72 1.445623e-07 1.445623e-07 0
> tail(extracted_values)
layer WGS84 R6371007 difference
275 SO2_lev=72 7.171366e-10 7.171366e-10 0
276 SO2_lev=72 6.939445e-10 6.939445e-10 0
277 SO2_lev=72 1.833087e-09 1.833087e-09 0
278 SO2_lev=72 4.602043e-09 4.602043e-09 0
279 SO2_lev=72 8.002417e-10 8.002417e-10 0
280 SO2_lev=72 3.118430e-10 3.118430e-10 0
> summary(extracted_values)
layer WGS84 R6371007 difference
Length:280 Min. : 0.00 Min. : 0.00 Min. :0
Class :character 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:0
Mode :character Median : 0.00 Median : 0.00 Median :0
Mean : 1.78 Mean : 1.78 Mean :0
3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.:0
Max. :36.40 Max. :36.40 Max. :0
> identical(extracted_values$WGS84, extracted_values$R6371007)
[1] TRUE
@mitchellmanware Thank you for checking it. How about using circular buffers? 0.25x0.25 degrees are quite coarse, so point extracts would be the same.
Results remain identical when extracting at 25, 50, 75, and 100 km buffers
> buffers <- c(25000, 50000, 75000, 100000)
> extracted_values <- NULL
> for (g in seq_len(nlyr(geos))) {
+ for (b in seq_along(buffers)) {
+ # EPSG 4326
+ layer_WGS84 <- geos[[g]]
+ crs(layer_WGS84) <- crs_WGS84
+ layer_WGS84 <- project(layer_WGS84, crs_WGS84)
+ locations_WGS84 <- project(locations, crs_WGS84)
+ extracted_WGS84 <- extract(layer_WGS84,
+ buffer(locations_WGS84,
+ buffers[b]),
+ fun = "mean",
+ ID = FALSE,
+ bind = FALSE)
+
+ # +proj=latlon +R=6371007.2
+ layer_R6371007 <- geos[[g]]
+ crs(layer_R6371007) <- crs_R6371007
+ layer_R6371007 <- project(layer_R6371007, crs_R6371007)
+ locations_R6371007 <- project(locations, crs_R6371007)
+ extracted_R6371007 <- extract(layer_R6371007,
+ buffer(locations_R6371007,
+ buffers[b]),
+ fun = "mean",
+ ID = FALSE,
+ bind = FALSE)
+
+ # combine
+ extracted_values_layer <- cbind(paste0(names(geos[[g]])),
+ paste0(buffers[b], "m buffer"),
+ extracted_WGS84,
+ extracted_R6371007)
+ colnames(extracted_values_layer) <- c("layer",
+ "buffer (m)",
+ "WGS84",
+ "R6371007")
+ extracted_values_layer$difference <-
+ extracted_values_layer$WGS84 - extracted_values_layer$R6371007
+ extracted_values <- rbind(extracted_values,
+ extracted_values_layer)
+ }
+ }
> head(extracted_values)
layer buffer (m) WGS84 R6371007 difference
1 CO_lev=72 25000m buffer 1.514406e-07 1.514406e-07 0
2 CO_lev=72 25000m buffer 2.189420e-07 2.189420e-07 0
3 CO_lev=72 25000m buffer 1.463765e-07 1.463765e-07 0
4 CO_lev=72 25000m buffer 1.354965e-07 1.354965e-07 0
5 CO_lev=72 25000m buffer 2.225509e-07 2.225509e-07 0
6 CO_lev=72 25000m buffer 1.450280e-07 1.450280e-07 0
> tail(extracted_values)
layer buffer (m) WGS84 R6371007 difference
1115 SO2_lev=72 1e+05m buffer 5.648399e-10 5.648399e-10 0
1116 SO2_lev=72 1e+05m buffer 9.110295e-10 9.110295e-10 0
1117 SO2_lev=72 1e+05m buffer 1.511641e-09 1.511641e-09 0
1118 SO2_lev=72 1e+05m buffer 2.437729e-09 2.437729e-09 0
1119 SO2_lev=72 1e+05m buffer 4.415059e-10 4.415059e-10 0
1120 SO2_lev=72 1e+05m buffer 3.108651e-10 3.108651e-10 0
> summary(extracted_values)
layer buffer (m) WGS84 R6371007 difference
Length:1120 Length:1120 Min. : 0.000 Min. : 0.000 Min. :0
Class :character Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0
Mode :character Mode :character Median : 0.000 Median : 0.000 Median :0
Mean : 1.777 Mean : 1.777 Mean :0
3rd Qu.: 0.000 3rd Qu.: 0.000 3rd Qu.:0
Max. :36.476 Max. :36.476 Max. :0
> identical(extracted_values$WGS84, extracted_values$R6371007)
[1] TRUE
@mitchellmanware Nice! We could use EPSG:4326 for GEOS-CF covariate calculation from now on.
Great, thank you again for such a deep dive into the GEOS-CF crs details.
The GEOS-CF extraction function has been created so I will submit jobs on geo cluster once I receive new GFE.
@sigmafelix @mitchellmanware It looks like y'all resolved the CRS issues for the GEOS-CF data. Thanks!
GEOS-CF covariates calculated for all AQS monitoring sites. Data available in /ddn/.../NRT-AP-Model/output/NRTAP_Covar_GEOS.rds
.
Raw data copied to /ddn/...NRT-AP-Model/input/geos/
.
1) Unique coordinate reference system 2) @mitchellmanware will extract from end product 3) @sigmafelix pause on MERRA2 #210