kyle-messier opened 6 months ago
I think the easiest and most relatable spatial range is Durham County, NC, which can be queried from the bundled file in the sf package via system.file("shape/nc.shp", package = "sf") (reprojection required). The county has a total area of 772 sq km, which gives 772 points at 1 km spatial resolution, and it includes a substantial part of the RTP area (following what Eva proposed).
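A minimal sketch of that query and reprojection (the 1 km grid step and the target CRS are illustrative assumptions, not part of the original suggestion):

library(sf)

# read the bundled North Carolina counties shapefile
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
# subset Durham County
durham <- nc[nc$NAME == "Durham", ]
# reproject to a projected CRS before building a grid
# (EPSG:5070, CONUS Albers, is an illustrative choice)
durham_proj <- st_transform(durham, "EPSG:5070")
# 1 km grid of cell centers, clipped to the county polygon
grid <- st_make_grid(durham_proj, cellsize = 1000, what = "centers")
grid <- grid[durham_proj]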
Thank you Insang, I will use this spatial range to create the testdata.
I will utilize the Durham County, NC area for test data associated with the NCEP NARR and GEOS-CF data sources.
A full data set for the "pressure levels" NCEP NARR data may have to be included due to the layer naming structure used in the netCDF file. The full netCDF does not indicate each layer's "pressure level" anywhere other than in the layer names, and this information is lost when trying to re-write the subset as .nc. I will continue to explore this further.
@eva0marques
Have you already created, or do you plan to create, a subset of monitor sites within Durham County, NC for use in testing? If not, I will create one; a hypothetical sketch is below.
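If it helps, here is one way that subsetting could look (the sites object and its lon/lat column names are placeholders, not an existing data set):

library(sf)

# hypothetical monitor table with coordinate columns (placeholder names)
sites_sf <- st_as_sf(sites, coords = c("lon", "lat"), crs = "EPSG:4326")
# Durham County polygon from the bundled sf shapefile
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
durham <- st_transform(nc[nc$NAME == "Durham", ], st_crs(sites_sf))
# keep only monitors that fall within Durham County
sites_durham <- st_filter(sites_sf, durham)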
Sorry for the late answer; I completely stopped using my computer during the holidays. I will start working on this task in the coming days. The idea is to create a vignette to extract Durham data for each covariate. For the future pipeline steps I take into consideration your suggestion @mitchellmanware to also extract monitors within Durham County (or create fake ones if there are no existing monitors there).
Testdata will be created at each step of the pipeline to test the following step.
Testdata content:
Note: the resolution of some covariates is very coarse. Selecting only Durham County means that we would test data on only 1 pixel for some of the covariates (e.g., NARR).
@eva0marques
Do you encounter any errors or warnings when reading the test data sets into R? When I was working with them, the 1 x 1 dimensions resulted in extent-related errors.
@mitchellmanware This could also result in identifiability issues (e.g., if you regress Y on X and X only has 1 value).
@Spatiotemporal-Exposures-and-Toxicology @eva0marques
Yes, I thought the same thing. Maybe we should expand the area to include Durham County and a few of its neighbors?
@mitchellmanware I don't understand the problem; can you give me more details? I can open the NARR data properly, and when I extract Durham County I do get only one pixel, yes. I will check whether the whole county is covered by that pixel.
For me, running the following code caused an extent error, which may have been due to the dimensions being 1x1. The same error did not appear when I ran the code this time.
> # north carolina
> nc <- tigris::counties(state = "North Carolina")
Retrieving data for the year 2021
Using FIPS code '37' for state 'North Carolina'
> # durham
> du <- nc[nc$NAME == "Durham",]
> # durham as terra vect
> du_v <- vect(du)
> # narr air.sfc
> a <- rast("../../data/narr/air.sfc/air.sfc.2018.nc")
> # crop
> a_crop <- crop(
+ a,
+ project(du_v, crs(a))
+ )
> # plot
> terra::plot(a_crop$air_1)
> # save
> writeCDF(
+ a_crop,
+ filename = "./air.sfc.2018.CROP.nc",
+ varname = "air"
+ )
> # reread
> a_crop_reread <- rast("./air.sfc.2018.CROP.nc")
> a_crop_reread
class : SpatRaster
dimensions : 1, 1, 365 (nrow, ncol, nlyr)
resolution : 32462.99, 32463 (x, y)
extent : 8164442, 8196905, 3522236, 3554698 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs
source : air.sfc.2018.CROP.nc
varname : air
names : air_1, air_2, air_3, air_4, air_5, air_6, ...
time : 2018-01-01 to 2018-12-31 UTC
>
Ok, I plotted the Durham shapefile on it and it is true that we are missing areas. But what I need to do is just correct the extraction to get all the necessary tiles. Is there any other problem I'm missing?
Just need to add the argument snap = "out" to the crop function.
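For example, the earlier crop call with the extra argument:

# snap = "out" expands the crop window so the county is fully covered
a_crop <- terra::crop(
  a,
  terra::project(du_v, terra::crs(a)),
  snap = "out"
)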
@eva0marques
Per discussion with @Spatiotemporal-Exposures-and-Toxicology, we could also use Durham and Wake counties as the test area. This ensures that we have dimensions greater than 1x1, and it could be good to include multiple monitor locations in the testing.
@eva0marques
Also, the omega NARR variable was causing some problems because the "pressure level" information is stored only in the layer names of the data.
> o <- terra::rast(
+ paste0(dir, "data/narr/omega/omega.201801.nc")
+ )
> names(o)[1:20]
[1] "omega_level=1000_1" "omega_level=975_1" "omega_level=950_1"
[4] "omega_level=925_1" "omega_level=900_1" "omega_level=875_1"
[7] "omega_level=850_1" "omega_level=825_1" "omega_level=800_1"
[10] "omega_level=775_1" "omega_level=750_1" "omega_level=725_1"
[13] "omega_level=700_1" "omega_level=650_1" "omega_level=600_1"
[16] "omega_level=550_1" "omega_level=500_1" "omega_level=450_1"
[19] "omega_level=400_1" "omega_level=350_1"
When the data was subset to NC and written as a netCDF, the pressure level information was lost. One way to overcome this is to select only the level=1000 layers and write the variable as varname = "omega_level=1000". I will look for other ways to retain the pressure level information and use it to write the variable names, but this should work in the interim for our test data set. Suggestion below.
> o <- terra::rast(
+ paste0(dir, "data/narr/omega/omega.201801.nc")
+ )
> o1000 <- terra::subset(
+ o,
+ subset = grep(
+ paste0(
+ "level=1000"
+ ),
+ names(o)
+ )
+ )
> nc <- tigris::counties(state = "NC")
Retrieving data for the year 2021
Using FIPS code '37' for state 'NC'
> nc_v <- terra::vect(nc)
> nc_v <- nc_v[nc_v$NAME %in% c("Durham", "Wake"),]
> nc_p <- terra::project(nc_v, terra::crs(o))
> o1000_crop <- terra::crop(
+ o1000,
+ nc_p,
+ snap = "out"
+ )
> terra::writeCDF(
+ o1000_crop,
+ paste0(dir, "/tests/testdata/omega/omega.201801.CROP.nc"),
+ varname = "omega_level=1000"
+ )
> o_reread <- terra::rast(
+ paste0(dir, "/tests/testdata/omega/omega.201801.CROP.nc")
+ )
> names(o_reread)
[1] "omega_level=1000_1" "omega_level=1000_2" "omega_level=1000_3"
[4] "omega_level=1000_4" "omega_level=1000_5" "omega_level=1000_6"
[7] "omega_level=1000_7" "omega_level=1000_8" "omega_level=1000_9"
[10] "omega_level=1000_10" "omega_level=1000_11" "omega_level=1000_12"
[13] "omega_level=1000_13" "omega_level=1000_14" "omega_level=1000_15"
[16] "omega_level=1000_16" "omega_level=1000_17" "omega_level=1000_18"
[19] "omega_level=1000_19" "omega_level=1000_20" "omega_level=1000_21"
[22] "omega_level=1000_22" "omega_level=1000_23" "omega_level=1000_24"
[25] "omega_level=1000_25" "omega_level=1000_26" "omega_level=1000_27"
[28] "omega_level=1000_28" "omega_level=1000_29" "omega_level=1000_30"
[31] "omega_level=1000_31"
@mitchellmanware
In the end I just extract dates like this, using the time() function in terra.
# file_path points to a monthly NARR file, e.g. omega.201801.nc
o <- terra::rast(file_path)
o[[terra::time(o) == as.Date("2022-06-01")]]
This code successfully created small test data sets for the GEOS chm and aqc collections. They include only the tested variables (ACET and CO, respectively) for lightweight testing, and both worked in tests.
## in folder with one day's worth of GEOS aqc files
files <- list.files()
dw <- terra::vect("../durham_wake.shp")
dw_p <- terra::project(dw, "EPSG:4326")
for (f in seq_along(files)) {
  data <- terra::rast(files[f], subds = "CO")
  terra::crs(data) <- "EPSG:4326"
  data_p <- terra::project(data, "EPSG:4326")
  data_crop <- terra::crop(data_p,
                           dw_p,
                           snap = "out")
  terra::writeCDF(data_crop,
                  paste0("../aqc/", files[f]),
                  varname = "CO")
}
## in folder with one day's worth of GEOS chm files
files <- list.files()
dw <- terra::vect("../durham_wake.shp")
dw_p <- terra::project(dw, "EPSG:4326")
for (f in seq_along(files)) {
  data <- terra::rast(files[f], subds = "ACET")
  terra::crs(data) <- "EPSG:4326"
  data_p <- terra::project(data, "EPSG:4326")
  data_crop <- terra::crop(data_p,
                           dw_p,
                           snap = "out")
  terra::writeCDF(data_crop,
                  paste0("../chm/", files[f]),
                  varname = "ACET")
}
Ok, thank you for your help! So you do not keep the other netCDF metadata (units, etc.). Also, are we accepting that it is not important to have all the covariates in the testdata folder? It may cause issues for future unit testing of the pipeline, but maybe we can start like this and complete the folder as needed.
Multiple layers can be written with one command by setting a few arguments:
library(terra)
#> terra 1.7.67
moddir <-
file.path("/ddn/gs1/group/set",
"Projects",
"NRT-AP-Model",
"input",
"modis",
"raw",
"61",
"MOD09GA")
modfiles <- list.files(moddir, "*.hdf", full.names = TRUE, recursive = TRUE)
# mod11_1 was created outside this reprex; a plausible definition (assumed
# here so the example runs) reads the first granule:
mod11_1 <- rast(modfiles[1])
# when split=TRUE
writeCDF(
mod11_1,
"~/writetest.nc",
varname = names(mod11_1),
overwrite = TRUE,
unit = units(mod11_1),
split = TRUE)
modtest <- rast("~/writetest.nc")
modtest
#> class : SpatRaster
#> dimensions : 2400, 2400, 12 (nrow, ncol, nlyr)
#> resolution : 463.3127, 463.3127 (x, y)
#> extent : -8895604, -7783654, 3335852, 4447802 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
#> sources : writetest.nc:num_observations_500m
#> writetest.nc:sur_refl_b01_1
#> writetest.nc:sur_refl_b02_1
#> ... and 9 more source(s)
#> varnames : num_observations_500m
#> sur_refl_b01_1
#> sur_refl_b02_1
#> ...
#> names : num_o~_500m, sur_r~b01_1, sur_r~b02_1, sur_r~b03_1, sur_r~b04_1, sur_r~b05_1, ...
modtestsub <- rast("~/writetest.nc", subds = 2)
modtestsub
#> class : SpatRaster
#> dimensions : 2400, 2400, 1 (nrow, ncol, nlyr)
#> resolution : 463.3127, 463.3127 (x, y)
#> extent : -8895604, -7783654, 3335852, 4447802 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
#> source : writetest.nc:sur_refl_b01_1
#> varname : sur_refl_b01_1
#> name : sur_refl_b01_1
# print by subdatasets
describe("/ddn/gs1/home/songi2/writetest.nc", sds = TRUE)
#> id name
#> 1 1 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":num_observations_500m
#> 2 2 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":sur_refl_b01_1
#> 3 3 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":sur_refl_b02_1
#> 4 4 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":sur_refl_b03_1
#> 5 5 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":sur_refl_b04_1
#> 6 6 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":sur_refl_b05_1
#> 7 7 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":sur_refl_b06_1
#> 8 8 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":sur_refl_b07_1
#> 9 9 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":QC_500m_1
#> 10 10 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":obscov_500m_1
#> 11 11 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":iobs_res_1
#> 12 12 NETCDF:"/ddn/gs1/home/songi2/writetest.nc":q_scan_1
#> var
#> 1 num_observations_500m
#> 2 sur_refl_b01_1
#> 3 sur_refl_b02_1
#> 4 sur_refl_b03_1
#> 5 sur_refl_b04_1
#> 6 sur_refl_b05_1
#> 7 sur_refl_b06_1
#> 8 sur_refl_b07_1
#> 9 QC_500m_1
#> 10 obscov_500m_1
#> 11 iobs_res_1
#> 12 q_scan_1
#> desc nrow ncol nlyr
#> 1 [2400x2400] num_observations_500m (32-bit floating-point) 2400 2400 1
#> 2 [2400x2400] sur_refl_b01_1 (32-bit floating-point) 2400 2400 1
#> 3 [2400x2400] sur_refl_b02_1 (32-bit floating-point) 2400 2400 1
#> 4 [2400x2400] sur_refl_b03_1 (32-bit floating-point) 2400 2400 1
#> 5 [2400x2400] sur_refl_b04_1 (32-bit floating-point) 2400 2400 1
#> 6 [2400x2400] sur_refl_b05_1 (32-bit floating-point) 2400 2400 1
#> 7 [2400x2400] sur_refl_b06_1 (32-bit floating-point) 2400 2400 1
#> 8 [2400x2400] sur_refl_b07_1 (32-bit floating-point) 2400 2400 1
#> 9 [2400x2400] QC_500m_1 (32-bit floating-point) 2400 2400 1
#> 10 [2400x2400] obscov_500m_1 (32-bit floating-point) 2400 2400 1
#> 11 [2400x2400] iobs_res_1 (32-bit floating-point) 2400 2400 1
#> 12 [2400x2400] q_scan_1 (32-bit floating-point) 2400 2400 1
# when split=FALSE
writeCDF(
mod11_1,
"~/writetest_nosplit.nc",
varname = names(mod11_1),
overwrite = TRUE,
unit = units(mod11_1),
split = FALSE)
modtest_split <- rast("~/writetest_nosplit.nc")
# does the exported file detect subdatasets?
modtest_split2 <- rast("~/writetest_nosplit.nc", subds = 1)
modtest_split2
#> class : SpatRaster
#> dimensions : 2400, 2400, 12 (nrow, ncol, nlyr)
#> resolution : 463.3127, 463.3127 (x, y)
#> extent : -8895604, -7783654, 3335852, 4447802 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
#> source : writetest_nosplit.nc
#> varname : num_observations_500m
#> names : num_o~00m_1, num_o~00m_2, num_o~00m_3, num_o~00m_4, num_o~00m_5, num_o~00m_6, ...
modtest_split2 <- rast("~/writetest_nosplit.nc", subds = 13)
modtest_split2
#> class : SpatRaster
#> dimensions : 2400, 2400, 12 (nrow, ncol, nlyr)
#> resolution : 463.3127, 463.3127 (x, y)
#> extent : -8895604, -7783654, 3335852, 4447802 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
#> source : writetest_nosplit.nc
#> varname : num_observations_500m
#> names : num_o~00m_1, num_o~00m_2, num_o~00m_3, num_o~00m_4, num_o~00m_5, num_o~00m_6, ...
# print description by subdatasets
describe("/ddn/gs1/home/songi2/writetest_nosplit.nc", sds = TRUE)
#> Error: [gdal (sds)] no subdatasets
## GEOS example
geosdir <-
file.path("/ddn/gs1/group/set",
"Projects",
"NRT-AP-Model",
"input",
"geos")
geosfiles <-
list.files(geosdir,
"*.nc4",
full.names = TRUE,
recursive = TRUE)
# 2018001
geosf <- geosfiles[seq(1, 24)]
# list of SpatRasters
geos2018001 <- lapply(geosf, rast)
# SpatRasterDataset
geos_c <- do.call(sds, geos2018001)
# Aggregate by day
geos_capp <- app(geos_c, index = "days", fun = "mean")
writeCDF(
geos_capp,
"~/writegeostest.nc",
varname = names(geos_capp),
overwrite = TRUE,
unit = units(geos2018001[[1]]),
split = TRUE)
geostest <- rast("~/writegeostest.nc")
geostest
#> class : SpatRaster
#> dimensions : 721, 1440, 5 (nrow, ncol, nlyr)
#> resolution : 0.25, 0.25 (x, y)
#> extent : -180.125, 179.875, -90.125, 90.125 (xmin, xmax, ymin, ymax)
#> coord. ref. :
#> sources : writegeostest.nc:CO_lev=72
#> writegeostest.nc:NO2_lev=72
#> writegeostest.nc:O3_lev=72
#> ... and 2 more source(s)
#> varnames : CO_lev=72
#> NO2_lev=72
#> O3_lev=72
#> ...
#> names : CO_lev=72, NO2_lev=72, O3_lev=72, PM25_R~lev=72, SO2_lev=72
#> unit : mol mol-1, mol mol-1, mol mol-1, ug m-3, mol mol-1
#> time : 2018-01-01 00:30:00 UTC
Created on 2024-01-04 with reprex v2.0.2
While it is hard to say that the exported file is exactly the same as the raw input(s), we might make these files fairly close by adding detailed arguments to writeCDF or by using ncdf4::ncvar_def.
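For instance, a rough sketch of the ncdf4::ncvar_def route (the dimension vectors, units, and attribute values below are illustrative placeholders, not read from the actual files):

library(ncdf4)

# dimensions matching the cropped subset (coordinate vectors are placeholders)
xdim <- ncdim_def("x", units = "m", vals = x_coords)
ydim <- ncdim_def("y", units = "m", vals = y_coords)
tdim <- ncdim_def("time", units = "hours since 1800-1-1 00:00:0.0",
                  vals = time_vals, unlim = TRUE)
# define the variable with the original units and long name
omega_def <- ncvar_def("omega", units = "Pascal/s",
                       dim = list(xdim, ydim, tdim),
                       missval = 9.96921e+36,
                       longname = "Daily Omega on Pressure Levels")
# create the file, write the values, and attach any extra attributes
nc_out <- nc_create("omega.201801.CROP.nc", omega_def)
ncvar_put(nc_out, omega_def, values)
ncatt_put(nc_out, "omega", "level_desc", "Pressure Levels")
nc_close(nc_out)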
@sigmafelix @eva0marques
It is a good point to include all covariates for future testing in the pipeline. The issue with the test data set created in Insang's code is that it will not work with the calc_geos function. The function was written to import raw GEOS-CF data based on the downloaded file name, and it calculates a daily mean within the function. Insang's recommendation of split = TRUE worked well to retain variable names.
> # working dir = "/ddn/gs1/home/manwareme/NRTAPModel_Covariates/tests/testdata/chm_full/"
> files <- list.files(pattern = ".nc4")
> dw <- terra::vect("../durham_wake.shp")
> dw_p <- terra::project(dw, "EPSG:4326")
> for (f in seq_along(files)) {
+ data <- terra::rast(files[f])
+ terra::crs(data) <- "EPSG:4326"
+ data_p <- terra::project(data, "EPSG:4326")
+ data_crop <- terra::crop(data_p,
+ dw_p,
+ snap = "out")
+ writeCDF(data_crop,
+ # written with full name to different folder
+ paste0("../chm/", files[f]),
+ varname = names(data_crop),
+ split = TRUE,
+ # retain units
+ unit = units(data))
+ }
The written netCDF retains the original file name, variable names, and units, and is cropped to the desired extent.
> geos_crop_read <- terra::rast("../chm/GEOS-CF.v01.rpl.chm_tavg_1hr_g1440x721_v1.20180101_0030z.nc4")
> geos_crop_read
class : SpatRaster
dimensions : 4, 4, 52 (nrow, ncol, nlyr)
resolution : 0.25, 0.25 (x, y)
extent : -79.125, -78.125, 35.375, 36.375 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
sources : GEOS-CF.v01.rpl.chm_tavg_1hr_g1440x721_v1.20180101_0030z.nc4:ACET_lev=72
GEOS-CF.v01.rpl.chm_tavg_1hr_g1440x721_v1.20180101_0030z.nc4:ALD2_lev=72
GEOS-CF.v01.rpl.chm_tavg_1hr_g1440x721_v1.20180101_0030z.nc4:ALK4_lev=72
... and 49 more source(s)
varnames : ACET_lev=72
ALD2_lev=72
ALK4_lev=72
...
names : ACET_lev=72, ALD2_lev=72, ALK4_lev=72, BCPI_lev=72, BCPO_lev=72, BENZ_lev=72, ...
unit : mol mol-1, mol mol-1, mol mol-1, mol mol-1, mol mol-1, mol mol-1, ...
>
The only issue with this is the amount of data required, as the GEOS-CF chm and aqc collections will each require 24 files (0.058544 MB each). A workaround could be to zip these files and add an unzip() command in the unit and integration tests, because these are the only tests that would require these raw data.
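A minimal sketch of that workaround (the paths and archive names are assumptions):

# compress the raw hourly files once, e.g. from tests/testdata/
zip("chm.zip", files = list.files("chm/", full.names = TRUE))

# in the unit/integration test, unzip before reading
unzip("../testdata/chm.zip", exdir = "../testdata/")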
Minor tweak in the calc_geos function to select the variable after import with grep(). This allows subsetting with both raw data variable names (i.e., "ACET") and re-written test variable names (i.e., "ACET_lev=72"). The calc_geos unit tests pass with this change; a sketch of the tweak is below.
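The tweak might look something like this (a sketch; the data and variable objects are assumed to exist inside the calc_geos function body):

# select the requested variable after import; grep() matches both
# raw names (e.g. "ACET") and re-written names (e.g. "ACET_lev=72")
data_variable <- terra::subset(
  data,
  subset = grep(variable, names(data))
)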
> testthat::test_that("calc_geos calculates chm covariates.", {
+ # function parameters
+ date <- "2018-01-01"
+ collection <- "chm_tavg_1hr_g1440x721_v1"
+ variable <- "ACET"
+ sites <- readRDS("../testdata/sites_test.rds")
+ identifier <- "site_id"
+ fun <- "mean"
+ directory_with_data <- "../testdata/chm/"
+ # run covariate calculation
+ covariate_geos <- calc_geos(date_start = date,
+ date_end = date,
+ collection = collection,
+ variable = variable,
+ sites = sites,
+ identifier = identifier,
+ fun = fun,
+ directory_with_data = directory_with_data)
+ # test that function returns data frame
+ expect_equal(class(covariate_geos), "data.frame")
+ # test that data frame has three columns
+ expect_equal(ncol(covariate_geos), 3)
+ # test that no missing values
+ expect_false(any(sapply(covariate_geos, is.na)))
+ # test that all days included
+ expect_equal(length(unique(covariate_geos$date)), 1)
+ })
Calculated ACET for date 20180101
Test passed 😀
@mitchellmanware Thank you for the changes in calc_geos.
@Spatiotemporal-Exposures-and-Toxicology @eva0marques I am suddenly confused about the temporal range of the subsetting: do we subset all data for 2018-2022 or a limited period? Since the subsets will be used across the entire pipeline for tests, I suppose the former would be the right period.
@sigmafelix pipeline is currently for 2018-2022 inclusive; tests are for a very limited (1 day) period. Does that answer the question?
@Spatiotemporal-Exposures-and-Toxicology Thank you for the clarification. I was briefly confused between "testing the pipeline" and "testing the functions." Now I realize that this issue is more about the latter.
@sigmafelix @mitchellmanware No problem - it's always good to test more. For instance, we may consider doing a so-called "edge test" with time. This means checking that 12-31-2022 is valid and 1-1-2023 throws an error, i.e., testing at the very edge of some domain - in this case, time.
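A sketch of such an edge test, reusing the parameters from the earlier chm test (it assumes calc_geos validates its date arguments and that test data exist for the last in-domain day):

testthat::test_that("calc_geos rejects dates outside the domain.", {
  sites <- readRDS("../testdata/sites_test.rds")
  # last day of the modeled period: expect success
  testthat::expect_no_error(
    calc_geos(date_start = "2022-12-31",
              date_end = "2022-12-31",
              collection = "chm_tavg_1hr_g1440x721_v1",
              variable = "ACET",
              sites = sites,
              identifier = "site_id",
              fun = "mean",
              directory_with_data = "../testdata/chm/")
  )
  # first day outside the period: expect an error
  testthat::expect_error(
    calc_geos(date_start = "2023-01-01",
              date_end = "2023-01-01",
              collection = "chm_tavg_1hr_g1440x721_v1",
              variable = "ACET",
              sites = sites,
              identifier = "site_id",
              fun = "mean",
              directory_with_data = "../testdata/chm/")
  )
})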
Thank you all for your help. I am currently writing all the code to extract Durham - Wake - Orange counties on 2 days of August 2022 for every downloaded file in the input/ folder. The objective is to provide real (but very light) data for the unit tests all along the pipeline. @mitchellmanware I will extract all variables of GEOS-CF with the help of your code. @all I know it needs some patience, but maybe you could give me a little bit more time before testing your calc_ functions. Otherwise, we are putting the cart before the horse (is that the right expression? ☺️ In French it is the plow before the cow!). And we run the risk of doing the same work twice.
@eva0marques I am using the code I posted above to make a short vignette introducing how to generate small subsets from the raw data. If you are already working on that documentation, I will pause mine.
@eva0marques
calc_narr and calc_geos can be found on the mm-calc-functions branch in the R/ folder.
@sigmafelix Yes, this is my task; I have been working on it for several days...
@eva0marques Got it, I look forward to seeing the documentation soon!
Work in progress:
- Testdata extraction function done
- Testdata generated in my local repo /tests/testdata/:
@Spatiotemporal-Exposures-and-Toxicology @mitchellmanware
I would like to create and save the testdata for GEOS-CF CHM directly in the local repo on /ddn, but I get a permission denied error. Can you grant me write access to this folder (/Volumes/set/Projects/NRT-AP-Model/tests/testdata/)?
@eva0marques try now - if not we can resolve during the project meeting
It works now, thanks!!! 😃
Test data for the NARR weasd and omega data sets have been created on branch mm-copy-functions. I used the ncdf4 package to copy the raw metadata onto a subset of the data and rewrite it with all metadata and layer names. A messy version of the code is here (https://github.com/mitchellmanware/amadeus_workbench/blob/main/narr/create_NARR_test_data.R), but I will clean it up into a template or vignette.
@mitchellmanware can you remind me what you told me at the last meeting about the problem with my testdata function for NARR?
@eva0marques
The problem is not with your function but with the way terra writes netCDF files. Your functions create perfect subsets within the R environment, but terra::writeCDF() does not write all of the metadata necessary to re-read the subsetted data with terra::rast(). The difference is visible when inspecting the contents of the .nc files with ncdf4::nc_open().
Raw weasd data:
> weasd_raw <- nc_open("nrtapmodel/covariates/data/narr/weasd/weasd.2018.nc")
> weasd_raw
File nrtapmodel/covariates/data/narr/weasd/weasd.2018.nc (NC_FORMAT_NETCDF4_CLASSIC):
5 variables (excluding dimension variables):
float lat[x,y] (Contiguous storage)
axis: Y
coordinate_defines: point
long_name: Latitude
standard_name: latitude
units: degrees_north
float lon[x,y] (Contiguous storage)
axis: X
coordinate_defines: point
long_name: Longitude
standard_name: longitude
units: degrees_east
int Lambert_Conformal[] (Contiguous storage)
false_easting: 5632642.22547
false_northing: 4612545.65137
grid_mapping_name: lambert_conformal_conic
latitude_of_projection_origin: 50
longitude_of_central_meridian: -107
standard_parallel: 50
standard_parallel: 50
float weasd[x,y,time] (Chunking: [349,277,1]) (Compression: shuffle,level 1)
GRIB_id: 65
GRIB_name: WEASD
_FillValue: 9.96920996838687e+36
coordinates: lat lon
grid_mapping: Lambert_Conformal
level_desc: Surface
standard_name: surface_snow_amount
units: kg/m^2
var_desc: accumulated snow
missing_value: -9.96920996838687e+36
dataset: NARR Daily Averages
long_name: Daily Accumulated Snow at Surface
parent_stat: Individual Obs
statistic: Mean
actual_range: 0
actual_range: 746.8125
valid_range: 0
valid_range: 3000
double time_bnds[nbnds,time] (Chunking: [2,1])
4 dimensions:
time Size:365 *** is unlimited ***
axis: T
long_name: Time
standard_name: time
units: hours since 1800-1-1 00:00:0.0
delta_t: 0000-00-01 00:00:00
avg_period: 0000-00-01 00:00:00
actual_range: 1910952
actual_range: 1919688
coordinate_defines: start
y Size:277
long_name: northward distance from southwest corner of domain in projection coordinates
standard_name: projection_y_coordinate
units: m
x Size:349
long_name: eastward distance from southwest corner of domain in projection coordinates
standard_name: projection_x_coordinate
units: m
nbnds Size:2 (no dimvar)
16 global attributes:
Conventions: CF-1.2
centerlat: 50
centerlon: -107
comments:
institution: National Centers for Environmental Prediction
latcorners: 1.00000095367432
latcorners: 0.897944986820221
latcorners: 46.3544006347656
latcorners: 46.6343307495117
loncorners: -145.5
loncorners: -68.3200531005859
loncorners: -2.56989097595215
loncorners: 148.641799926758
platform: Model
standardpar1: 50
standardpar2: 50.000001
title: Daily NARR
history: created Sat Mar 26 05:32:28 MDT 2016 by NOAA/ESRL/PSD
dataset_title: NCEP North American Regional Reanalysis (NARR)
references: https://www.esrl.noaa.gov/psd/data/gridded/data.narr.html
source: http://www.emc.ncep.noaa.gov/mmb/rreanl/index.html
References:
Cropped with terra and rewritten:
> weasd_terra <- terra::rast("nrtapmodel/covariates/data/narr/weasd/weasd.2018.nc")
> nc <- terra::vect(system.file("shape/nc.shp", package = "sf"))
> co <- terra::project(nc[nc$NAME %in% c("Durham", "Wake", "Orange")],
+ terra::crs(weasd_terra))
> weasd_terra_crop <- terra::crop(weasd_terra, co, snap = "out")
> terra::writeCDF(weasd_terra_crop, "~/Desktop/weasd.2018.nc")
> weasd_terra_reread <- nc_open("~/Desktop/weasd.2018.nc")
> weasd_terra_reread
File ~/Desktop/weasd.2018.nc (NC_FORMAT_NETCDF4):
2 variables (excluding dimension variables):
float weasd.2018[easting,northing,time] (Contiguous storage)
_FillValue: -1.17549402418441e+38
grid_mapping: crs
int crs[] (Contiguous storage)
crs_wkt: PROJCRS["unnamed", BASEGEOGCRS["WGS 84", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",4326]], CONVERSION["unnamed", METHOD["Lambert Conic Conformal (2SP)", ID["EPSG",9802]], PARAMETER["Latitude of false origin",50, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8821]], PARAMETER["Longitude of false origin",-107, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8822]], PARAMETER["Latitude of 1st standard parallel",50, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8823]], PARAMETER["Latitude of 2nd standard parallel",50, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8824]], PARAMETER["Easting at false origin",5632642.22547, LENGTHUNIT["metre",1], ID["EPSG",8826]], PARAMETER["Northing at false origin",4612545.65137, LENGTHUNIT["metre",1], ID["EPSG",8827]]], CS[Cartesian,2], AXIS["easting",east, ORDER[1], LENGTHUNIT["metre",1, ID["EPSG",9001]]], AXIS["northing",north, ORDER[2], LENGTHUNIT["metre",1, ID["EPSG",9001]]]]
spatial_ref: PROJCRS["unnamed", BASEGEOGCRS["WGS 84", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",4326]], CONVERSION["unnamed", METHOD["Lambert Conic Conformal (2SP)", ID["EPSG",9802]], PARAMETER["Latitude of false origin",50, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8821]], PARAMETER["Longitude of false origin",-107, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8822]], PARAMETER["Latitude of 1st standard parallel",50, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8823]], PARAMETER["Latitude of 2nd standard parallel",50, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8824]], PARAMETER["Easting at false origin",5632642.22547, LENGTHUNIT["metre",1], ID["EPSG",8826]], PARAMETER["Northing at false origin",4612545.65137, LENGTHUNIT["metre",1], ID["EPSG",8827]]], CS[Cartesian,2], AXIS["easting",east, ORDER[1], LENGTHUNIT["metre",1, ID["EPSG",9001]]], AXIS["northing",north, ORDER[2], LENGTHUNIT["metre",1, ID["EPSG",9001]]]]
proj4: +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs
geotransform: 8131978.62068965472281 32462.98850574716925621 0 3554698.5 0 -32463
3 dimensions:
easting Size:4
units: meter
long_name: easting
northing Size:3
units: meter
long_name: northing
time Size:365
units: seconds since 1970-1-1 00:00:00
long_name: time
calendar: standard
3 global attributes:
Conventions: CF-1.4
created_by: R packages ncdf4 and terra (version 1.7-65)
date: 2024-02-16 14:42:09
Using the ncdf4 package to recreate all metadata:
> weasd_ncdf4 <- nc_open("amadeus/tests/testdata/narr/weasd/weasd.2018.nc")
> weasd_ncdf4
File amadeus/tests/testdata/narr/weasd/weasd.2018.nc (NC_FORMAT_CLASSIC):
4 variables (excluding dimension variables):
float weasd[x,y,time]
units: kg/m^2
_FillValue: 1.00000003318135e+32
long_name: Daily Accumulated Snow at Surface
grid_mapping: Lambert_Conformal
coordinates: lat lon
double lon[x,y]
units: degrees_east
long_name: Longitude of cell center
double lat[x,y]
units: degrees_north
long_name: Latitude of cell center
char Lambert_Conformal[]
units: 1
name: lambert_conformal_conic
long_name: lambert_conformal_conic
grid_mapping_name: lambert_conformal_conic
longitude_of_central_meridian: -107
latitude_of_projection_origin: 50
standard_parallel: 50
standard_parallel: 50
false_easting: 5632642.22547
false_northing: 4612545.65137
_CoordinateTransformType: Projection
_CoordinateAxisTypes: GeoX GeoY
3 dimensions:
x Size:4
units: m
long_name: eastward distance from southwest corner of domain in projection coordinates
axis: X
standard_name: projection_x_coordinate
_CoordinateAxisType: GeoX
y Size:3
units: m
long_name: northward distance from southwest corner of domain in projection coordinates
axis: Y
standard_name: projection_y_coordinate
_CoordinateAxisType: GeoY
time Size:365
units: hours since 1800-1-1 00:00:0.0
long_name: time
5 global attributes:
title: test output of projected data
institution: NOAA ESRL PSD
source: weasd
history: P.J. Bartlein, Fri Feb 16 11:43:37 2024
Conventions: CF=1.6
You will notice in this last chunk that the metadata is retained but the dimensions match our desired subset.
I see 🤔 I'm sorry about that...
No need to be sorry! Maybe we can pursue writing a more general function to write metadata to one netCDF from another, and then apply that to the create-test-data functions.
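A rough sketch of such a helper with ncdf4 (simplified; it assumes the subset file already contains a variable with the same name as the raw file):

library(ncdf4)

# copy global and per-variable attributes from one netCDF file to another
copy_nc_metadata <- function(from_file, to_file, varname) {
  from <- nc_open(from_file)
  to <- nc_open(to_file, write = TRUE)
  # global attributes (varid = 0)
  globals <- ncatt_get(from, 0)
  for (att in names(globals)) {
    ncatt_put(to, 0, att, globals[[att]])
  }
  # variable attributes (skip _FillValue, which cannot be
  # redefined on an existing variable)
  vatts <- ncatt_get(from, varname)
  for (att in setdiff(names(vatts), "_FillValue")) {
    ncatt_put(to, varname, att, vatts[[att]])
  }
  nc_close(from)
  nc_close(to)
}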
We can utilize the same area for test sets to be carried throughout the pipeline.