dklinges9 / mcera5

request_era5() issue when combining .nc files #42

Open kris-wild opened 6 days ago

kris-wild commented 6 days ago

Hi,

My name is Kristoffer Wild. I'm trying to use the 'request_era5' function and I keep running into an issue when combining the .nc files using the 'combine_netcdf' argument (see below). I've dug into the 'request_era5' source here: https://rdrr.io/github/dklinges9/mcera5/src/R/request_era5.R and still can't figure out why I'm unable to combine the .nc files for a given year.

Since the last package update, my .nc files are coming in groups of 12 (1950_1950_1.nc, 1950_1950_2.nc, ...; presumably by month?) when requesting data for a given area. In the past I got only one file per year. Anyway, below is the code I'm working with. All you have to do is adjust your info for: working directory, uid, cds_api_key. I'm sorry if there is a total oversight on my end, and thank you again for your help. Below is the code, followed by the error:

# -----Download climate data from ERA5-----

library(mcera5)
library(dplyr)
library(ecmwfr)
library(lubridate)
library(tidync)
library(microclima)

build_era5_land_request <- function(xmin, xmax, ymin, ymax, start_time, end_time,
                                    outfile_name = "era5_out") {
  # All spatial and temporal bounds are required
  if (missing(xmin)) stop("xmin is missing")
  if (missing(xmax)) stop("xmax is missing")
  if (missing(ymin)) stop("ymin is missing")
  if (missing(ymax)) stop("ymax is missing")
  if (missing(start_time)) stop("start_time is missing")
  if (missing(end_time)) stop("end_time is missing")

  # Snap the bounding box outwards to the 0.25 degree grid
  xmin_r <- plyr::round_any(xmin, 0.25, f = floor)
  xmax_r <- plyr::round_any(xmax, 0.25, f = ceiling)
  ymin_r <- plyr::round_any(ymin, 0.25, f = floor)
  ymax_r <- plyr::round_any(ymax, 0.25, f = ceiling)
  ar <- paste0(ymax_r, "/", xmin_r, "/", ymin_r, "/", xmax_r)

  ut <- uni_dates(start_time, end_time)
  request <- list()

  # One sub-request per year
  for (i in 1:length(unique(ut$yea))) {
    yr <- unique(ut$yea)[i]
    sub_mon <- ut %>%
      dplyr::filter(., yea == yr) %>%
      dplyr::select(., mon)
    sub_request <- list(
      dataset_short_name = "reanalysis-era5-land",
      product_type = "reanalysis",
      variable = c("2m_temperature", "2m_dewpoint_temperature",
                   "surface_pressure", "10m_u_component_of_wind",
                   "10m_v_component_of_wind", "total_precipitation",
                   "total_cloud_cover",
                   "mean_surface_net_long_wave_radiation_flux",
                   "mean_surface_downward_long_wave_radiation_flux",
                   "total_sky_direct_solar_radiation_at_surface",
                   "surface_solar_radiation_downwards", "land_sea_mask"),
      year = as.character(yr),
      month = as.character(sub_mon$mon),
      day = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10",
              "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
              "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
              "31"),
      time = c("00:00", "01:00", "02:00", "03:00", "04:00", "05:00",
               "06:00", "07:00", "08:00", "09:00", "10:00", "11:00",
               "12:00", "13:00", "14:00", "15:00", "16:00", "17:00",
               "18:00", "19:00", "20:00", "21:00", "22:00", "23:00"),
      area = ar,
      format = "netcdf",
      target = paste0(outfile_name, "_", yr, ".nc")
    )
    request[[i]] <- sub_request
  }
  return(request)
}

request_era52 <- function(request, uid, out_path, overwrite = FALSE,
                          combine = TRUE, timeout = 18000) {
  if (length(request) == 1 & combine) {
    cat("Your request will all be queried at once and does not need to be combined.\n")
  }

  for (req in 1:length(request)) {
    # Refuse to clobber an existing file unless overwrite = TRUE
    if (file.exists(paste0(out_path, "/", request[[req]]$target)) & !overwrite) {
      if (length(request) > 1) {
        stop("Filename already exists within requested out_path in request ",
             req, " of request series. Use overwrite = TRUE if you wish to overwrite this file.")
      } else {
        stop("Filename already exists within requested out_path. Use overwrite = TRUE if you wish to overwrite this file.")
      }
    }

    ecmwfr::wf_request(user = as.character(uid), request = request[[req]],
                       transfer = TRUE, path = out_path, verbose = TRUE,
                       time_out = timeout)

    if (file.exists(paste0(out_path, "/", request[[req]]$target))) {
      if (length(request) > 1) {
        cat("ERA5 netCDF file", req, "successfully downloaded.\n")
      } else {
        cat("ERA5 netCDF file successfully downloaded.\n")
      }
    }
  }

  # Merge the per-request files into a single netCDF file
  if (length(request) > 1 & combine) {
    cat("Now combining netCDF files...\n")
    fnames <- lapply(request, function(x) {
      x$target
    })
    combine_netcdf(filenames = fnames, combined_name = "combined.nc")
    cat("Finished.\n")
  }
}

# Download ERA5 data

# setwd and assign your credentials
setwd("/Volumes/The Brain/ERA5_Australia/")
uid <- "XXXXXXX"
cds_api_key <- "XXXXXXX"
ecmwfr::wf_set_key(user = uid, key = cds_api_key)

######## Loop for data by year
for (year in 1966:1967) {

  # Building a request

  # bounding coordinates
  xmn <- 112
  xmx <- 154
  ymn <- -45
  ymx <- -10

  # temporal extent
  st_time <- lubridate::ymd(paste0(year, ":01:01"))
  en_time <- lubridate::ymd(paste0(year, ":12:31"))

  # filename and location for downloaded .nc files
  fileprefix <- paste(i, "", i + 5, sep = "")

  op <- paste0('/Volumes/The Brain/ERA5_Australia/')
  if (!dir.exists(op)) {
    dir.create(op)
  }

  # build a request (covering multiple years)
  req <- build_era5_land_request(xmin = xmn, xmax = xmx,
                                 ymin = ymn, ymax = ymx,
                                 start_time = st_time,
                                 end_time = en_time,
                                 outfile_name = year)

  # Obtaining data with a request
  request_era52(request = req, uid = uid, out_path = op, timeout = 18000 * 2)

}

ERA5 netCDF file 12 successfully downloaded.
Now combining netCDF files...
Error in combine_netcdf(filenames = fnames, combined_name = "combined.nc") :
  could not find function "combine_netcdf"

dklinges9 commented 3 days ago

Hi Kris,

(we can continue our email conversation, but replying here so my response can be seen by others publicly)

Thanks for bringing this up. First, the error you're receiving here (could not find function "combine_netcdf") occurs because combine_netcdf() is an internal function of mcera5: it is not exported when the package is built, so it is not available when you load the package (e.g. via library(mcera5)). For debugging purposes, you'd need to clone the mcera5 repository and explicitly source() the script R/internal.R to make combine_netcdf() available. That said, such debugging is my job, not yours! And I might as well just make combine_netcdf() available externally.
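For readers who hit the same "could not find function" error, here is a minimal sketch of how exported and internal objects differ in an R package namespace. The helper names internal_objects() and get_from_namespace() are hypothetical and exist only for this illustration; the demonstration uses the base stats package so it runs without mcera5 installed, and the final block only does anything if mcera5 is present and defines combine_netcdf().

```r
# List the objects a package defines but does not export.
# (Hypothetical helper, written for this illustration only.)
internal_objects <- function(pkg) {
  ns <- asNamespace(pkg)
  setdiff(ls(ns), getNamespaceExports(pkg))
}

# Fetch an object from a package namespace whether or not it is exported.
# This is what pkg:::name does. (Hypothetical helper.)
get_from_namespace <- function(name, pkg) {
  get(name, envir = asNamespace(pkg), inherits = FALSE)
}

# Demonstration with 'stats', which ships with every R installation:
# it defines objects that library(stats) does not make visible.
stats_internal <- internal_objects("stats")
length(stats_internal) > 0

# Guarded usage for this issue: runs only if mcera5 is installed and
# its namespace contains combine_netcdf().
if (requireNamespace("mcera5", quietly = TRUE) &&
    exists("combine_netcdf", envir = asNamespace("mcera5"), inherits = FALSE)) {
  combine_netcdf <- get_from_namespace("combine_netcdf", "mcera5")
}
```

Reaching into a namespace this way is fine for debugging, but internal functions can change without notice, so it is not something to rely on in long-lived scripts.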

Beyond this, I did indeed find a bug in combine_netcdf(), which I just fixed with this recent commit. There might be other issues at play here; it's hard for me to test with your files (or with any files) at the moment because the CDS server is down. So I'll keep this issue open and continue exploring.

As a heads-up: currently, build_era5_request() queries files by month to help keep each query below the new CDS limits, which have been lowered. I have functionality ready locally to allow the user to choose between monthly or annual queries (which might have trade-offs in download speed). I won't push this until I've tested it a bit more (once the CDS server is working again).
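To illustrate the monthly versus annual trade-off, here is a rough sketch of how a date range could be split into one query per month or one per year. This is not the mcera5 implementation; chunk_dates() and its by argument are hypothetical names used only for this example, and it relies on base R alone.

```r
# Split an inclusive date range into one chunk per month or per year.
# Returns a data frame with one row per chunk.
# (Hypothetical helper, not part of mcera5.)
chunk_dates <- function(start_time, end_time, by = c("month", "year")) {
  by <- match.arg(by)
  days <- seq(as.Date(start_time), as.Date(end_time), by = "day")
  parts <- data.frame(
    year  = as.integer(format(days, "%Y")),
    month = as.integer(format(days, "%m"))
  )
  if (by == "year") {
    parts$month <- NULL  # annual chunks only need the year
  }
  chunks <- unique(parts)
  rownames(chunks) <- NULL
  chunks
}

monthly <- chunk_dates("1966-01-01", "1966-12-31", by = "month")
annual  <- chunk_dates("1966-01-01", "1967-12-31", by = "year")
nrow(monthly)  # 12 queries for one year
nrow(annual)   # 2 queries for two years
```

Monthly chunks mean more, smaller requests (and more files to combine afterwards); annual chunks mean fewer requests that are each more likely to hit a size limit.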

dklinges9 commented 3 days ago

combine_netcdf() is now also exported with this commit, to serve as a stand-alone function (which should make such debugging easier in the future).
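With monthly downloads, the files for each year need to be collected in month order before they are combined. A minimal sketch, assuming the file names follow the "<prefix>_<year>_<month>.nc" pattern mentioned earlier in this thread (e.g. 1950_1950_1.nc); group_monthly_files() is a hypothetical helper, and the combine_netcdf() call is left commented out because its argument names are taken only from the code posted above.

```r
# Group monthly file names by year, ordered by month within each year.
# Assumes names like "<prefix>_<year>_<month>.nc", e.g. "1950_1950_1.nc".
# (Hypothetical helper, not part of mcera5.)
group_monthly_files <- function(filenames) {
  pattern <- "^(.*)_([0-9]{4})_([0-9]{1,2})\\.nc$"
  base <- basename(filenames)
  ok <- grepl(pattern, base)
  if (!all(ok)) {
    stop("Unexpected file name(s): ", paste(base[!ok], collapse = ", "))
  }
  year  <- as.integer(sub(pattern, "\\2", base))
  month <- as.integer(sub(pattern, "\\3", base))
  ord <- order(year, month)  # numeric sort, so month 10 follows month 2
  split(filenames[ord], year[ord])
}

files <- c("1950_1950_10.nc", "1950_1950_2.nc", "1950_1950_1.nc",
           "1951_1951_1.nc")
by_year <- group_monthly_files(files)

# Hypothetical call, using the argument names seen in the code above:
# combine_netcdf(filenames = by_year[["1950"]],
#                combined_name = "combined_1950.nc")
```

Sorting on the parsed month number matters because a plain alphabetical sort would place "..._10.nc" before "..._2.nc".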