dklinges9 / mcera5

mcera5
12 stars 9 forks source link

error thrown in .nc_to_df if required coords not in netCDF #1

Closed dklinges9 closed 1 year ago

dklinges9 commented 4 years ago

This problem may have been native to my machine, or perhaps is due to some misunderstanding on my part of how to interact with era5 netCDFs, but probably worth mentioning as someone may encounter again:

I found that if the (x,y) handed to point_nc_to_df() were not explicitly found in the netCDF, filtering of the hyper_tibble– conducted in .nc_to_df()– will end up with an empty slice:

> Error in hyper_filter.tidync(., longitude = longitude == x, latitude = latitude ==  : 
> subexpression for [longitude] results in empty slice

Corresponding code that throws the error:

# process relevant hourly climate data from an ERA5 nc to data frame
.nc_to_df <- function(nc, x, y, start_time, end_time) {

  dat <- tidync::tidync(nc) %>%
    tidync::hyper_filter(longitude = longitude == x,
                         latitude = latitude == y) %>%
...
...
...
  return(dat)
}

What I modified was to search the netCDF for the coords that were closest to the desired coordinates. I used a method of searching for the nearest neighbor coords using the data.table package because this appears to be a fast method, but a base R method would work fine too– see here for benchmarking of 3 methods.

I inserted the following code chunk at the top of .nc_to_df, which is line 91 in functions.R, and this chunk replaces lines 92-95:

.nc_to_df <- function(nc, x, y, start_time, end_time) {

  library(data.table)

  ## Find coordinates in the netCDF that are closest to required coords

  # Extract coords from netCDF
  dat <- tidync::tidync(nc) %>% 
    tidync::hyper_tibble()
  lons <- unique(dat$longitude)
  lats <- unique(dat$latitude)

  dt <- data.table(lons, val = lons) # you'll see why val is needed in a sec
  setattr(dt, "sorted", "lons")  # let data.table know that longitude is sorted
  setkey(dt, lons) # sorts the data according to longitude
  # binary search and "roll" to the nearest neighbour
  # In the final expression the val column will have the you're looking for.
  # Save our old x
  x_old <- x
  # Find new coordinate
  x <- dt[J(x), roll = "nearest"]$val

  dt <- data.table(lats, val = lats) # you'll see why val is needed in a sec
  setattr(dt, "sorted", "lats")  # let data.table know that latitude is sorted
  setkey(dt, lats) # sorts the data accoding to latitude
  # binary search and "roll" to the nearest neighbour
  # In the final expression the val column will have the you're looking for.
  # Save our old x
  y_old <- y
  # Find new coordinate
  y <- dt[J(x), roll = "nearest"]$val

  # Find euclidean distance between old and new coordinates
  euc_dist <- dist(matrix(c(x_old, x, y_old, y), nrow = 2))

  # Inform the user
  print(paste0("The desired coordinates are not represented in the netCDF file, choosing instead 
  closest coordinates of (", x, ", ", y, "), which are euclidean distance of ", euc_dist, " km away from 
  your desired coordinates."))

  # Our dat is already a hyper_tibble, so performing regular dplyr::filter rather
  # than hyper_filter
  dat <- dat %>%
    dplyr::filter(longitude == x & latitude == y) %>% 

## Everything after this point is unmodified

At the moment my modification doesn't perform any corrections due to this change in location, although I'm not sure what corrections would be conducted or if any are necessary. At the least this would notify the user of the change.

If you'd like reproducible code just let me know!

everydayduffy commented 4 years ago

Hey @dklinges9 , thanks again for the report. I'm pretty sure this was caused by my assumption that ERA5 data always came on a 0.25 grid with origin at -180 and -90. A few tests show that if your minimum lat and long are not divisible by 0.25, it will go ahead and set the origins to those coords anyway.

For example if my xmin was 1.23 and xmax 2.12, then the data would be provided for 1.23 / 1.48 / 1.73 / 1.98. Note also that no data beyond 1.98 is provided...this is now dealt with (see below).

To keep things consistent and to make sure the inverse distance weighting works as desired, I have added a few lines to round the users input coords to 0.25 within build_era5_request. Floor and ceiling are used to ensure no-one gets short changed (in space). See latest commit.

If you could test and let me know if that solves the problem, that would be great.

dklinges9 commented 1 year ago

Doing some housekeeping, and this issue has been addressed.