UCANR-IGIS / caladaptr

R package to import climate data from Cal-Adapt.org into R via the API.
https://ucanr-igis.github.io/caladaptr/
GNU General Public License v3.0

Error: Duplicate values of `hash_int` found #1

Open lucyrandrews opened 2 years ago

lucyrandrews commented 2 years ago

Hi Andy!

I'm getting an error message when using the package that says I should contact you. It's a bit cumbersome to make a reprex with the data I'm using, so here's my best effort.

# load packages
library(tidyverse)
library(sf)
library(caladaptr)

# download restoration projects data
data_url <- "https://drive.google.com/uc?export=download&id=1-pba7MR93PcYwxqBdWOKcn_eVPi9ki9H"
restoration_projects <- st_read(dsn = data_url)

# grab LOCA grid
loca <- ca_locagrid_geom() %>%
  st_transform(crs = st_crs(restoration_projects))

# trim restoration projects dataset to LOCA grid and
# specify coordinates of restoration projects for API call
restoration_coords <- st_join(restoration_projects, loca, left = FALSE) %>%
  st_drop_geometry() %>%
  dplyr::select(center_longitude, center_latitude) %>%
  rename(x = center_longitude,
         y = center_latitude)

# generate a precip API call
precip_api_call <- ca_loc_pt(x = ca_apireq(),
                             restoration_coords) %>%
  ca_gcm("ens32avg") %>%
  ca_scenario("historical") %>%
  ca_cvar("pr") %>%
  ca_period("year") %>%
  ca_years(start = 1980, end = 2005)

# make API call for precip data
precip <- precip_api_call %>%
  ca_getvals_tbl(quiet = TRUE)

This runs all the way through until the API call on the last two lines, which throws an error:

Error in stopwarn(pf, check_for, stop_msg = "Duplicate values of `hash_int` found. This should not happen - please contact the package author.") : Duplicate values of `hash_int` found. This should not happen - please contact the package author.

Session info (lmk if you all want a list of all loaded packages):

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 12.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

ajlyons commented 2 years ago

Hi Lucy,

Thanks for this (and good reprex). The underlying issue is that you've got a good number of duplicate coordinates in your coordinates table (restoration_coords). You can see this by sorting it and scrolling through it in RStudio.

restoration_coords %>% arrange(x,y) %>% View()
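
Scrolling works for a small table, but the duplicates can also be listed directly. The following is a minimal sketch on a made-up stand-in for restoration_coords (the `coords_demo` data frame and its values are hypothetical); it assumes only that the table has `x` and `y` columns, as in the code above.

```r
library(dplyr)

# Hypothetical stand-in for restoration_coords: x = longitude, y = latitude
coords_demo <- data.frame(
  x = c(-121.5, -121.5, -120.1, -119.8, -120.1),
  y = c(38.6, 38.6, 37.2, 36.9, 37.2)
)

# Count how often each (x, y) pair occurs and keep only the repeated pairs
dup_coords <- coords_demo %>%
  count(x, y, name = "n_rows") %>%
  filter(n_rows > 1)

print(dup_coords)

# Number of rows that repeat an earlier (x, y) pair
n_surplus <- sum(duplicated(coords_demo))
print(n_surplus)
```

If `dup_coords` has any rows, the coordinate table contains the kind of duplicates that trigger the error.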

Next time I update the package I'll add a check for duplicate features and improve the wording of the message. Surprisingly, I've never encountered this before (thank you for sharing what is probably a very common gotcha!). In the meantime, you can proceed by making a call with no duplicate features. I see that this is a little tricky in your case, since your original dataset (restoration_projects) contains some duplicate locations as well as locations that fall outside the LOCA coverage area. It looks like it doesn't have a primary key either. Perhaps that's what you were trying to deal with using st_join?

The following works by filtering out points that lie outside the LOCA grid. It gives a warning about duplicate features but doesn't stop you. Note, however, that since your original dataset doesn't have a primary key, it uses the row number as the feature id. You'd be better off adding a primary key first, which will make joining the results of the API call back to your data a lot easier (using dplyr::left_join).
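
As a rough illustration of the primary-key suggestion, here is a sketch on toy data. Everything in it is hypothetical: `projects_demo` stands in for the attributes of restoration_projects, `proj_id` is an invented key column, and `results_demo` is a mocked table standing in for whatever the API call returns (the values are made up). How the key column is passed to caladaptr is not shown; check the package documentation for that.

```r
library(dplyr)

# Hypothetical stand-in for the attribute table of restoration_projects
projects_demo <- data.frame(
  project_name = c("Creek A", "Creek B", "Meadow C"),
  x = c(-121.5, -121.5, -120.1),
  y = c(38.6, 38.6, 37.2)
)

# Add an explicit primary key instead of relying on implicit row numbers
projects_demo <- projects_demo %>%
  mutate(proj_id = row_number())

# Mocked results keyed by proj_id (made-up values, not real API output)
results_demo <- data.frame(
  proj_id = c(1L, 2L, 3L),
  val = c(10.2, 10.2, 7.5)
)

# Attribute join: attach the fetched values to the original records
joined <- left_join(projects_demo, results_demo, by = "proj_id")
print(joined)
```

With a stable key, the join still works if the rows are later reordered or filtered, which is not true of row numbers.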

precip <- precip_api_call %>%
  ca_loc_sf(loc = restoration_projects[loca, ]) %>%
  ca_getvals_tbl()

To be a purist, you could use the approach described in the vignette on Large Queries, whereby you clump features into grid cells, generate a separate feature layer of those grid cells, fetch data for the cells, and then join the results back to the features using an attribute join. This would require you first to add a primary key to restoration_projects. I wouldn't recommend that much work for such a modest dataset, but that's the idea. Feel free to get in touch if you want to go down that route.
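
The shape of that workflow can be sketched without the API. The example below is a simplified variant, not the vignette's method: it clumps features by identical coordinates rather than by LOCA grid cell, and it mocks the fetch step. All names (`features_demo`, `proj_id`, `loc_id`, `loc_values_demo`) and all values are hypothetical.

```r
library(dplyr)

# Hypothetical features with a primary key; some share a location
features_demo <- data.frame(
  proj_id = 1:5,
  x = c(-121.5, -121.5, -120.1, -119.8, -120.1),
  y = c(38.6, 38.6, 37.2, 36.9, 37.2)
)

# Step 1: one row per unique location, each with its own key
locations_demo <- features_demo %>%
  distinct(x, y) %>%
  mutate(loc_id = row_number())

# Step 2: lookup table recording which location each feature belongs to
feature_lookup <- features_demo %>%
  left_join(locations_demo, by = c("x", "y"))

# Step 3: fetch data once per unique location
# (mocked here with made-up values; in practice this would be the API call)
loc_values_demo <- locations_demo %>%
  transmute(loc_id, val = c(10.2, 7.5, 5.1))

# Step 4: attribute join to carry the values back to every feature
features_with_vals <- feature_lookup %>%
  left_join(loc_values_demo, by = "loc_id")

print(features_with_vals)
```

Only three locations are queried for five features, and no duplicate locations ever reach the request.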

ajlyons commented 2 years ago

Duplicate coordinates are now trapped and reported as of version 0.6.4. This should prevent this error message from occurring. LMK if you have any additional troubles.