earthlab / cft

Climate futures toolbox: easy MACA (MACAv2) climate data access 📦
https://www.earthdatascience.org/cft/index.html

Pulled_data error #152

Closed · acrunyon closed this issue 2 years ago

acrunyon commented 2 years ago

I've adjusted the README.Rmd you provided to pull all years and all GCMs for 3 variables for a single grid cell (.Rmd attached). The `input_times` and `input_variables` looked correct to me, but I got the error below (screenshot attached). Any ideas what is going on?

Attached: the .Rmd I've been using

```
CURL Error: Transferred a partial file
Error in Rsx_nc4_get_vara_int: NetCDF: DAP failure
Var: pr_BNU-ESM_r1i1p1_rcp85  Ndims: 3  Start: 0,444,511  Count: 34333,2,3
Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, :
  C function Rsx_nc4_get_var_int returned error
```

gjknowlton commented 2 years ago

I had the same issue when running the example script that Amber posted above, with the same error message:

```
CURL Error: Transferred a partial file
Error in Rsx_nc4_get_vara_int: NetCDF: DAP failure
Var: pr_CCSM4_r6i1p1_rcp85  Ndims: 3  Start: 0,444,511  Count: 34333,2,3
Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, :
  C function Rsx_nc4_get_var_int returned error
```

ttuff commented 2 years ago

This is a runtime error. Data requests are limited to 500 MB per call by the data provider, and you are getting this error because a single call is requesting more than that maximum. You will need to reduce the spatial extent or the temporal extent of the request to get below the limit. When I reduce the spatial extent to a much smaller national park, the runtime error goes away and I can get all the data; when I bump back up to the entire Yellowstone region, I need to pull the data in pieces and stitch them back together.

I will work on a few solutions to this problem, but for now, just request smaller data chunks each time.

First, subset your time axis into a few groups:

```r
# Time window of interest, in the dataset's time units.
time_min <- 38716
time_max <- 73048

# Add an index column flagging the requested window, plus two columns
# marking its first and second halves.
input_times <- inputs$available_times %>%
  add_column(index = 0) %>%
  add_column(first_half = 0) %>%
  add_column(second_half = 0)
input_times[which(inputs$available_times[, 1] >= time_min &
                    inputs$available_times[, 1] <= time_max), 3] <- 1

med <- median(row_number(input_times[, 3]))
input_times[which(as.numeric(row.names(input_times)) <= med), 4] <- 1
input_times[which(as.numeric(row.names(input_times)) > med), 5] <- 1

head(input_times)
tail(input_times)
```

Then request those two subsets and stitch them back together:

```r
# Pull the first half of the time series for the bounding box (padded by
# 0.05 degrees) and convert it to an sf point object.
Pulled_data_sub1 <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 4] == 1) %>%  # first_half flag
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(Pulled_data_sub1)

# Pull the second half of the time series the same way.
Pulled_data_sub2 <- inputs$src %>%
  hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
  hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
  hyper_filter(time = input_times[, 5] == 1) %>%  # second_half flag
  hyper_tibble(select_var = input_variables) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")

head(Pulled_data_sub2)

# Stitch the two halves back together.
Pulled_data_stitch <- rbind(Pulled_data_sub1, Pulled_data_sub2)
head(Pulled_data_stitch)
tail(Pulled_data_stitch)
```
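The same split generalizes to more than two pieces. Here is a minimal sketch (untested), reusing the `inputs`, `pulled_bb`, and `input_variables` objects from above; `pull_chunk()` is a hypothetical helper, not part of cft:

```r
# Sketch only: split the time axis into n_chunks pieces and pull each in turn.
# Assumes `inputs`, `pulled_bb`, and `input_variables` as defined above.
n_chunks <- 4
chunk_id <- cut(seq_len(nrow(inputs$available_times)),
                breaks = n_chunks, labels = FALSE)

# Hypothetical helper: pull only the i-th chunk of the time axis.
pull_chunk <- function(i) {
  inputs$src %>%
    hyper_filter(lat = lat <= c(pulled_bb[4] + 0.05) & lat >= c(pulled_bb[2] - 0.05)) %>%
    hyper_filter(lon = lon <= c(pulled_bb[3] + 0.05) & lon >= c(pulled_bb[1] - 0.05)) %>%
    hyper_filter(time = chunk_id == i) %>%  # keep only this chunk's time steps
    hyper_tibble(select_var = input_variables) %>%
    st_as_sf(coords = c("lon", "lat"), crs = 4326, agr = "constant")
}

Pulled_data_stitch <- do.call(rbind, lapply(seq_len(n_chunks), pull_chunk))
```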

I've added this workaround to the new README file for your reference, and I will work on incorporating a parallel version to make this easier and faster.
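As a rough sketch of what that parallel version could look like, assuming the hypothetical `pull_chunk()` helper sketched above (`parallel::mclapply()` forks, so it only parallelizes on Unix-alikes; on Windows use `parLapply()` with a cluster instead):

```r
# Sketch only: run the hypothetical pull_chunk() calls in parallel and
# stitch the results, as in the serial version above.
library(parallel)
chunks <- mclapply(seq_len(n_chunks), pull_chunk, mc.cores = 4)
Pulled_data_stitch <- do.call(rbind, chunks)
```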

Cheers, Ty

acrunyon commented 2 years ago

Hi Ty,

Can we meet again sometime this week? I am only pulling a single grid cell, which covers 98% of what we do. As I said, for this to work for us we have to be able to pull all years (1950-2100), all GCMs, and all RCPs for tmax, tmin, and precip, and preferably rhmin and rhmax as well. Before the rgdal updates broke the package, we were able to pull all of this data for a single grid cell in under 2 hours.
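(For concreteness, a sketch, untested, of one way to select every GCM/RCP combination for those variables. It assumes `inputs$variable_names` is a character vector of full identifiers like `pr_BNU-ESM_r1i1p1_rcp85`, as seen in the error messages above, and uses the MACA short names for the five variables.)

```r
# Sketch only: select all GCM/RCP combinations for five MACA variables.
# Assumes inputs$variable_names holds "pr_BNU-ESM_r1i1p1_rcp85"-style names.
vars_wanted <- c("tasmax", "tasmin", "pr", "rhsmin", "rhsmax")
pattern <- paste0("^(", paste(vars_wanted, collapse = "|"), ")_")
input_variables <- grep(pattern, inputs$variable_names, value = TRUE)
```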

We have spent years building an analysis infrastructure around these data, and we have just built an R package on top of it as well. I think it may help if I show you how we work with these data, so you have a better idea of what we need from this package.


On the `Pulled_data_sub1` snippet you posted above: should time be in here?

On the `Pulled_data_sub2` snippet: should time be in here?

ttuff commented 2 years ago

I finally finished a new parallel function to fix this error. See the new Firehose function: https://github.com/earthlab/cft/blob/main/vignettes/firehose.md
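A minimal usage sketch based on that vignette (treat the exact `single_point_firehose()` call, and the example coordinates, as assumptions to verify against the vignette):

```r
# Sketch only: pull every requested variable for one grid cell with the new
# parallel function. The function name comes from the linked vignette; the
# argument order and the lat/lon values here are assumptions.
library(cft)
inputs <- available_data()
lat_pt <- 44.4    # hypothetical point in the Yellowstone area
lon_pt <- -110.6
data <- single_point_firehose(input_variables, lat_pt, lon_pt)
```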