Closed serbinsh closed 5 years ago
With verbose on
Loading required package: PEcAn.benchmark
> download.CRUNCEP(site_id=2000000003, lat.in=9.280225, lon.in=-79.975527, model=NULL, scenario=NULL, ensemble_member=NULL, overwrite=TRUE, outfolder='/data/Model_Output/p2017-09-13 09:16:10 INFO [PEcAn.logger::logger.info] : 01/01', end_date='2002/12/31')
Downloading
/data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.2001.nc
2017-09-13 09:16:10 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:16:17 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var surface_downwelling_longwave_flux_in_air"
OK, so turing on verbose helped it through one year but then failed on the next?
2017-09-13 09:16:55 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_qair_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var specific_humidity"
2017-09-13 09:17:05 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_rain_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var precipitation_flux"
2017-09-13 09:17:14 INFO [PEcAn.logger::logger.info] :
Downloading
/data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.2002.nc
2017-09-13 09:17:14 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2002_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time Ndims: 1 Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(), :
C function R_nc4_get_vara_double returned error
And if I go to the output I can see the files:
-rw-rw-r--. 1 apache test 59760 Sep 13 09:17 CRUNCEP.2001.nc
-rw-rw-r--. 1 apache test 59732 Dec 5 2016 CRUNCEP.2002.nc
-rw-rw-r--. 1 apache test 59732 Dec 5 2016 CRUNCEP.2003.nc
-rw-rw-r--. 1 apache test 59892 Dec 5 2016 CRUNCEP.2004.nc
-rw-rw-r--. 1 apache test 59732 Dec 5 2016 CRUNCEP.2005.nc
-rw-rw-r--. 1 apache test 59732 Dec 5 2016 CRUNCEP.2006.nc
Strange, if I try to check the next file, 2002 lwdown it does suggest it isn;t there?
> ncdf4::nc_open('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2002_v1.nc4')
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time Ndims: 1 Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(), :
C function R_nc4_get_vara_double returned error
But on the web I can see the file?
Ugh! Wait now files that did work aren't workign
> ncdf4::nc_open('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2001_v1.nc4')
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time Ndims: 1 Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(), :
C function R_nc4_get_vara_double returned error
OK, not really sure what the deal is but it seems fairly spotty
2017-09-13 09:26:18 INFO [PEcAn.logger::logger.info] :
Downloading
/data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.2002.nc
2017-09-13 09:26:18 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2002_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:26:24 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2002_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var surface_downwelling_longwave_flux_in_air"
2017-09-13 09:26:30 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_press_2002_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: press Ndims: 3 Start: 0,161,200 Count: 1460,1,1
Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, :
C function R_nc4_get_vara_double returned error
Do we need to add some sort of try/catch or delay? It seems like if there is a slight delay from THREDDS is crashes, perhaps?
Added a 3 second delay to the function:
# This throws an error if file not found
dap <- ncdf4::nc_open(dap_file, verbose=FALSE)
Sys.sleep(3)
at line 101
That made it work for 2 years
-rw-rw-r--. 1 apache test 59760 Sep 13 09:36 CRUNCEP.2001.nc
-rw-rw-r--. 1 apache test 59760 Sep 13 09:37 CRUNCEP.2002.nc
OK, so there definitely is an issue, at least on modex, where a single year crashes but if I keep trying it will then work
[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:38:37 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time Ndims: 1 Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(), :
C function R_nc4_get_vara_double returned error
> download.CRUNCEP(site_id=2000000003, lat.in=9.280225, lon.in=-79.975527, model=NULL, scenario=NULL, ensemble_member=NULL, overwrite=TRUE, outfolder='/data/Model_Output/p2017-09-13 09:38:46 INFO [PEcAn.logger::logger.info] : 01/01', end_date='2002/12/31')
Downloading
/data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.1999.nc
2017-09-13 09:38:46 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_1999_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time Ndims: 1 Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(), :
C function R_nc4_get_vara_double returned error
> download.CRUNCEP(site_id=2000000003, lat.in=9.280225, lon.in=-79.975527, model=NULL, scenario=NULL, ensemble_member=NULL, overwrite=TRUE, outfolder='/data/Model_Output/p2017-09-13 09:38:49 INFO [PEcAn.logger::logger.info] : 01/01', end_date='2002/12/31')
Downloading
/data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.1999.nc
2017-09-13 09:38:49 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_1999_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:38:59 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.nc4
@robkooper @mdietze @ashiklom @jsimkins2 Any thoughts on this? Run-down, if I re-run it will eventually pull the data....but in some cases it might take a few attempts....there seems to be some issue with nc_open getting a delay via thredds and then it fails saying the file isn't there, when it actually is.....perhaps that is just a modex issue due to the firewall?
But if I keep re-running it seems it will eventually get the data that it was failing on
Perhaps we had some robust error handling to try and restart a few times before ultimately failing if indeed THREDDS is down?
I was also getting this error on the VM. And when I reran from rstudio I still got the same error.
Just saw this. We've run into this before here: #1546 and we would just rerun it and it would download without an error, making it hard to replicate and figure out what's wrong. We thought it could be an issue specific to our machine. Good to know it's not.
I agree with creating some robust error handling.
I think it has as much to do with the DOE THREDDS server either doing something like limiting max number of connections or preventing rapid number of calls to the server to prevent denial of service attacks or something. I know @jsimkins2 has had communication in the past with them about non reliable calls. I think the function isn't doing anything wrong.
One idea would be to add the sleep, detect error, and try three times before giving up?
@ankurdesai I agree with this assessment. I already add a Sys.sleep, which did help but I want to add a 10 retry to the function to give it enough chances to get a successful download.
@ankurdesai @serbinsh Does that mean there's a better time of day to try doing it? Like should I be downloading at night instead of in the middle of the day?
@araiho Not sure...I think it is an IT issue really. As Ankur mentioned, servers don't like rapid multi file requests as it can look like a DoS attack
What about something like:
retry <- function(expr, isError=function(x) "try-error" %in% class(x), maxErrors=5, sleep=0) {
attempts = 0
retval = try(eval(expr))
while (isError(retval)) {
attempts = attempts + 1
if (attempts >= maxErrors) {
msg = sprintf("retry: too many retries [[%s]]", capture.output(str(retval)))
PEcAn.logger::logger.warn(msg)
stop(msg)
} else {
msg = sprintf("retry: error in attempt %i/%i [[%s]]", attempts, maxErrors,
capture.output(str(retval)))
PEcAn.logger::logger.warn(msg)
#warning(msg)
}
if (sleep > 0) Sys.sleep(sleep)
retval = try(eval(expr))
}
return(retval)
}
where you call it as
dap <- retry(ncdf4::nc_open(dap_file, verbose=FALSE),maxErrors=10, sleep=2)
Produces something like
> retry(ncdf4::nc_open('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time Ndims: 1 Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(), :
C function R_nc4_get_vara_double returned error
2017-09-13 10:50:17 WARN [#13: PEcAn.logger::logger.warn] :
retry: error in attempt 1/10 [[Class 'try-error' atomic [1:1] Error in
ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
default_missval_ncdf4(), : ]] retry: error in attempt 1/10 [[ C function
R_nc4_get_| __truncated__]] retry: error in attempt 1/10 [[ ..- attr(*,
"condition")=List of 2]] retry: error in attempt 1/10 [[ .. ..$ message:
chr "C function R_nc4_get_vara_double returned error"]] retry: error in
attempt 1/10 [[ .. ..$ call : language
ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
default_missval_ncdf4(), verbose = verbose)]] retry: error in attempt
1/10 [[ .. ..- attr(*, "class")= chr [1:3] "simpleError" "error"
"condition"]]
File https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.nc4 (NC_FORMAT_CLASSIC):
6 variables (excluding dimension variables):
int crs[]
semi_major_axis: 6370997
grid_mapping_name: latitude_longitude
inverse_flattening: 0
double lat_bnds[nv,lat]
_ChunkSizes: 360
_ChunkSizes: 2
double lon_bnds[nv,lon]
_ChunkSizes: 720
_ChunkSizes: 2
double time_bnds[nv,time]
_ChunkSizes: 1
_ChunkSizes: 2
float lwdown[lon,lat,time]
cell_methods: area:mean time:sum
grid_mapping: crs
long_name: Incoming_Long_Wave_Radiation
units: W/m2
missing_value: -999
_FillValue: -999
_ChunkSizes: 1
_ChunkSizes: 360
_ChunkSizes: 720
int mask[lon,lat]
_ChunkSizes: 360
_ChunkSizes: 720
4 dimensions:
time Size:1460 *** is unlimited ***
bounds: time_bnds
long_name: time
units: days since 1700-01-01T00:00:00Z
standard_name: time
calendar: gregorian
_ChunkSizes: 1
lat Size:360
units: degrees_north
standard_name: latitude
bounds: lat_bnds
_ChunkSizes: 360
lon Size:720
units: degrees_east
standard_name: longitude
bounds: lon_bnds
_ChunkSizes: 720
nv Size:2
4 global attributes:
model_name: STOMATE-SECHIBA-LPJ
history: Tue Mar 1 10:57:59 2011: ncks -d time_counter,143176,144635 ncep9_lwdown_1901_2010_2D.nc cruncepV4_leap_lwdown_1999.nc
Conventions: CF-1.4
DODS_EXTRA.Unlimited_Dimension: time
Warning message:
In eval(expr) : restarting interrupted promise evaluation
>
Seems to be working
2017-09-13 11:12:28 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2000_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time Ndims: 1 Start: 0 Count: 1464
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(), :
C function R_nc4_get_vara_double returned error
In addition: Warning message:
In eval(expr) : restarting interrupted promise evaluation
2017-09-13 11:12:29 WARN [PEcAn.logger::logger.warn] :
retry: error in attempt 1/10 [[Class 'try-error' atomic [1:1] Error in
ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
default_missval_ncdf4(), : ]] retry: error in attempt 1/10 [[ C function
R_nc4_get_| __truncated__]] retry: error in attempt 1/10 [[ ..- attr(*,
"condition")=List of 2]] retry: error in attempt 1/10 [[ .. ..$ message:
chr "C function R_nc4_get_vara_double returned error"]] retry: error in
attempt 1/10 [[ .. ..$ call : language
ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
default_missval_ncdf4(), verbose = verbose)]] retry: error in attempt
1/10 [[ .. ..- attr(*, "class")= chr [1:3] "simpleError" "error"
"condition"]]
Its handling hiccups in the function and continuing to run, so far
@serbinsh where did you put that retry function?
Here is an example where it retry several times
In eval(expr) : restarting interrupted promise evaluation
2017-09-13 11:12:55 WARN [PEcAn.logger::logger.warn] :
retry: error in attempt 5/10 [[Class 'try-error' atomic [1:1] Error in
ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
default_missval_ncdf4(), : ]] retry: error in attempt 5/10 [[ C function
R_nc4_get_| __truncated__]] retry: error in attempt 5/10 [[ ..- attr(*,
"condition")=List of 2]] retry: error in attempt 5/10 [[ .. ..$ message:
chr "C function R_nc4_get_vara_double returned error"]] retry: error in
attempt 5/10 [[ .. ..$ call : language
ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
default_missval_ncdf4(), verbose = verbose)]] retry: error in attempt
5/10 [[ .. ..- attr(*, "class")= chr [1:3] "simpleError" "error"
"condition"]]
2017-09-13 11:13:10 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_swdown_2000_v1.nc4
2017-09-13 11:13:19 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_uwind_2000_v1.nc4
2017-09-13 11:13:32 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_vwind_2000_v1.nc4
2017-09-13 11:13:43 INFO [PEcAn.logger::logger.info] :
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_qair_2000_v1.nc4
@araiho I put it in base/utils.R on my develop branch. I will send a PR soon for comments
Closing this, because I think it has been resolved by a combination @serbinsh 's retry code #1651 and by implementing an alternative, more reliable remote data access method (NetCDF subset, or <method>ncss</method>
in the PEcAn XML) by me in #2153.
When using the latest develop branch I am getting this error when trying to use PEcAn.data.atmosphere::download.CRUNCEP()
Description
Bug fix. I think the issue is the use of "verbose" in the call to ncdf4::ncvar_def
It should be verbose= TRUE/FALSE but currently in the function it is:
Which leads to this error:
But if you change to verbose = TRUE
But we probably want verbose = FALSE?
Context
Bug fix
Possible Implementation
NA