Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Runtime Error: NetCDF: File not found #755

Open tnoelcke opened 6 years ago

tnoelcke commented 6 years ago

I'm working on reading some data from a THREDDS server using OPeNDAP in netCDF4 format. The URL for the server I'm working with is https://climate.northwestknowledge.net/RangelandForecast/download.php. When accessing some of the lat/lon values in this dataset, I get this in the terminal:

Traceback (most recent call last):
    File "getData.py", line 91, in <module>
        (latI, lonI) = getIndex(latTarget, lonTarget, lathandle, lonhandle, datahanlde)
    File "getData.py", line 62, in getIndex
        check = datahanlde[lat_index, lon_index, 0]
    File "netCDF4/_netCDF4.pyx", line 3961, in netCDF4._netCDF4.Variable.__getitem__
    File "netCDF4/_netCDF4.pyx", line 3961, in netCDF4._netCDF4.Variable._get
    File "netCDF4/_netCDF4.pyx", line 3961, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: file not found.

I'm running Bash on Ubuntu 14.04 as a Linux subsystem on a Windows machine. I'm using conda v4.4.8 running Python 2.7.14, and I have HDF5 installed along with netCDF4 version 1.3.1.

I can post the code if you feel you need it.

Thanks!

jswhit commented 6 years ago

That error usually means the file was not accessible for some reason (either the server was down, or you don't have permission to access it).

What is the actual URL you used? (The one you gave is not a valid OPeNDAP URL.)
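
For reference, a minimal sketch of opening a dataset through a THREDDS OPeNDAP (dodsC) endpoint with netCDF4-python; the path below is a hypothetical placeholder, not a real dataset:

from netCDF4 import Dataset

# A valid OPeNDAP URL points at the server's dodsC endpoint for a dataset,
# not at an HTML download page. This path is illustrative only.
url = "http://tds-proxy.nkn.uidaho.edu/thredds/dodsC/some/dataset.nc"

nc = Dataset(url)          # raises RuntimeError if the endpoint is unreachable
print(list(nc.variables))  # names of the variables the server exposes
nc.close()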

tnoelcke commented 6 years ago

Here is the URL I am using in my code: http://tds-proxy.nkn.uidaho.edu/thredds/dodsC/NWCSC_INTEGRATED_SCENARIOS_ALL_CLIMATE/bcsd-nmme/monthlyForecasts/bcsd_nmme_metdata_ENSMEAN_forecast_1monthAverage.nc

tnoelcke commented 6 years ago

The thing that I don't understand is that it works for some lat/lon pairs but not for others that are still inside the range that I know is stored on that server.

jswhit commented 6 years ago

Perhaps the server is flaking out at just the moment you are requesting those lat/lon pairs? The error is coming from the C library and not the Python interface, so whatever is going on is probably not an issue on the Python side.

tnoelcke commented 6 years ago

I think you're right; it must be an issue with the server I'm trying to connect to. Thanks for the help!

tnoelcke commented 6 years ago

After spending some time talking to the system administrator about the read error I was having with this netCDF4 file, we discovered that the read error was due to file chunking. I'm not sure if this is caused by the Python interface or the C library, but we don't get the same read errors in MATLAB. Additionally, according to the system admin, I'm not the only person who has had this issue on the same system. Is there anything special I need to do when working with chunked files?

Any help or pointers would be much appreciated.

jswhit commented 6 years ago

There's nothing you need to do to read chunked files - it's all handled in the HDF5 library. You can specify the chunksizes when writing, or let the library choose default values. There's not much we can do without a self-contained, reproducible example program that triggers the error you are seeing.
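
For illustration, a minimal sketch of specifying chunk sizes when writing with netCDF4-python; the filename, dimension sizes, and chunk shape are arbitrary choices, and omitting chunksizes lets the library pick defaults:

from netCDF4 import Dataset
import numpy as np

nc = Dataset("example.nc", "w")
nc.createDimension("time", None)   # unlimited dimension
nc.createDimension("lat", 180)
nc.createDimension("lon", 360)

# chunksizes controls the HDF5 chunk shape; here, one time step per chunk
rh = nc.createVariable("rh", "f4", ("time", "lat", "lon"),
                       chunksizes=(1, 180, 360))
rh[0] = np.random.rand(180, 360)
nc.close()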

graemerae commented 6 years ago

I'm getting exactly the same error calling GFS data from NCEP. You can use this as the self-contained example requested above. My feeling is that there is a timeout happening or the module cannot maintain the connection. A GrADS script calling the same DODS server for the same data request has no problem connecting and staying connected. I've also tried this on several different and widely separated machines, so I'm pretty sure it's not port blocking or local-network related. (PS: using netCDF4 v1.3.1.)

import netCDF4
import numpy as np

mydate='20180509'

url2='http://nomads.ncep.noaa.gov:9090/dods/gfs_0p50/gfs'+ \
    mydate+'/gfs_0p50_00z'

print(url2)
#http://nomads.ncep.noaa.gov:9090/dods/gfs_0p50/gfs20180509/gfs_0p50_00z

#OPEN FILE
file2 = netCDF4.Dataset(url2)

#GET VARS  lat/lon/relhumidity
lat  = file2.variables['lat'][:]
lon  = file2.variables['lon'][:]
rh   = file2.variables['rhprs']

#LOCATION (SAN DIEGO)
latf=32.75
lonf=242.75

#FIND CLOSEST IDX
lonidx=np.abs(lon - lonf).argmin()
latidx=np.abs(lat - latf).argmin()

print(latidx, lonidx)
#245 485

print(rh.shape)
#(81, 47, 361, 720)

#EXTRACT DATA

#WORKS
rhpoint=rh[1,1,latidx,lonidx]
rhpoint=rh[1:5,1:5,latidx,lonidx]
rhpoint=rh[1:15,1:15,latidx,lonidx]
rhpoint=rh[1,:,latidx,lonidx]
rhpoint=rh[1:20,:,latidx,lonidx]
rhpoint=rh[1:30,:,latidx,lonidx]

#FAILS
rhpoint=rh[:,:,latidx,lonidx]
rhpoint=rh[1:50,:,latidx,lonidx]
rhpoint=rh[:,1:10,latidx,lonidx]
rhpoint=rh[:,1:20,latidx,lonidx]
rhpoint=rh[:,1:30,latidx,lonidx]

#The failing calls return the following error:
#
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "netCDF4/_netCDF4.pyx", line 3961, in netCDF4._netCDF4.Variable.__getitem__
#   File "netCDF4/_netCDF4.pyx", line 4798, in netCDF4._netCDF4.Variable._get
#   File "netCDF4/_netCDF4.pyx", line 1638, in netCDF4._netCDF4._ensure_nc_success
# RuntimeError: NetCDF: file not found

dopplershift commented 6 years ago

I've seen errors with both C's ncdump and netCDF-Java's ToolsUI when trying to get the rhprs array. Given that some requests work for you and some don't, it seems like the root cause is on the server. netcdf4-python could give a much better error in this case, though.

tnoelcke commented 6 years ago

Yeah, I tried doing a similar thing in R using the R netCDF library and got essentially the same error.

graemerae commented 6 years ago

My errors seem to be related to the overall size of the requested dataset: a 1x1 or 5x5 array is no problem, but a 30x50 array fails. Somewhere in between is the cutoff. I'll talk to NCEP and see if they have any ideas.
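
One possible workaround sketch (untested against this server), reusing rh, latidx, and lonidx from the script above: request the time axis in small pieces so each OPeNDAP response stays under whatever size the server can reliably deliver. The step size of 10 is an arbitrary choice.

import numpy as np

step = 10
pieces = [rh[i:i + step, :, latidx, lonidx]   # small per-request slices
          for i in range(0, rh.shape[0], step)]
rhpoint = np.ma.concatenate(pieces, axis=0)   # reassemble the full series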

graemerae commented 6 years ago

PS - I also tried setting the .dodsrc timeout variables to something ridiculously large, e.g.

HTTP.TIMEOUT=50000
CUROPT_TIMEOUT=10000

but it doesn't look like netcdf4-python honors those settings (unless I'm missing something).
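
If those settings really are ignored, one blunt client-side alternative is to retry the failing request, since the failures appear intermittent. This is only a sketch (read_with_retries is not a netcdf4-python feature, and the retry count and delay are arbitrary), again reusing rh, latidx, and lonidx from the script above:

import time
import numpy as np

def read_with_retries(var, index, retries=3, delay=5):
    for attempt in range(retries):
        try:
            return var[index]         # the slice that sometimes fails
        except RuntimeError:
            if attempt == retries - 1:
                raise                 # give up after the last attempt
            time.sleep(delay)         # wait before retrying the request

rhpoint = read_with_retries(rh, np.s_[:, :, latidx, lonidx])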

lesserwhirls commented 6 years ago

@tnoelcke - what is the exact data request you are making to the server (or the exact slice you are using)?

DanielIAvila commented 6 years ago

Have any of you solved the problem? I am experiencing the same issue.

tnoelcke commented 6 years ago

I wasn't able to solve this problem. I ended up setting up a cron job to download the entire file from the server rather than trying to read it over the network; I did not have the same issues locally. Time became an issue for my project, so I used that method instead.
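
A sketch of that download-then-read-locally approach, assuming the THREDDS server also serves the raw file over plain HTTP (on THREDDS servers the dodsC path segment typically becomes fileServer); the local filename is arbitrary:

import urllib.request
from netCDF4 import Dataset

dap_url = ("http://tds-proxy.nkn.uidaho.edu/thredds/dodsC/"
           "NWCSC_INTEGRATED_SCENARIOS_ALL_CLIMATE/bcsd-nmme/monthlyForecasts/"
           "bcsd_nmme_metdata_ENSMEAN_forecast_1monthAverage.nc")
http_url = dap_url.replace("/dodsC/", "/fileServer/")

urllib.request.urlretrieve(http_url, "forecast.nc")  # fetch the whole file once
nc = Dataset("forecast.nc")                          # local reads avoid the DAP layer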

epifanio commented 4 years ago

I had a similar issue with this dataset.

The error arises only when trying to access certain variables (['HW', 'PW', 'HWA', 'PWA']) - ncdump also fails (e.g.: ncdump -vHW https://thredds.met.no/thredds/dodsC/arcticdata/obsSynop/01361). I am clueless about the reason for the error, but I don't consider this a problem with xarray.

For now, I hacked my code this way ...

from netCDF4 import Dataset
import xarray as xr

nc_url = "https://thredds.met.no/thredds/dodsC/arcticdata/obsSynop/01361"

nc_fid = Dataset(nc_url, 'r')

# keep only the variables that can be read without triggering the error
valid_vars = []
for i in nc_fid.variables:
    try:
        nc_fid.variables[i][:]
        valid_vars.append(i)
    except RuntimeError:
        print('skip:', i)

ds = xr.open_dataset(nc_url)

df = ds[valid_vars].to_dataframe()

dopplershift commented 4 years ago

@epifanio I get errors with those variables when I use netCDF-Java (through ToolsUI) as well. Something is at fault in the server configuration used to aggregate the individual netCDF files together.