Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Error reading in netCDF files from THREDDS server #1011

Closed EliT1626 closed 4 years ago

EliT1626 commented 4 years ago

I get an error when loading data from a THREDDS server, and I can't find any information on what might be causing it from the error messages themselves.

Code Sample

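import datetime as dt

import pandas as pd
import xarray as xr

def list_dates(start, end):
    # Every date from start up to, but not including, end
    num_days = (end - start).days
    return [start + dt.timedelta(days=x) for x in range(num_days)]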

start_date = dt.date(2017, 3, 1)
end_date = dt.date(2017, 3, 31)
date_list = list_dates(start_date, end_date)
window = dt.timedelta(days=5)

url = 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/{0:%Y%m}/avhrr-only-v2.{0:%Y%m%d}.nc'
data = []
for cur_date in date_list:
    # 10-day window: cur_date - 5 days through cur_date + 4 days
    date_window = list_dates(cur_date - window, cur_date + window)
    url_list = [url.format(x) for x in date_window]
    window_data=xr.open_mfdataset(url_list).sst
    data.append(window_data.mean('time'))

dataf=xr.concat(data, dim=pd.DatetimeIndex(date_list, name='time'))

Expected Output

No error, with dataf containing a DataArray for the dates listed above.
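To see what each `open_mfdataset` call in the sample actually receives, the URL expansion can be reproduced with the standard library alone. This is a minimal sketch that reuses the template and `list_dates` helper from the sample; the name `URL` is only a local alias introduced here, and nothing is fetched from the server.

```python
import datetime as dt

# Same template as in the code sample; {0:...} applies strftime-style formatting.
URL = ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/'
       '{0:%Y%m}/avhrr-only-v2.{0:%Y%m%d}.nc')


def list_dates(start, end):
    """Return every date from start up to, but not including, end."""
    return [start + dt.timedelta(days=x) for x in range((end - start).days)]


window = dt.timedelta(days=5)
cur_date = dt.date(2017, 3, 1)

# The end date is exclusive, so the window holds 10 files, not 11.
date_window = list_dates(cur_date - window, cur_date + window)
url_list = [URL.format(d) for d in date_window]

print(len(url_list))   # 10
print(url_list[0])     # ends with 201702/avhrr-only-v2.20170224.nc
print(url_list[-1])    # ends with 201703/avhrr-only-v2.20170305.nc
```

Each loop iteration therefore opens ten remote files, and windows near the start of the month reach back into the previous month's directory.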

Error Description

KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
    197             try:
--> 198                 file = self._cache[self._key]
    199             except KeyError:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/201703/avhrr-only-v2.20170322.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-2-2402d81dac52> in <module>
     17     date_window = list_dates(cur_date - window, cur_date + window)
     18     url_list = [url.format(x) for x in date_window]
---> 19     window_data=xr.open_mfdataset(url_list).sst
     20     data.append(window_data.mean('time'))
     21     print(data[-1])

~\Anaconda3\lib\site-packages\xarray\backends\api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, autoclose, parallel, join, attrs_file, **kwargs)
    906         getattr_ = getattr
    907 
--> 908     datasets = [open_(p, **open_kwargs) for p in paths]
    909     file_objs = [getattr_(ds, "_file_obj") for ds in datasets]
    910     if preprocess is not None:

~\Anaconda3\lib\site-packages\xarray\backends\api.py in <listcomp>(.0)
    906         getattr_ = getattr
    907 
--> 908     datasets = [open_(p, **open_kwargs) for p in paths]
    909     file_objs = [getattr_(ds, "_file_obj") for ds in datasets]
    910     if preprocess is not None:

~\Anaconda3\lib\site-packages\xarray\backends\api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
    500         if engine == "netcdf4":
    501             store = backends.NetCDF4DataStore.open(
--> 502                 filename_or_obj, group=group, lock=lock, **backend_kwargs
    503             )
    504         elif engine == "scipy":

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    356             netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    357         )
--> 358         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    359 
    360     def _acquire(self, needs_lock=True):

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
    312         self._group = group
    313         self._mode = mode
--> 314         self.format = self.ds.data_model
    315         self._filename = self.ds.filepath()
    316         self.is_remote = is_remote_uri(self._filename)

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in ds(self)
    365     @property
    366     def ds(self):
--> 367         return self._acquire()
    368 
    369     def open_store_variable(self, name, var):

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock)
    359 
    360     def _acquire(self, needs_lock=True):
--> 361         with self._manager.acquire_context(needs_lock) as root:
    362             ds = _nc4_require_group(root, self._group, self._mode)
    363         return ds

~\Anaconda3\lib\contextlib.py in __enter__(self)
    110         del self.args, self.kwds, self.func
    111         try:
--> 112             return next(self.gen)
    113         except StopIteration:
    114             raise RuntimeError("generator didn't yield") from None

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock)
    184     def acquire_context(self, needs_lock=True):
    185         """Context manager for acquiring a file."""
--> 186         file, cached = self._acquire_with_cache_info(needs_lock)
    187         try:
    188             yield file

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
    202                     kwargs = kwargs.copy()
    203                     kwargs["mode"] = self._mode
--> 204                 file = self._opener(*self._args, **kwargs)
    205                 if self._mode == "w":
    206                     # ensure file doesn't get overriden when opened again

netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4\_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -37] NetCDF: Write to read only: b'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/201703/avhrr-only-v2.20170322.nc'
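A note on reading the traceback: the leading KeyError is only a miss in xarray's file cache, and the OSError raised while opening the file is the actual failure. The sketch below reproduces that two-part report with a simplified stand-in; `TinyFileCache`, `failing_opener`, and the example URL are hypothetical names for illustration, not xarray's or netCDF4's real code.

```python
class TinyFileCache:
    """Simplified cache-then-open lookup (illustrative stand-in only)."""

    def __init__(self, opener):
        self._opener = opener
        self._cache = {}

    def acquire(self, key):
        try:
            return self._cache[key]       # a cache miss raises KeyError
        except KeyError:
            # Any error raised here is reported as
            # "During handling of the above exception, another exception occurred".
            handle = self._opener(key)
            self._cache[key] = handle
            return handle


def failing_opener(url):
    # Mimics the error number and message seen in the report.
    raise OSError(-37, "NetCDF: Write to read only", url)


cache = TinyFileCache(failing_opener)
try:
    cache.acquire("https://example.invalid/file.nc")
except OSError as err:
    # The KeyError is attached as context; the OSError is the real problem.
    context_type = type(err.__context__).__name__
    error_number = err.errno
    print(type(err).__name__, "raised while handling", context_type)
```

So the KeyError can be ignored when diagnosing this report; the question is why opening the URL fails.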

Versions

python: 3.7.4
xarray: 0.15.0
pandas: 0.25.1
numpy: 1.16.5
scipy: 1.3.1
netcdf4: 1.5.3

jswhit commented 4 years ago

Can't reproduce here (by opening the URL directly with netcdf4-python). I wonder why netcdf is trying to write to the remote URL. Could this be an xarray issue? Perhaps you should open a ticket there as well.
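For anyone who wants to repeat that direct check, a minimal sketch might look like the following. The helper name `list_remote_variables` and its `opener` parameter are assumptions introduced here so the function can be tried without network access; by default it uses `netCDF4.Dataset`, which opens read-only unless told otherwise.

```python
def list_remote_variables(url, opener=None):
    """Open url read-only, return its sorted variable names, then close it."""
    if opener is None:
        import netCDF4  # imported lazily so the helper can be exercised without it
        opener = netCDF4.Dataset
    ds = opener(url)
    try:
        return sorted(ds.variables)
    finally:
        ds.close()


# Example call against the file named in the traceback (requires network access):
# names = list_remote_variables(
#     'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/'
#     '201703/avhrr-only-v2.20170322.nc')
```

If this succeeds on the affected machine while the xarray script fails, that points away from the single-file open and toward how many files are opened or kept open.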

EliT1626 commented 4 years ago

I actually did that before coming here, and others there have had no issue running the script either. Here is the latest comment I just posted there. Could it be a Windows issue?

After discussing this issue with someone who has a lot more knowledge than I do, it seems pertinent to mention that I am using a Windows machine. He is able to run the script fine in his Linux environment, much like some of you have been able to do. I have tried changing the window to different sizes, and the script always fails after around 25 calls to the OPeNDAP server. This was done in a new environment with only the required packages installed and updated to the latest versions. Is there some sort of issue with Windows in this regard?

EliT1626 commented 4 years ago

https://github.com/pydata/xarray/issues/4082#issuecomment-639111588

Turns out it has something to do with xarray after all.
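This thread does not record the eventual fix; the linked xarray comment is the place to look. Since the failure was reported to appear after a fairly fixed number of calls, one thing that may be worth trying is closing each window's files as soon as its mean has been computed, so remote handles do not accumulate. The sketch below is an assumption, not the confirmed resolution: `windowed_means` and its `open_window` parameter are hypothetical names, and `open_window` is expected to behave like `xr.open_mfdataset` (return an object with a `sst` variable and a `close()` method).

```python
import datetime as dt
from contextlib import closing


def windowed_means(date_list, window, url_template, open_window):
    """Compute one time-mean per date, closing each window's files afterwards."""
    means = []
    for cur_date in date_list:
        start = cur_date - window
        # Same dates as list_dates(cur_date - window, cur_date + window).
        urls = [url_template.format(start + dt.timedelta(days=i))
                for i in range(2 * window.days)]
        with closing(open_window(urls)) as ds:
            # load() pulls the result into memory before the files are closed.
            means.append(ds.sst.mean("time").load())
    return means
```

With xarray installed, `open_window` would be `xr.open_mfdataset`, and the returned list can be passed to `xr.concat` as in the original script.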