ESGF / esgf-pyclient

Search client for the ESGF Search API
https://esgf-pyclient.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Accessing opendap datasets #26

Open jhamman opened 6 years ago

jhamman commented 6 years ago

I am working on what I think is a fairly common workflow:

  1. log on to ESGF using the LogonManager class
  2. search for some datasets using the SearchConnection class
  3. access some opendap dataset using netcdf4-python or pydap

Here's an example workflow:

In [1]: openid = 'https://esgf-node.llnl.gov/esgf-idp/openid/SECRET'
   ...: password = 'SECRET'
   ...:

In [2]: from pyesgf.logon import LogonManager
   ...: from pyesgf.search import SearchConnection
   ...: import xarray as xr
   ...:

In [3]: # initialize the logon manager
   ...: lm = LogonManager(verify=True)
   ...: if not lm.is_logged_on():
   ...:     lm.logon_with_openid(openid, password, 'pcmdi9.llnl.gov')
   ...: lm.is_logged_on()
   ...:
Out[3]: True

In [4]: def print_context_info(ctx):
   ...:     print('Hits:', ctx.hit_count)
   ...:     print('Experiments:', ctx.facet_counts['experiment'])
   ...:     print('Realms:', ctx.facet_counts['realm'])
   ...:     print('Ensembles:', ctx.facet_counts['ensemble'])
   ...:

In [5]: # search for some data
   ...: conn = SearchConnection('http://pcmdi9.llnl.gov/esg-search', distrib=True)
   ...: ctx = conn.new_context(project='CMIP5', model='CCSM4', experiment='rcp85', time_frequency='day')
   ...: ctx = ctx.constrain(realm='atmos', ensemble='r1i1p1')
   ...:
   ...: # print a summary of what we found
   ...: print_context_info(ctx)
   ...:
Hits: 4
Experiments: {'rcp85': 4}
Realms: {'atmos': 4}
Ensembles: {'r1i1p1': 4}

In [6]: # aggregate results
   ...: result = ctx.search()[0]
   ...: agg_ctx = result.aggregation_context()
   ...:
   ...: # get a list of opendap urls
   ...: x = list(a.opendap_url for a in agg_ctx.search() if a.opendap_url)
   ...: x
   ...:
Out[6]:
['http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmin.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmax.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.prc.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.psl.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tas.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.pr.20120705.aggregation.1']

In [7]: # try opening one of the opendap datasets
   ...: xr.open_dataset(x[0], engine='pydap')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-90d39efb83f7> in <module>()
      1 # try opening one of the opendap datasets
----> 2 xr.open_dataset(x[0], engine='pydap')

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
    302                                             autoclose=autoclose)
    303         elif engine == 'pydap':
--> 304             store = backends.PydapDataStore.open(filename_or_obj)
    305         elif engine == 'h5netcdf':
    306             store = backends.H5NetCDFStore(filename_or_obj, group=group,

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
     75     def open(cls, url, session=None):
     76         import pydap.client
---> 77         ds = pydap.client.open_url(url, session=session)
     78         return cls(ds)
     79

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/client.py in open_url(url, application, session, output_grid)
     62     never retrieve coordinate axes.
     63     """
---> 64     dataset = DAPHandler(url, application, session, output_grid).dataset
     65
     66     # attach server-side functions

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid)
     62
     63         # build the dataset from the DDS and add attributes from the DAS
---> 64         self.dataset = build_dataset(dds)
     65         add_attributes(self.dataset, parse_das(das))
     66

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in build_dataset(dds)
    159 def build_dataset(dds):
    160     """Return a dataset object from a DDS representation."""
--> 161     return DDSParser(dds).parse()
    162
    163

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in parse(self)
     47         dataset = DatasetType('nameless')
     48
---> 49         self.consume('dataset')
     50         self.consume('{')
     51         while not self.peek('}'):

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in consume(self, regexp)
     39     def consume(self, regexp):
     40         """Consume and return a token."""
---> 41         token = super(DDSParser, self).consume(regexp)
     42         self.buffer = self.buffer.lstrip()
     43         return token

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/__init__.py in consume(self, regexp)
    180             self.buffer = self.buffer[len(token):]
    181         else:
--> 182             raise Exception("Unable to parse token: %s" % self.buffer[:10])
    183         return token

Exception: Unable to parse token:

Questions:

  1. Is this actually a workflow that should work?
  2. Does this opendap URL actually exist? What is the best way to test that an opendap url from esgf is a valid one?
  3. Is additional authentication required?
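
On question 2, one low-tech check is to ask the server for the DDS directly: a DAP2 server answers `<url>.dds` with plain text that starts with `Dataset {`. The empty text after "Unable to parse token:" in the traceback suggests pydap received a body that was empty, or at least not a DDS. The sketch below is only an illustration under those assumptions; `looks_like_dds` and `probe_opendap_url` are hypothetical helper names, and the probe is unauthenticated, so a protected dataset will typically come back as an HTTP error or a login page rather than a DDS.

```python
import urllib.error
import urllib.request


def looks_like_dds(text):
    """Return True if the text starts like a DAP2 DDS response ("Dataset {")."""
    return text.lstrip().lower().startswith("dataset")


def probe_opendap_url(url, timeout=30):
    """Fetch <url>.dds and return (ok, detail) instead of raising on failure.

    Hypothetical helper: it sends no credentials, so it only tells you
    whether an anonymous request gets a DDS back.
    """
    try:
        with urllib.request.urlopen(url + ".dds", timeout=timeout) as resp:
            body = resp.read(2048).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (401, 403, 404, ...)
        return False, "HTTP %d %s" % (err.code, err.reason)
    except OSError as err:
        # Covers URLError, DNS failures and timeouts
        return False, "connection problem: %s" % err
    if looks_like_dds(body):
        return True, "server returned a DDS"
    return False, "unexpected response: %r" % body[:80]
```

Calling `probe_opendap_url(x[0])` on one of the aggregation URLs above would then show whether the endpoint answers with a DDS, an HTTP error, or something else such as an HTML login page.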

agstephens commented 6 years ago

Hi @jhamman, right now I don't have time to look into your issue but please see if this example sheds any light on your questions: https://github.com/cehbrecht/demo-notebooks/blob/master/esgf-opendap.ipynb

jhamman commented 6 years ago

@agstephens - Indeed, I had seen this notebook. As far as I can tell, the problem seems to lie in the use of aggregation context urls to opendap datasets.
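
If the aggregation URLs are the suspect, one way to narrow it down is to try the per-file OPeNDAP URLs of the same dataset instead. A minimal sketch, assuming that `file_context()` on a dataset result returns a context whose `search()` yields file results with an `opendap_url` attribute, as used elsewhere in this thread; `file_opendap_urls` is a hypothetical helper name.

```python
def file_opendap_urls(dataset_result):
    """Collect per-file OPeNDAP URLs for one dataset search result.

    dataset_result is expected to be a pyesgf dataset result,
    for example ctx.search()[0].
    """
    files = dataset_result.file_context().search()
    # Skip files that do not advertise an OPeNDAP endpoint (falsy opendap_url)
    return [f.opendap_url for f in files if f.opendap_url]
```

With the session from the original post this would be `file_opendap_urls(ctx.search()[0])`. If a file-level URL opens while the aggregation URL does not, that points at the aggregation endpoint rather than at authentication.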

cehbrecht commented 6 years ago

@jhamman late answer ... there might be several issues, but they are not related to esgf-pyclient. The aggregation might not work, but it also looks like pydap needs to be updated to work with ESGF.

I tried it with a CORDEX aggregation and I can't get pydap working: https://github.com/cehbrecht/jupyterlab-notebooks/blob/master/esgf-examples/esgf-pydap.ipynb

See also: https://pydap.readthedocs.io/en/latest/client.html?#earth-system-grid-federation-esgf
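
For reference, the approach described in the pydap documentation linked above is to build an authenticated session and hand it to the pydap backend. The following is a sketch under assumptions, not a tested recipe: it assumes a pydap version that ships `pydap.cas.esgf.setup_session` and an xarray version that exposes `xr.backends.PydapDataStore.open(url, session=...)` (the `session` parameter is visible in the traceback at the top of this issue). `open_esgf_opendap` is a hypothetical helper name.

```python
def open_esgf_opendap(url, openid, password):
    """Open an ESGF OPeNDAP URL through an authenticated pydap session."""
    # Imported lazily so the helper can be defined even when the
    # optional packages are not installed.
    import xarray as xr
    from pydap.cas.esgf import setup_session

    # check_url makes pydap run the login handshake against the data
    # node that serves this URL.
    session = setup_session(openid, password, check_url=url)
    store = xr.backends.PydapDataStore.open(url, session=session)
    return xr.open_dataset(store)
```

The pydap documentation also mentions that some identity providers (CEDA, for example) require an extra `username` argument to `setup_session`; that case is not covered by this sketch.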

saeedvzf commented 4 years ago

Hi, I hope you all are doing well,

Can anyone help me to overcome this issue?

My OpenID is working and is connected to my ESGF acc.

Please let me know if you need more information.

Thank you,

Saeed

from pyesgf.search import SearchConnection
conn = SearchConnection('https://esgf-index1.ceda.ac.uk/esg-search/', distrib=True)
ctx = conn.new_context(project='CORDEX', institute='KNMI', time_frequency='day', experiment='historical', variable='tas')
ctx.hit_count
result = ctx.search()[14]
result.dataset_id
ds = ctx.search()[14]
files = ds.file_context().search()
len(files)
for f in files:
    print(f.download_url)

from pyesgf.logon import LogonManager
lm = LogonManager()
lm.logoff()
lm.is_logged_on()
OPENID = 'https://ceda.ac.uk/openid/xxx'
lm.logon_with_openid(openid=OPENID, password=None, bootstrap=True)
lm.is_logged_on()
password = 'xxx'
username = 'xxx'
myproxy_host = 'slcs1.ceda.ac.uk'
lm.logon(username, password, hostname=myproxy_host, interactive=True, bootstrap=True)
lm.is_logged_on()
import xarray as xr
ds = xr.open_dataset(f.download_url)
print(ds)

KeyError                                  Traceback (most recent call last)
D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
    197             try:
--> 198                 file = self._cache[self._key]
    199             except KeyError:

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
in
----> 1 ds = xr.open_dataset(f.download_url)
      2 print(ds)

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    506         engine = _get_default_engine(filename_or_obj, allow_remote=True)
    507     if engine == "netcdf4":
--> 508         store = backends.NetCDF4DataStore.open(
    509             filename_or_obj, group=group, lock=lock, **backend_kwargs
    510         )

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    356             netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    357         )
--> 358         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    359
    360     def _acquire(self, needs_lock=True):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
    312         self._group = group
    313         self._mode = mode
--> 314         self.format = self.ds.data_model
    315         self._filename = self.ds.filepath()
    316         self.is_remote = is_remote_uri(self._filename)

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in ds(self)
    365     @property
    366     def ds(self):
--> 367         return self._acquire()
    368
    369     def open_store_variable(self, name, var):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock)
    359
    360     def _acquire(self, needs_lock=True):
--> 361         with self._manager.acquire_context(needs_lock) as root:
    362             ds = _nc4_require_group(root, self._group, self._mode)
    363         return ds

D:\Anaconda\envs\gdal\lib\contextlib.py in __enter__(self)
    111         del self.args, self.kwds, self.func
    112         try:
--> 113             return next(self.gen)
    114         except StopIteration:
    115             raise RuntimeError("generator didn't yield") from None

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock)
    184     def acquire_context(self, needs_lock=True):
    185         """Context manager for acquiring a file."""
--> 186         file, cached = self._acquire_with_cache_info(needs_lock)
    187         try:
    188             yield file

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
    202                     kwargs = kwargs.copy()
    203                     kwargs["mode"] = self._mode
--> 204                 file = self._opener(*self._args, **kwargs)
    205                 if self._mode == "w":
    206                     # ensure file doesn't get overriden when opened again

netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4\_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -78] NetCDF: Authorization failure: b'http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc'

larsbuntemeyer commented 4 years ago

@saeedvzf I have similar problems. It's probably because you have project='CORDEX'. You need special authorization to access that data via OPeNDAP under the CORDEX project. I see that you have logged on, so log on to the web portal of one of the ESGF data nodes and check in your profile whether you are a member of the CORDEX project. If not, you can simply click something like "join CORDEX project" at the top.
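
Separately from project membership, it may be worth checking which THREDDS service a URL points at. The URL in the traceback above goes through `fileServer`, which is the plain HTTP download service, whereas the URLs in the original post go through `dodsC`, which is the OPeNDAP service. Whether that contributes to the authorization failure here is not established; the sketch below only shows how to tell the two apart. `thredds_service` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def thredds_service(url):
    """Return the path segment after 'thredds' (e.g. 'dodsC'), or None."""
    parts = urlparse(url).path.split("/")
    if "thredds" in parts:
        idx = parts.index("thredds")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None
```

In the loop above, each file result `f` offers both `f.download_url` and `f.opendap_url`; passing the latter to `xr.open_dataset` is the variant that targets the OPeNDAP service.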