ESGF / esgf-pyclient

Search client for the ESGF Search API
https://esgf-pyclient.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

cordex data access #69

Closed larsbuntemeyer closed 9 months ago

larsbuntemeyer commented 3 years ago

Hi, the esgf-pyclient works really well for me with CMIP5 and CMIP6 data. However, I have some problems accessing CORDEX data. I have CORDEX_Research data access rights and can successfully log in using the pyclient:

import netCDF4 as nc4
from pyesgf.logon import LogonManager
from pyesgf.search import SearchConnection
import pyesgf

print(nc4.__version__)
print(pyesgf.__version__)

lm = LogonManager()

myproxy_host = 'esgf-data.dkrz.de'
lm.logon(hostname=myproxy_host, interactive=True, bootstrap=True)
lm.is_logged_on()
1.5.3
0.3.0
Enter myproxy username: 

 g300046
Enter password for g300046:  ········

True
# search CORDEX project for REMO2015 fx orog variables
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
ctx = conn.new_context(project='CORDEX', experiment='evaluation', time_frequency='fx', rcm_name='REMO2015', variable='orog')
result = ctx.search()

orog_url = {}

# loop through search results of datasets
for res in result:
    ctx = res.file_context()
    domain = list(ctx.facet_counts['domain'].keys())[0]
    print('domain: {}'.format(domain))
    # the dataset should contain only one file for fx variables
    dataset = ctx.search()
    filename = dataset[0].opendap_url
    print('filename: {}'.format(filename))
    orog_url[domain] = filename
domain: EUR-11
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-11/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20180813/orog_EUR-11_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx.nc
domain: SAM-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/SAM-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_SAM-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: AFR-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/AFR-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_AFR-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: CAM-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/CAM-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_CAM-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: EAS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EAS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_EAS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: EUR-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_EUR-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: SEA-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/SEA-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_SEA-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: WAS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/WAS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_WAS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: AUS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/AUS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_AUS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: CAS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/CAS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_CAS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
orog_url.keys()
dict_keys(['EUR-11', 'SAM-22', 'AFR-22', 'CAM-22', 'EAS-22', 'EUR-22', 'SEA-22', 'WAS-22', 'AUS-22', 'CAS-22'])
url = orog_url['EUR-11']
url
'http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-11/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20180813/orog_EUR-11_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx.nc'

This all works fine until I actually try to access the data:

# netcdf4 engine
ds = nc4.Dataset(url)
---------------------------------------------------------------------------

OSError                                   Traceback (most recent call last)

<ipython-input-8-fbb4748a9677> in <module>()
      1 # netcdf4 engine
----> 2 ds = nc4.Dataset(url)

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -68] NetCDF: I/O failure: b'http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-11/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20180813/orog_EUR-11_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx.nc'
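
To see whether the failure is specific to one file or affects every CORDEX URL found above, each URL can be opened in turn and the errors collected. This is only a diagnostic sketch: `check_urls` is a hypothetical helper, not part of esgf-pyclient or netCDF4, and the opener is passed in so the loop can be tried without network access.

```python
def check_urls(urls, opener):
    """Try to open every URL and report which ones fail.

    urls   -- mapping of a label (e.g. the CORDEX domain) to a URL
    opener -- callable that opens a URL, e.g. netCDF4.Dataset
    Returns a dict mapping each label to None on success,
    or to the error message if the opener raised OSError.
    """
    report = {}
    for label, url in urls.items():
        try:
            ds = opener(url)
        except OSError as exc:
            # netCDF4 reports access failures as OSError (see traceback above)
            report[label] = str(exc)
        else:
            report[label] = None
            # close the dataset again if the returned object supports it
            close = getattr(ds, "close", None)
            if callable(close):
                close()
    return report
```

With the dictionary built in the search loop, this would be called as `check_urls(orog_url, nc4.Dataset)`. If every CORDEX entry fails while the CMIP5 URL opens, that would point to access rights or authentication rather than to a single broken file.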

With CMIP5 data everything works fine, e.g.:

# check with CMIP5 data, this works fine.
url = "http://esgf1.dkrz.de/thredds/dodsC/cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/historical/fx/atmos/fx/r0i0p0/v20120315/orog/orog_fx_MPI-ESM-LR_historical_r0i0p0.nc"
ds = nc4.Dataset(url)
ds
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format DAP2):
    institution: Max Planck Institute for Meteorology
    institute_id: MPI-M
    experiment_id: historical
    source: MPI-ESM-LR 2011; URL: http://svn.zmaw.de/svn/cosmos/branches/releases/mpi-esm-cmip5/src/mod; atmosphere: ECHAM6 (REV: 4603), T63L47; land: JSBACH (REV: 4603); ocean: MPIOM (REV: 4603), GR15L40; sea ice: 4603; marine bgc: HAMOCC (REV: 4603);
    model_id: MPI-ESM-LR
    forcing: GHG,Oz,SD,Sl,Vl,LU
    parent_experiment_id: piControl
    parent_experiment_rip: r1i1p1
    branch_time: 10957.0
    contact: cmip5-mpi-esm@dkrz.de
    history: Model raw output postprocessing with modelling environment (IMDI) at DKRZ: URL: http://svn-mad.zmaw.de/svn/mad/Model/IMDI/trunk, REV: 4201 2012-01-13T07:51:03Z CMOR rewrote data to comply with CF standards and CMIP5 requirements.
    references: ECHAM6: n/a; JSBACH: Raddatz et al., 2007. Will the tropical land biosphere dominate the climate-carbon cycle feedback during the twenty first century? Climate Dynamics, 29, 565-574, doi 10.1007/s00382-007-0247-8;  MPIOM: Marsland et al., 2003. The Max-Planck-Institute global ocean/sea ice model with orthogonal curvilinear coordinates. Ocean Modelling, 5, 91-127;  HAMOCC: Technical Documentation, http://www.mpimet.mpg.de/fileadmin/models/MPIOM/HAMOCC5.1_TECHNICAL_REPORT.pdf;
    initialization_method: 0
    physics_version: 0
    tracking_id: d9bbcbd4-c852-4bd0-a3b4-0fccb598f23c
    product: output
    experiment: historical
    frequency: fx
    creation_date: 2012-01-13T07:51:03Z
    Conventions: CF-1.4
    project_id: CMIP5
    table_id: Table fx (26 July 2011) 491518982c8d8b607a58ba740689ea09
    title: MPI-ESM-LR model output prepared for CMIP5 historical
    parent_experiment: pre-industrial control
    modeling_realm: atmos
    realization: 0
    cmor_version: 2.6.0
    dimensions(sizes): bnds(2), lat(96), lon(192)
    variables(dimensions): float64 lat(lat), float64 lat_bnds(lat,bnds), float64 lon(lon), float64 lon_bnds(lon,bnds), float32 orog(lat,lon)
    groups: 

I know that this is not an esgf-pyclient issue, but I wonder how the logon is supposed to work here. I suspect the problem is with how I log on to ESGF from Python (I can also log on through the ESGF web interface and download CORDEX data without a problem). It would be really nice to have access to the OPeNDAP URLs from Python, too. Thanks a lot!

bouweandela commented 2 years ago

Maybe this article could be helpful? https://help.ceda.ac.uk/article/4712-reading-netcdf-with-python-opendap
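
For context: netCDF4 opens OPeNDAP URLs through the netCDF-C library, which can read client-certificate settings from a `.dodsrc` file. Below is a minimal sketch of generating such a file. It assumes the credentials obtained by `LogonManager` are stored under `~/.esg` as `credentials.pem`, with a `certificates` directory next to it; these paths are assumptions, so check your own setup. `build_dodsrc` and `write_dodsrc` are hypothetical helper names, and the example writes to a temporary location instead of touching a real `~/.dodsrc`.

```python
from pathlib import Path
import tempfile


def build_dodsrc(esgf_dir):
    """Return .dodsrc contents pointing netCDF-C at ESGF credentials.

    esgf_dir is assumed to contain credentials.pem (certificate and key
    in one file) and a certificates/ directory with trusted CA files.
    """
    esgf_dir = Path(esgf_dir)
    credentials = esgf_dir / "credentials.pem"
    settings = {
        "HTTP.COOKIEJAR": esgf_dir / ".dods_cookies",
        "HTTP.SSL.CERTIFICATE": credentials,
        "HTTP.SSL.KEY": credentials,
        "HTTP.SSL.CAPATH": esgf_dir / "certificates",
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_dodsrc(esgf_dir, target):
    """Write the generated settings to target and return its path."""
    target = Path(target)
    target.write_text(build_dodsrc(esgf_dir))
    return target


if __name__ == "__main__":
    # Write to a temporary file so an existing ~/.dodsrc is left alone.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_dodsrc(Path.home() / ".esg", Path(tmp) / "dodsrc")
        print(path.read_text())
```

Client certificates are only presented over TLS, so it may also be worth trying the `https://` form of the OPeNDAP URL. Whether the data node serves that path over https is not confirmed anywhere in this thread.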