ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Connection timeout message during automatic ESGF download #2999

Closed rswamina closed 1 year ago

rswamina commented 1 year ago

I am using the automatic ESGF download option with ESMValTool. I want to download all available ensemble members for a given model (recipe below), but I get the error TimeoutError: [Errno 110] Connection timed out. I am not sure whether this means I should increase the timeout value in the esgf-pyclient.yml file. I'd like some help understanding what I am doing wrong so I can download the necessary/available data successfully.

I am running ESMValTool on JASMIN using the module load esmvaltool option to run the recipe. I have an interactive logon option set in my esgf-pyclient.yml file. Here are the contents of that file:

logon:
  hostname: "esgf.ceda.ac.uk"
  interactive: true
search_connection:
  urls:
    - 'https://esgf.ceda.ac.uk/esg-search'
    - 'https://esgf-node.llnl.gov/esg-search'
    - 'https://esgf-data.dkrz.de/esg-search'
    - 'https://esgf-node.ipsl.upmc.fr/esg-search'
    - 'https://esg-dn1.nsc.liu.se/esg-search'
    - 'https://esgf.nci.org.au/esg-search'
    - 'https://esgf.nccs.nasa.gov/esg-search'
    - 'https://esgdata.gfdl.noaa.gov/esg-search'
distrib: true
timeout: 600  # seconds
cache: '~/.esmvaltool/cache/pyesgf-search-results'
expire_after: 86400  # cache expires after 1 day

My recipe is:

# ESMValTool
# recipe_download_esgf_data.yml
---
documentation:
  description: |
    This is a recipe to download data sets from ESGF nodes.

  authors:
    - swaminathan_ranjini

  title: |

    Recipe to download data from ESGF nodes.

  maintainer:
    - swaminathan_ranjini

datasets:

  - {dataset: MPI-ESM1-2-LR, project: CMIP6, exp: historical, ensemble: r[1:10]i1p1f1, start_year: 1995, end_year: 2014, grid: gn}
  - {dataset: MPI-ESM1-2-LR, project: CMIP6, exp: ssp370, ensemble: r[1:10]1i1p1f1, start_year: 2081, end_year: 2100, grid: gn}

preprocessors:
  preproc_extract_region_land:
    extract_shape:
      shapefile : IPCC-AR6-shapefiles/IPCC-WGI-reference-regions-v4.shp
      decomposed : False
      method : contains
      crop: True
      ids: 
        - 'S.Asia'
    mask_landsea:
      mask_out : sea

diagnostics:
  day_pr:
    description: extract region
    variables:
      pr:
        preprocessor: preproc_extract_region_land
        project: CMIP6
        mip: day
    scripts: null

And finally this is the content of the file ~/.esmvaltool/cache/esgf-hosts.yml:

esgf-node2.cmcc.it:
  duration (s): 2
  error: false
  size (bytes): 3098826
  speed (MB/s): 1.9
esgf3.dkrz.de:
  duration (s): 0
  error: false
  size (bytes): 1812797
  speed (MB/s): 5.0

valeriupredoi commented 1 year ago

@rswamina are you able to download anything at all, or is this what you get every time? @bouweandela is the number one specialist when it comes to ESGF download issues, so I am CC-ing him here; also, depending on where the troubleshooting leads, I may have to move this issue to ESMValCore :beer: PS: note that JASMIN has had some pretty nasty issues in the past 24 hours

valeriupredoi commented 1 year ago

@rswamina are you still seeing the reported behaviour? :beer:

bouweandela commented 1 year ago

Hi @rswamina,

Searching for files and downloading them are two separate steps that use separate servers:

1) Searching for files is done using the index servers (or nodes, as they are known on ESGF); those are listed in the configuration file you posted. This step uses the timeout listed there. If you're having issues with the search, you can move another server to the top of the list, as ESMValCore will use the first index node from that list that is online.
2) The next step is downloading, which is done from the servers where the data is stored. This can be one or more servers, depending on where the data is hosted. The URLs are retrieved as part of the search. If you prefer not to use a particular host for downloading, you can set duration to a very large number for that server in the file ~/.esmvaltool/cache/esgf-hosts.yml. ESMValCore keeps track of how fast servers are and will prefer faster ones, so if you set duration to a high number it will think that server is slow and only use it as a last resort. Note that this file is automatically updated, so your changes might get overwritten at some point.
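
As a rough illustration of the second point, the sketch below works on a dictionary shaped like the esgf-hosts.yml content posted earlier in this thread. The helper names `deprioritize_host` and `rank_hosts` are hypothetical, and ranking purely on `duration (s)` is a simplifying assumption made for this example; ESMValCore's actual ordering logic may well differ.

```python
import copy

# Hypothetical in-memory copy of ~/.esmvaltool/cache/esgf-hosts.yml,
# using the values posted earlier in this thread.
hosts = {
    "esgf-node2.cmcc.it": {
        "duration (s)": 2,
        "error": False,
        "size (bytes)": 3098826,
        "speed (MB/s)": 1.9,
    },
    "esgf3.dkrz.de": {
        "duration (s)": 0,
        "error": False,
        "size (bytes)": 1812797,
        "speed (MB/s)": 5.0,
    },
}


def deprioritize_host(hosts, hostname, duration=10**9):
    """Return a copy of `hosts` with a very large duration for `hostname`."""
    updated = copy.deepcopy(hosts)
    updated[hostname]["duration (s)"] = duration
    return updated


def rank_hosts(hosts):
    """Order hostnames from shortest to longest recorded duration.

    Assumption: sorting on duration alone is a simplification used only
    for this illustration, not the real selection algorithm.
    """
    return sorted(hosts, key=lambda name: hosts[name]["duration (s)"])


print(rank_hosts(hosts))
print(rank_hosts(deprioritize_host(hosts, "esgf3.dkrz.de")))
```

In this toy ranking, esgf3.dkrz.de moves from first to last place once its duration is raised, which is the effect the manual edit of the cache file is meant to achieve.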

Could you post the full stack trace so we can see which URL is giving you the timeout error? You can ask about the status of servers on the ESGF user mailing list.

rswamina commented 1 year ago

Thanks. I will see if I can get that information out.

rswamina commented 1 year ago

Hi @bouweandela - On further investigation of the cache file, I noticed it was an older file from a previous run. Following the instructions on the Read the Docs page, I deleted it and find that a new one is not created now. I wonder if that means the connection was not made successfully. This is the stack trace:

2023-01-21 19:32:11,049 UTC [11715] ERROR   Program terminated abnormally, see stack trace below for more information:
Traceback (most recent call last):
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/esmvalcore/_main.py", line 499, in run
    fire.Fire(ESMValTool())
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/esmvalcore/_main.py", line 428, in run
    logon()
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/esmvalcore/esgf/_logon.py", line 28, in logon
    manager.logon(**cfg['logon'])
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/pyesgf/logon.py", line 184, in logon
    creds = c.logon(username, password,
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/myproxy/client/__init__.py", line 1453, in logon
    self.getTrustRoots(writeToCACertDir=True,
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/myproxy/client/__init__.py", line 1609, in getTrustRoots
    conn.connect((self.hostname, self.port))
  File "/apps/jasmin/community/esmvaltool/miniconda3/envs/esmvaltool/lib/python3.10/site-packages/OpenSSL/SSL.py", line 2022, in connect
    return self._socket.connect(addr)
TimeoutError: [Errno 110] Connection timed out
2023-01-21 19:32:11,057 UTC [11715] INFO    
If you have a question or need help, please start a new discussion on https://github.com/ESMValGroup/ESMValTool/discussions
If you suspect this is a bug, please open an issue on https://github.com/ESMValGroup/ESMValTool/issues
To make it easier to find out what the problem is, please consider attaching the files run/recipe_*.yml and run/main_log_debug.txt from the output directory.

bouweandela commented 1 year ago

It looks like you're not able to connect to the authentication server. This is the ESGF server where you log in with your OpenID. Can you try with the command line utility from the MyProxyClient package (https://pypi.org/project/MyProxyClient/)? If this fails, you could also try asking on the ESGF user mailing list; they should know which authentication servers are online.
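
A quick way to separate a network problem from a configuration problem is to check whether a plain TCP connection to the authentication host can be opened at all. The following is a minimal sketch using only the Python standard library; the function name `can_connect` is hypothetical, and the port to test is an assumption (7512 is the usual MyProxy port, but the server you use may listen elsewhere).

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (such as Errno 110), refused connections
        # and host name resolution failures.
        return False
```

For example, `can_connect("esgf.ceda.ac.uk", 7512)` would test the hostname from the logon section of the configuration posted above. A result of False only says the port is unreachable from the machine running the check; it does not say why.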

As a workaround: are you trying to download CORDEX data? As far as I'm aware, you only need to log on to ESGF for downloading CORDEX data, so if you're not interested in that you could also comment out the logon section in your ~/.esmvaltool/esgf-pyclient.yml configuration file.

rswamina commented 1 year ago

Hi @bouweandela - I did not need the logon option for downloading CMIP6 data. However, when doing an ESGF download, I could not use the generic option to download multiple ensemble members. Am I correct in understanding that I cannot provide an option like r[1:10] for ESGF downloads?

bouweandela commented 1 year ago

The syntax for specifying multiple ensemble members in a recipe is r(1:10) (note the different brackets). Does it work if you use that?
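
To make the meaning of the range syntax concrete, here is a small sketch of how a tag such as r(1:10)i1p1f1 can be read as a list of individual members. This is not ESMValCore's own implementation; `expand_ensemble` is a hypothetical helper written only for illustration.

```python
import re


def expand_ensemble(ensemble):
    """Expand a tag such as 'r(1:3)i1p1f1' into its individual members.

    Tags without a '(start:end)' range are returned unchanged in a list.
    """
    match = re.search(r"\((\d+):(\d+)\)", ensemble)
    if match is None:
        return [ensemble]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = ensemble[:match.start()], ensemble[match.end():]
    # The range is inclusive on both ends.
    return [f"{prefix}{i}{suffix}" for i in range(start, end + 1)]


print(expand_ensemble("r(1:3)i1p1f1"))  # ['r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1']
print(expand_ensemble("r[1:3]i1p1f1"))  # ['r[1:3]i1p1f1']
```

With square brackets the pattern does not match, so the string is passed through literally as a single (non-existent) ensemble member name.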

rswamina commented 1 year ago

No, it does not. Here is how I specify my datasets:

  - {dataset: MPI-ESM1-2-HR, project: CMIP6, exp: historical, ensemble: "(r1:10)i1p1f1", start_year: 1995, end_year: 2014, grid: gn}

  - {dataset: MPI-ESM1-2-HR, project: CMIP6, exp: ssp370, ensemble: "(r1:10)i1p1f1", start_year: 2081, end_year: 2100, grid: gn}

The log includes messages like:

2023-01-26 14:25:11,244 UTC [32380] DEBUG   Starting new HTTPS connection (1): esgf.ceda.ac.uk:443
2023-01-26 14:25:11,290 UTC [32380] DEBUG   https://esgf.ceda.ac.uk:443 "GET /esg-search/search?format=application%2Fsolr%2Bjson&limit=500&distrib=true&offset=0&type=File&latest=True&project=CMIP6&source_id=MPI-ESM1-2-HR&variant_label=%28r1%3A10%29i1p1f1&experiment_id=ssp370&grid_label=gn&table_id=fx&variable=sftlf HTTP/1.1" 200 1218
2023-01-26 14:25:11,291 UTC [32380] DEBUG   Found the following files matching facets {'project': 'CMIP6', 'source_id': 'MPI-ESM1-2-HR', 'variant_label': '(r1:10)i1p1f1', 'experiment_id': 'ssp370', 'grid_label': 'gn', 'table_id': 'fx', 'variable': 'sftlf'}: none
2023-01-26 14:25:11,291 UTC [32380] WARNING Missing data for fx variable 'sftlf' of dataset ScenarioMIP
2023-01-26 14:25:11,291 UTC [32380] DEBUG   For fx variable 'sftof', found table 'Ofx'
2023-01-26 14:25:11,291 UTC [32380] DEBUG   Looking for files matching ['sftof_Ofx_MPI-ESM1-2-HR_ssp370_(r1:10)i1p1f1_gn*.nc'] in []

It seems like the files are not being retrieved. I do know that individual files exist and can be retrieved. For instance, if I try to get just ensemble member r2i1p1f1 in the above case, that works for the historical and ssp370 experiments.

bouweandela commented 1 year ago

The letter r needs to be outside the brackets. Can you try again with these datasets:

  - {dataset: MPI-ESM1-2-HR, project: CMIP6, exp: historical, ensemble: "r(1:10)i1p1f1", start_year: 1995, end_year: 2014, grid: gn}
  - {dataset: MPI-ESM1-2-HR, project: CMIP6, exp: ssp370, ensemble: "r(1:10)i1p1f1", start_year: 2081, end_year: 2100, grid: gn}

rswamina commented 1 year ago

This worked. Thanks @bouweandela. I was staring so hard at the screen that I did not see the r was outside the brackets. Sorry. At this time, I am able to connect and download CMIP6 data from ESGF. If I have other problems, I will open a different issue. Happy to close this now.