coecms / era5grib

Convert NCI ERA5 archive data to GRIB format
Apache License 2.0

Timeout when using era5grib #9

Open: bschroeter opened this issue 2 years ago

bschroeter commented 2 years ago

Hi there, so this has happened to me a few times and I've been struggling to find a workaround.

When using the era5grib utility to acquire data for WRF, I end up with the following error.

....
Traceback (most recent call last):
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/deploy/cluster.py", line 99, in _sync_cluster_info
    await self.scheduler_comm.set_metadata(
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/core.py", line 796, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/core.py", line 753, in live_comm
    comm = await connect(
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-21.07/lib/python3.9/site-packages/distributed/comm/core.py", line 307, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:37637 after 30 s

It seems that the Dask distributed scheduler dies while trying to write out the file. This happens with both NetCDF and GRIB output.
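The OSError in the traceback is raised when the client cannot open a TCP connection to the scheduler address within the 30 s connect timeout. As a minimal sketch, assuming you want to check by hand whether anything is still listening at that address, the standard library is enough. The function name scheduler_reachable is hypothetical, and the host and port are copied from the traceback; each Dask cluster picks its own port, so yours will differ.

```python
import socket


def scheduler_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and socket.timeout (both OSError subclasses).
        return False


if __name__ == "__main__":
    # Hypothetical check against the address from the traceback.
    print(scheduler_reachable("127.0.0.1", 37637))
```

If this returns False while era5grib is still running, the scheduler process has most likely exited rather than merely being slow to respond.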

Any thoughts?

ScottWales commented 2 years ago

Are you running on the login or a compute node?

bschroeter commented 2 years ago

Compute node, interactively. Heaps of resources.

bschroeter commented 2 years ago

As per the Slack chat, disabling era5land with --no-era5land appears to make something happen.

ScottWales commented 2 years ago

This appears to be a bug in the most recent version of xesmf, which was installed when the Conda environment was updated. I will try downgrading xesmf in Conda to see if that improves performance.
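Before and after a downgrade, it helps to confirm which versions the active environment actually provides. A minimal sketch using only the standard library, assuming the distribution names are xesmf, dask and distributed; the helper name installed_versions is hypothetical, and this thread does not say which xesmf version is affected.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the currently active environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages mentioned in this thread; adjust the list as needed.
    for name, ver in installed_versions(["xesmf", "dask", "distributed"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the same Python interpreter that era5grib uses (for example, inside the loaded analysis3 environment), otherwise it reports a different environment's packages.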

https://github.com/pangeo-data/xESMF/issues/127

ScottWales commented 2 years ago

This now appears to be working for me with your inputs.