Timh37 / CMIP6cex

Repository for the cloud-based analysis of changes in compound extremes in CMIP6 simulations.
MIT License

Speed up CMIP6 processing #10

Open Timh37 opened 1 year ago

Timh37 commented 1 year ago

As discussed, @jbusecke, while it doesn't have a high priority at the moment, it'd be good to see if the main infrastructure of processing the CMIP6 data and deriving changes in extremes from the simulations can be made more efficient. This may be a good place to start.

Timh37 commented 1 year ago

@jbusecke I am revisiting the old workflow here. It would be great if we could briefly discuss during our meeting tomorrow if/how this could be sped up, in light of the many new datasets!

jbusecke commented 1 year ago

Do you think this is solved by my work in https://github.com/jbusecke/CMIP6cex/tree/jbusecke_performance_regridding? Or should I make a PR for that? FYI, I also mentioned this in a Pangeo Discourse topic recently. Hopefully for now we get a hacky yet sufficiently performant solution, but maybe in the future there will be a more satisfying way to handle this in general.

Timh37 commented 1 year ago

@jbusecke yes and no. I incorporated your work in https://github.com/Timh37/CMIP6cex/blob/main/cmip6_processing/testing/store_CMIP6_regridded_datasets.ipynb, which works like a charm. However, I am struggling to adapt the workflow to regrid to tide gauges: https://github.com/Timh37/CMIP6cex/blob/main/cmip6_processing/testing/store_CMIP6_datasets_at_tgs.ipynb. I tried both fancy indexing and regridding to each tide gauge in a loop, but both are slow (and at some point the client fails to connect to the scheduler?). If you have a chance to take a look, that would be appreciated!
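
For reference, here is a minimal sketch of what the fancy-indexing approach could look like. It is not the code from the notebooks linked above. It assumes a regular lat/lon grid and plain NumPy arrays rather than the lazily loaded xarray/dask datasets of the real workflow. The function names, the toy grid, and the tide gauge coordinates are all hypothetical. The idea is to compute the nearest grid cell for every tide gauge once, then do one pointwise lookup per batch of gauges instead of one lookup per gauge.

```python
import numpy as np


def nearest_grid_indices(grid_lat, grid_lon, tg_lat, tg_lon):
    """Return (lat_idx, lon_idx) of the nearest cell on a regular grid for each tide gauge."""
    grid_lat = np.asarray(grid_lat, dtype=float)
    grid_lon = np.asarray(grid_lon, dtype=float) % 360.0
    tg_lat = np.asarray(tg_lat, dtype=float)
    tg_lon = np.asarray(tg_lon, dtype=float) % 360.0

    # Broadcast to (n_gauges, n_grid) and pick the closest coordinate per gauge.
    lat_idx = np.abs(tg_lat[:, None] - grid_lat[None, :]).argmin(axis=1)
    # Longitude distance is measured on the circle, so 359.8E is close to 0E.
    dlon = np.abs(tg_lon[:, None] - grid_lon[None, :])
    lon_idx = np.minimum(dlon, 360.0 - dlon).argmin(axis=1)
    return lat_idx, lon_idx


def select_at_gauges(field, lat_idx, lon_idx, batch_size=100):
    """Extract field[time, lat, lon] at all gauges, batch by batch, using fancy indexing."""
    pieces = []
    for start in range(0, len(lat_idx), batch_size):
        stop = start + batch_size
        # Pointwise (not outer-product) lookup: result has shape (time, n_in_batch).
        pieces.append(field[:, lat_idx[start:stop], lon_idx[start:stop]])
    return np.concatenate(pieces, axis=1)


# Toy example: 3 time steps on a 1-degree grid and 3 hypothetical tide gauges.
grid_lat = np.arange(-90.0, 91.0, 1.0)   # 181 latitudes
grid_lon = np.arange(0.0, 360.0, 1.0)    # 360 longitudes
field = np.random.default_rng(0).normal(size=(3, grid_lat.size, grid_lon.size))

tg_lat = [52.4, -33.9, 40.7]
tg_lon = [4.6, 151.2, -74.0]             # negative longitudes are wrapped to 0-360

lat_idx, lon_idx = nearest_grid_indices(grid_lat, grid_lon, tg_lat, tg_lon)
at_tgs = select_at_gauges(field, lat_idx, lon_idx, batch_size=2)
print(at_tgs.shape)
```

In xarray, the same pointwise selection can be done by passing `isel` two `DataArray` indexers that share a new dimension (for example a `tg` dimension). Whether that is fast enough on the dask-backed CMIP6 datasets is exactly the open question in this thread.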

Timh37 commented 1 year ago

Looks like this is solved by reducing the batch size! Will confirm when I know more.

jbusecke commented 1 year ago

Sounds good. We can put in some more effort here after mid Nov if needed.

Timh37 commented 1 year ago

I keep getting errors along these lines:

/srv/conda/envs/notebook/lib/python3.10/site-packages/distributed/client.py:3141: UserWarning: Sending large graph of size 36.53 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
/srv/conda/envs/notebook/lib/python3.10/site-packages/distributed/client.py:3141: UserWarning: Sending large graph of size 33.03 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.10/site-packages/tornado/iostream.py", line 1367, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/srv/conda/envs/notebook/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1007)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/conda/envs/notebook/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 192, in _handle_events
    handler_func(fileobj, events)
  File "/srv/conda/envs/notebook/lib/python3.10/site-packages/tornado/iostream.py", line 691, in _handle_events
    self._handle_read()
  File "/srv/conda/envs/notebook/lib/python3.10/site-packages/tornado/iostream.py", line 1454, in _handle_read
    self._do_ssl_handshake()
  File "/srv/conda/envs/notebook/lib/python3.10/site-packages/tornado/iostream.py", line 1385, in _do_ssl_handshake
    return self.close(exc_info=err)
  File "/srv/conda/envs/notebook/lib/python3.10/site-packages/tornado/iostream.py", line 606, in close
    self._signal_closed()
  File "/srv/conda/envs/notebook/lib/python3.10/site-packages/tornado/iostream.py", line 636, in _signal_closed
    self._ssl_connect_future.exception()
asyncio.exceptions.CancelledError
2023-10-24 22:17:33,777 - distributed.client - ERROR - Failed to reconnect to scheduler after 30.00 seconds, closing client
/srv/conda/envs/notebook/lib/python3.10/site-packages/distributed/client.py:3141: UserWarning: Sending large graph of size 42.76 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(

I think I will hold off on updating the workflow with the speed improvements until we can figure this out after mid-November.