bird-house / finch

A Web Processing Service for Climate Indicators
https://finch.readthedocs.io/en/latest/
Apache License 2.0

average_polygon extremely slow when running on THREDDS data #178

Closed: huard closed this issue 2 years ago

huard commented 3 years ago

Description

Defining a 500 km² watershed and then trying to compute the ERA5 average over it (tolerance=.1) is extremely slow, even for a single month of data. I never had the patience to let it finish.

Zeitsperre commented 3 years ago

Related PR: https://github.com/Ouranosinc/pavics-sdi/pull/221

Running finch.subset using our THREDDS-housed data and a PAVICS-based finch is incredibly slow. Using a locally running instance of finch, the only bottleneck is my internet connection when collecting the data, but otherwise it all runs very fast. My feeling is it might be a network issue.

tlvu commented 3 years ago

> Running finch.subset using our THREDDS-housed data and a PAVICS-based finch is incredibly slow.

FYI, we currently have a slow Twitcher/Magpie problem (https://github.com/bird-house/twitcher/issues/97) and a low I/O problem (https://github.com/bird-house/birdhouse-deploy/pull/122#issuecomment-773477708) on PAVICS in prod.

huard commented 2 years ago

There are possibly several explanations for this.

  1. The dot product between the weights and the field requires the entire field, so we need to load the entire dataset in memory (or stream it). A PR is in progress in clisops to apply a subset first.
  2. The weights are copied for each dask chunk. There is a partial fix in xESMF 0.6.1, but issues remain.
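To illustrate point 1, here is a minimal sketch in plain Python. It is not clisops or xESMF code; the grid size, the weights and the helper names (`weighted_average`, `bounding_box`, `subset`) are made up for the example. It only shows that the average is a dot product between the weights and the field, and that cropping both to the bounding box of the non-zero weights gives the same answer while touching far fewer cells.

```python
def weighted_average(weights, field):
    """Dot product of weights and field over all cells, normalised by the weight sum."""
    total = sum(
        w * v
        for w_row, f_row in zip(weights, field)
        for w, v in zip(w_row, f_row)
    )
    norm = sum(w for row in weights for w in row)
    return total / norm


def bounding_box(weights):
    """Smallest (row_slice, col_slice) that contains every non-zero weight."""
    rows = [i for i, row in enumerate(weights) if any(row)]
    cols = [j for j in range(len(weights[0])) if any(row[j] for row in weights)]
    return slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1)


def subset(grid, row_slice, col_slice):
    """Crop a 2-D list-of-lists grid to the given row and column slices."""
    return [row[col_slice] for row in grid[row_slice]]


# Hypothetical 6 x 8 grid: the polygon overlaps only four cells.
n_rows, n_cols = 6, 8
weights = [[0.0] * n_cols for _ in range(n_rows)]
weights[2][3], weights[2][4], weights[3][3], weights[3][4] = 0.1, 0.4, 0.2, 0.3
field = [[float(i * n_cols + j) for j in range(n_cols)] for i in range(n_rows)]

# Average over the full grid: every cell is read, most with a zero weight.
full = weighted_average(weights, field)

# Average after subsetting: only the cells inside the bounding box are read.
rs, cs = bounding_box(weights)
small = weighted_average(subset(weights, rs, cs), subset(field, rs, cs))

cells_full = n_rows * n_cols
cells_small = (rs.stop - rs.start) * (cs.stop - cs.start)
print(full, small, cells_full, cells_small)
```

With remote THREDDS data, the number of cells read is what drives the transfer, so the gap between `cells_full` and `cells_small` is the part that matters here.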

huard commented 2 years ago

This should hopefully be improved with the new clisops release.