SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Bugfixes + changes wanted in "combine_regions" #4882

Open pp-mo opened 2 years ago

pp-mo commented 2 years ago

As identified in work on #4845

Follow-on fixes/changes:

  1. the call to "fix_dask_settings" in the setup_cache call does not necessarily affect what happens in the main tests, contrary to what we had thought.
    • The "setup" call should make that call first, instead of last, so that it can affect how test cubes are saved/reloaded.
  2. the basic "combine_regions" operation ought to enforce full-dim chunks in the mesh dimension (or it may not work). This case has never been tested, as the mesh dim has so far been smaller than the default chunksize. It can be fixed by rechunking (this was tested).
  3. we should ensure that the memory cost of this operation does not increase with scaling on an outer dimension, i.e. it can repeat over timesteps etc. without a further cost multiplier.

pp-mo commented 2 years ago

Re (2), enforcing full-mesh-dim chunks. Suggestion captured: after result_array is created, with the mesh in the last dim, add these lines:

    # rechunk result if needed, to ensure that mesh dim is NOT split
    if result_array.chunksize[-1] != result_array.shape[-1]:
        chunks = list(result_array.chunksize)
        chunks[-1] = result_array.shape[-1]
        result_array = result_array.rechunk(chunks)
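
To make the link between points (2) and (3) concrete, here is a rough sketch of the chunk arithmetic in plain Python, with no dask involved. The helper names `unsplit_last_dim` and `chunk_nbytes`, the array shapes, the starting chunking, and the 8-byte item size are all made up for illustration; none of them come from the Iris code. It only estimates the size of a single chunk, which is not the same as the real peak memory of the operation (that also depends on the scheduler and how many chunks are held at once).

```python
from math import prod


def unsplit_last_dim(chunksize, shape):
    """Return chunk sizes with the last (mesh) dim as a single chunk.

    Mirrors the logic of the snippet above, but on plain tuples.
    Returns None when the last dimension is already unsplit.
    """
    if chunksize[-1] == shape[-1]:
        return None
    chunks = list(chunksize)
    chunks[-1] = shape[-1]
    return tuple(chunks)


def chunk_nbytes(chunksize, itemsize=8):
    """Bytes in one chunk, assuming a fixed item size (8 = float64)."""
    return prod(chunksize) * itemsize


# Hypothetical (time, mesh) arrays: same mesh size, different timestep counts.
shape_few_steps = (10, 1_000_000)
shape_many_steps = (1000, 1_000_000)

# Hypothetical starting chunking: one timestep per chunk, mesh split in four.
chunksize = (1, 250_000)

new_few = unsplit_last_dim(chunksize, shape_few_steps)
new_many = unsplit_last_dim(chunksize, shape_many_steps)

print(new_few, chunk_nbytes(new_few))
print(new_many, chunk_nbytes(new_many))
```

In this sketch the requested chunking and the per-chunk size are the same for 10 and for 1000 timesteps, because only the mesh dimension is widened and the outer chunk size is left alone. With a real dask array, the tuple would be passed to `result_array.rechunk(...)` as in the snippet above.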