ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

`mask_fillvalues` and `mask_multimodel` recompute mask from all input cubes for every output cube #2521

Open bouweandela opened 1 month ago

bouweandela commented 1 month ago

Because mask_fillvalues and mask_multimodel are now lazy, they recompute the mask based on all the input cubes for every output cube. This is slow and unnecessary because the mask is the same for every output cube.

valeriupredoi commented 5 days ago

Could you maybe tell us more about the process, pls, bud? mask_multimodel calls _multimodel_mask_cubes(cubes, shape), or the equivalent for products, where a composite mask is built from the mask of each cube in cubes, so the iteration is needed because each cube has a different mask. Are you saying this iteration is done for each cube, a la:

for cube in cubes:
    _multimodel_mask_cubes(cubes, shape)  # that will, in turn, loop over cubes again

?

bouweandela commented 5 days ago

Each output file requires a lazy mask that can be computed from all input files, so that means all the input files must be read to save a single output file. Because the output files are currently saved (and computed) one at a time, that means all the input data needs to be read as many times as there are output files. Is that any more clear?
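To make the read pattern concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not ESMValCore code: the names `INPUTS`, `read_input`, `combined_mask` and `save_outputs` are hypothetical stand-ins, plain lists stand in for cubes, `None` stands in for a fill value, and a zero-argument function stands in for a lazy mask. It only counts how often the inputs are read when the shared mask is evaluated once per output, compared with evaluating it once and reusing it.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical stand-ins for input files; None marks a missing (fill) value.
INPUTS = {
    "model_a": [1.0, None, 3.0, 4.0],
    "model_b": [1.0, 2.0, None, 4.0],
    "model_c": [1.0, 2.0, 3.0, 4.0],
}
reads = Counter()  # how many times each input was "read from disk"


def read_input(name):
    """Pretend to load one input file and record the access."""
    reads[name] += 1
    return INPUTS[name]


def combined_mask():
    """Stand-in for the lazy mask: masked where any input is missing."""
    columns = zip(*(read_input(name) for name in INPUTS))
    return [any(value is None for value in column) for column in columns]


def save_outputs(get_mask):
    """Save one output per input, evaluating the mask at save time.

    Only reads triggered by the mask are counted; the output's own data
    is taken directly from INPUTS to keep the sketch short.
    """
    outputs = {}
    for name, data in INPUTS.items():
        mask = get_mask()
        outputs[name] = [None if m else v for v, m in zip(data, mask)]
    return outputs


# Mask re-evaluated for every output: every input is read once per output.
reads.clear()
naive = save_outputs(combined_mask)
naive_reads = sum(reads.values())

# Mask evaluated once and reused: every input is read only once.
reads.clear()
shared = save_outputs(lru_cache(maxsize=None)(combined_mask))
shared_reads = sum(reads.values())

print(naive_reads, shared_reads)
```

With three inputs and three outputs, the first variant performs nine reads and the second performs three, while both produce the same outputs. How best to share the mask in the real lazy pipeline (for example, computing it once up front or saving all outputs in a single computation) is left open here.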

valeriupredoi commented 4 days ago

Thanks, bud! I need to read this carefully tomorrow, am just about to go home and shove a pizza in the oven :pizza: