NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License

Improve `cf.Field.collapse` performance by lazily computing reduced axis coordinates #741

Closed davidhassell closed 8 months ago

davidhassell commented 8 months ago

Currently (v3.16.1), collapsed coordinates are created non-lazily. E.g. in the following, the collapsed size 1 coordinate values and their bounds (latitude, longitude and time) are all computed non-lazily. This can be slow if the original coordinates are on disk, and very slow if they are on disk on a remote server.

>>> print(f)
Field: specific_humidity (ncvar%q)
----------------------------------
Data            : specific_humidity(latitude(5), longitude(8)) 1
Cell methods    : area: mean
Dimension coords: latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : time(1) = [2019-01-01 00:00:00]

>>> print(f.collapse('mean'))
Field: specific_humidity (ncvar%q)
----------------------------------
Data            : specific_humidity(latitude(1), longitude(1)) 1
Cell methods    : area: mean latitude(1): longitude(1): mean
Dimension coords: latitude(1) = [0.0] degrees_north
                : longitude(1) = [180.0] degrees_east
                : time(1) = [2019-01-01 00:00:00]

It would be good to compute these values from cached elements, if present, or else do it lazily, so that the computation only occurs if the values are ever inspected.
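
To make the idea concrete, here is a rough pure-Python sketch of the two paths (cached elements first, lazy read otherwise). It is not cf-python code: `LazyCollapsedCoordinate`, `load_bounds`, `cached_first` and `cached_last` are hypothetical names, and the longitude bounds (eight 45 degree cells spanning 0 to 360) are assumed for illustration, chosen to be consistent with the example above.

```python
from functools import cached_property


class LazyCollapsedCoordinate:
    """Hypothetical stand-in for a size 1 collapsed coordinate.

    Not part of the cf-python API; it only illustrates deferring the
    work until the values are inspected.
    """

    def __init__(self, load_bounds, cached_first=None, cached_last=None):
        # load_bounds: zero-argument callable returning a list of
        # (lower, upper) cell bounds. It stands in for an expensive
        # read from disk or from a remote server.
        self._load_bounds = load_bounds
        # Optional cached outer bounds of the original coordinate.
        self._cached_first = cached_first
        self._cached_last = cached_last

    @cached_property
    def bounds(self):
        # Prefer cached elements: no read is needed at all.
        if self._cached_first is not None and self._cached_last is not None:
            return (self._cached_first, self._cached_last)
        # Otherwise read the original bounds, but only on first access.
        cells = self._load_bounds()
        return (cells[0][0], cells[-1][1])

    @cached_property
    def value(self):
        # The collapsed coordinate value is taken as the bounds midpoint.
        lower, upper = self.bounds
        return 0.5 * (lower + upper)


reads = []  # records each (simulated) expensive read


def load_longitude_bounds():
    reads.append("read")
    # Assumed bounds: eight 45 degree cells covering 0 to 360.
    return [(45.0 * i, 45.0 * (i + 1)) for i in range(8)]


lazy_lon = LazyCollapsedCoordinate(load_longitude_bounds)
reads_before_access = len(reads)   # nothing has been read yet
lazy_value = lazy_lon.value        # first inspection triggers one read
reads_after_access = len(reads)

cached_lon = LazyCollapsedCoordinate(
    load_longitude_bounds, cached_first=0.0, cached_last=360.0
)
cached_value = cached_lon.value    # served from cached elements
reads_after_cached = len(reads)    # unchanged: no extra read
```

With this shape, creating the collapsed coordinate costs nothing, inspecting it costs at most one read, and cached first and last elements avoid the read entirely. In cf-python itself the deferred step would presumably be expressed through the existing lazy data machinery rather than a wrapper class like this one.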