NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License
119 stars 19 forks source link

Aggregate crashes when fields contain the actual_range attribute #764

Closed Mark-D-Smith closed 4 months ago

Mark-D-Smith commented 4 months ago

I have been trying to load some monthly files containing air temperature data (from chess-met https://catalogue.ceh.ac.uk/documents/835a50df-e74f-4bfb-b593-804fd61d5eab; although the data themselves are not important for this error) using cf.load, with the intention of aggregating along the time axis. I get a crash from cf.aggregate when the files contain the actual_rangeattribute AND there are at least three files to aggregate.

This occurs because unlike some other attributes actual_rangeis not handled specially by cf.aggregate. The default behviour is to convert both to strings and append one to the other: https://github.com/NCAS-CMS/cf-python/blob/7d78eacf31202b7522e93b64e01fc20b0b50827e/cf/aggregate.py#L4871-L4881 When the third file is appended, the string comparison is performed between a two-element np.ndarray from the new file and a string from the previous aggregation https://github.com/NCAS-CMS/cf-python/blob/7d78eacf31202b7522e93b64e01fc20b0b50827e/cf/aggregate.py#L4864-L4869. This raises an exception as an if statement requires a scalar logical (i.e. any or all).

Since the valid_range attribute is defined in the CF-standard (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch02s05.html.) this should be handled. A simple correction would be to check if the property being updated is actual_range, and if so to set the parent's corresponding attribute to the minima and maxima of the two aggregated datasets. For instance:

        if prop in ("actual_range"):
            parent0.set_property(prop,[min(value0[0],value1[0]),max(value0[1],value1[1])])
            continue 

Alternatively it could be fixed by ignoring this attribute, as is currently done for valid_range & similar attributes.

Mark-D-Smith commented 4 months ago

The result of cf.environment:

Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.17
HDF5 library: 1.14.3
netcdf library: 4.9.2
udunits2 library: /home/users/marrho/miniconda3/envs/py3_unifhy/lib/libudunits2.so.0
esmpy/ESMF: 8.6.0
Python: 3.12.2
dask: 2024.4.0
netCDF4: 1.6.5
psutil: 5.9.8
packaging: 24.0
numpy: 1.26.4
scipy: 1.12.0
matplotlib: not available
cftime: 1.6.2
cfunits: 3.3.6
cfplot: not available
cfdm: 1.11.1.0
cf: 3.16.1
davidhassell commented 4 months ago

Hi Mark,

Thanks for the report. I'm having a look, but it'd be useful to know if you provided any keywords to configure the aggregation process (such as respect_valid, or any others)?

When you say "actual_range", do you perhaps mean "valid_range", or is there something else going on?

I'll run some tests ...

davidhassell commented 4 months ago

Hi Mark,

I agree that https://github.com/NCAS-CMS/cf-python/blob/7d78eacf31202b7522e93b64e01fc20b0b50827e/cf/aggregate.py#L4864-L4869 needs fixing up for the case that the property values are numpy arrays - thanks for spotting this.

I see that you did mean actual_range (sorry - I (mis)read a little hastily!). Your suggestion sounds reasonable, and I'll put together a PR.

Mark-D-Smith commented 4 months ago

Thanks David, just in case its still useful - I didn't set any other keywords, so default behaviour was to ignore the valid_range, valid_minand valid_max keywords. Its just actual range that threw the error, but as you rightly observe this could happen for any hypothetical keyword that is a numpy array so a more general fix might be preferable.

davidhassell commented 4 months ago

Hi Mark - PR #765 created. With a following wind it should make the public release we're planning for Thursday/Friday this week.

davidhassell commented 4 months ago

Hi Mark,

This change will go into v3.16.2, which will be publicly released tomorrow (2024-04-26) - many thanks for your input.

David