SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

cube.collapsed fails with multi-dimensional string coordinates #3653

Closed: PAGWatson closed this issue 1 month ago.

PAGWatson commented 4 years ago

Hi, I have a cube like the one below, with data from multi-ensemble-member climate model runs covering different years. The 'Expt ID' coordinate contains the run ID corresponding to each ensemble member for each year. I get an error when I call cube.collapsed('year', iris.analysis.MEAN).

I did previously collapse the cube over a 'season' coordinate (since removed), where each season had three time values, so perhaps this issue only arises when an entire dimension is collapsed?

print(cube)
air_temperature / (K)               (time: 30; Ens member: 15; latitude: 145; longitude: 192)
     Dimension coordinates:
          time                           x              -             -               -
          Ens member                     -              x             -               -
          latitude                       -              -             x               -
          longitude                      -              -             -               x
     Auxiliary coordinates:
          season_year                    x              -             -               -
          year                           x              -             -               -
          Expt ID                        x              x             -               -

cube_mean=cube.collapsed('year',iris.analysis.MEAN)

The full error message is given below. The problem seems to be that iris joins all the strings in 'Expt ID' into one long string, leaving a coordinate with a single value, and then finds that this does not match the length of the 'Ens member' dimension (15), which the collapsed coordinate should still span.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-43764f0e4a7d> in <module>()
----> 1 cube_seasmean.collapsed('year',iris.analysis.MEAN)

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in collapsed(self, coords, aggregator, **kwargs)
   3253                 local_dims = [coord_dims.index(dim) for dim in
   3254                               dims_to_collapse if dim in coord_dims]
-> 3255                 collapsed_cube.replace_coord(coord.collapsed(local_dims))
   3256 
   3257         untouched_dims = sorted(untouched_dims)

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in replace_coord(self, new_coord)
   1181             self.add_dim_coord(new_coord, dims[0])
   1182         else:
-> 1183             self.add_aux_coord(new_coord, dims)
   1184 
   1185         for factory in self.aux_factories:

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in add_aux_coord(self, coord, data_dims)
    964         if self.coords(coord):  # TODO: just fail on duplicate object
    965             raise ValueError('Duplicate coordinates are not permitted.')
--> 966         self._add_unique_aux_coord(coord, data_dims)
    967 
    968     def _check_multi_dim_metadata(self, metadata, data_dims):

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in _add_unique_aux_coord(self, coord, data_dims)
    996 
    997     def _add_unique_aux_coord(self, coord, data_dims):
--> 998         data_dims = self._check_multi_dim_metadata(coord, data_dims)
    999         self._aux_coords_and_dims.append([coord, data_dims])
   1000 

/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in _check_multi_dim_metadata(self, metadata, data_dims)
    988                     raise ValueError(msg.format(dim, self.shape[dim],
    989                                                 metadata.name(), i,
--> 990                                                 metadata.shape[i]))
    991         elif metadata.shape != (1,):
    992             msg = 'Missing data dimensions for multi-valued {} {!r}'

ValueError: Unequal lengths. Cube dimension 0 => 15; metadata 'Expt ID' dimension 0 => 1.
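To make that explanation concrete, here is a numpy-only sketch (it does not use iris) of the failure mode. It assumes the collapse flattens the whole 'Expt ID' array and joins every value with '|', as the workaround below does; the run IDs are made up and only the shapes matter.

```python
import numpy as np

# Made-up stand-in for the 'Expt ID' points on (time: 30, Ens member: 15).
n_time, n_members = 30, 15
expt_id = np.array(
    [["run{:02d}_m{:02d}".format(t, m) for m in range(n_members)] for t in range(n_time)]
)

# Assumed behaviour: the whole array is flattened and joined into one string,
# even though only the time dimension is being collapsed.
joined = "|".join(expt_id.ravel())
collapsed_points = np.array([joined])

print(expt_id.shape)           # (30, 15)
print(collapsed_points.shape)  # (1,)

# The collapsed cube keeps 'Ens member' (length 15) as its dimension 0, but the
# collapsed coordinate has only 1 value: the mismatch reported as
# "Unequal lengths. Cube dimension 0 => 15; metadata 'Expt ID' dimension 0 => 1."
```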

A quick workaround is to join the individual strings explicitly, remove the string coord, and re-add the joined string as a scalar coord. (I'm not experienced enough with iris to know whether this is robust, but it works for my case.)

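import iris.analysis
import iris.coords

# Replace each multi-dimensional string coord with a scalar coord holding all its values joined by '|'.
# cube.aux_coords returns a tuple, so removing coords inside the loop is safe.
for coord in cube.aux_coords:
    if coord.ndim > 1 and coord.dtype.kind in ('S', 'U'):  # byte or unicode strings
        new_str = '|'.join(coord.points.astype(str).ravel())
        new_coord = iris.coords.AuxCoord(
            new_str, standard_name=coord.standard_name, long_name=coord.long_name,
            var_name=coord.var_name, units=coord.units, attributes=coord.attributes)
        cube.remove_coord(coord)
        cube.add_aux_coord(new_coord)

cube_mean = cube.collapsed('year', iris.analysis.MEAN)  # now this works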

It would be nice if iris did something like this inside cube.collapsed(). Even better would be a method that collapses 'Expt ID' only along the dimension being collapsed, so that the association between 'Expt ID' values and the 'Ens member' dimension is maintained.
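A minimal sketch of that "even better" behaviour, using numpy only: join the strings along the collapsed axis and keep every other axis. The helper name `collapse_strings_along` and the '|' separator are assumptions for illustration; this is not iris's API or its own implementation.

```python
import numpy as np


def collapse_strings_along(points, axis, sep="|"):
    """Join string points along one axis only (hypothetical helper, not iris).

    Every other dimension is kept, so a (time, member) array collapsed over
    time gives one joined string per member.
    """
    points = np.asarray(points)
    # Move the collapsed axis to the end so each remaining index selects a 1-D run.
    moved = np.moveaxis(points, axis, -1)
    out_shape = moved.shape[:-1]
    flat = moved.reshape(-1, moved.shape[-1])
    joined = [sep.join(row.tolist()) for row in flat]
    return np.array(joined).reshape(out_shape)


expt_id = np.array([["a0", "b0", "c0"],
                    ["a1", "b1", "c1"]])  # shape (time=2, member=3)
print(collapse_strings_along(expt_id, axis=0))
# ['a0|a1' 'b0|b1' 'c0|c1']
```

Applied to the cube above, collapsing the (time, Ens member) 'Expt ID' array over time would give one joined string per ensemble member, which still matches the 15-long 'Ens member' dimension.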

rcomer commented 4 years ago

> Even better would be a method that only collapses 'Expt ID' here along the dimension being collapsed, so the association of 'Expt ID' values with the 'Ens member' dimension would be maintained.

The aggregated_by method already has string handling that does exactly that, so I'd say it's desirable for collapsed to behave consistently. I also think it should be relatively simple to implement.

rcomer commented 4 years ago

The relevant handling for aggregated_by looks like this:

https://github.com/SciTools/iris/blob/c9506e6a41282e91a27101905cc4e9d3cb866e4b/lib/iris/analysis/__init__.py#L2182-L2217

github-actions[bot] commented 3 years ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days' time.

rcomer commented 3 years ago

I believe this bug is very fixable; it just needs someone to find the time. So I say we leave this issue open.

rcomer commented 1 year ago

I have proposed a fix for this in #4294.

github-actions[bot] commented 7 months ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days' time.

rcomer commented 7 months ago

I still think we should make this work, but if we don't, we should at least raise a more decipherable error message.
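As a sketch of what a more decipherable error could look like, the check below compares the collapsed coordinate's shape with the cube dimensions it must span and names the offending coordinate. The function name, its arguments and the message wording are all hypothetical, not iris code.

```python
def check_collapsed_coord_shape(coord_name, coord_shape, cube_shape, coord_dims):
    """Hypothetical pre-check (not part of iris) before re-adding a collapsed coord.

    Raises a ValueError that says which coordinate no longer fits and why.
    """
    expected = tuple(cube_shape[dim] for dim in coord_dims)
    if tuple(coord_shape) != expected:
        raise ValueError(
            "Cannot collapse coordinate {!r}: the collapsed coordinate has shape {}, "
            "but it must span cube dimensions {} with shape {}. This can happen when a "
            "multi-dimensional string coordinate is collapsed; try removing it before "
            "collapsing.".format(coord_name, tuple(coord_shape), tuple(coord_dims), expected)
        )


# Shapes from the report: the collapsed cube is (15, 145, 192) and 'Expt ID'
# should span dimension 0, but the joined coordinate has shape (1,).
try:
    check_collapsed_coord_shape("Expt ID", (1,), (15, 145, 192), (0,))
except ValueError as err:
    print(err)
```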