Cube slice operation changes the data array type

SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data

https://scitools-iris.readthedocs.io/en/stable/

BSD 3-Clause "New" or "Revised" License


Cube slice operation changes the data array type #5318

Open bouweandela opened 1 year ago

bouweandela commented 1 year ago

🐛 Bug Report

How To Reproduce

Steps to reproduce the behaviour:

If a cube with masked data is sliced with a single index, the resulting cube does not have masked data in some cases:

>>> import iris.cube
>>> import numpy as np
>>> type(iris.cube.Cube(np.ma.array([1.], mask=[0]))[0].data)
<class 'numpy.ndarray'>

Expected behaviour

I would expect slicing not to change the data type of the array, i.e. I would expect the code above to print <class 'numpy.ma.core.MaskedArray'>. If the selected element is actually masked (mask=[1]), things do work as expected:

>>> type(iris.cube.Cube(np.ma.array([1.], mask=[1]))[0].data)
<class 'numpy.ma.core.MaskedArray'>

Environment

OS & Version: Ubuntu 23.04
Iris Version: 3.5

rcomer commented 1 year ago

I think this boils down to

import iris.cube
import numpy as np

array = np.ma.array([1.], mask=[0])
arr_slice = array[0]

print(type(arr_slice))

cube = iris.cube.Cube(arr_slice)

print(type(cube.data))

<class 'numpy.float64'>
<class 'numpy.ndarray'>

This happens because Iris basically slices the data and then wraps a new cube around it:

https://github.com/SciTools/iris/blob/89e67ed48c8a9caab796fc9a3ca91a528ec95eae/lib/iris/cube.py#L2615-L2634
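
The underlying NumPy behaviour can be shown without Iris at all. The sketch below is only an illustration: the variable names are made up for this example, and the final `np.asanyarray` step is an assumption standing in for whatever conversion Iris applies when it wraps a scalar in a new cube.

```python
import numpy as np

partly_masked = np.ma.array([1.0, 2.0], mask=[0, 1])

# Integer indexing an unmasked element returns a plain NumPy scalar,
# so the mask information is already gone at this point.
unmasked_item = partly_masked[0]
print(type(unmasked_item))

# Integer indexing a masked element returns the np.ma.masked constant,
# which is itself a MaskedArray subclass.
masked_item = partly_masked[1]
print(masked_item is np.ma.masked)

# Slice indexing keeps the MaskedArray type.
kept = partly_masked[0:1]
print(type(kept))

# Assumption: converting the scalar back to an array (as a cube would need
# to do in some form) yields a 0-d ndarray with no mask.
rewrapped = np.asanyarray(unmasked_item)
print(type(rewrapped), rewrapped.shape)
```

This matches the report: only the unmasked-element case loses the masked type, because NumPy has already returned a bare scalar before a new cube is built.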

Aside: I wonder if we can remove that numpy 1.11 workaround...

bjlittle commented 1 year ago

@bouweandela is this a blocker for you atm?

bouweandela commented 1 year ago

No, it's just some oddity I ran into when writing unit tests.
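
For unit tests that trip over this, one possible workaround is to compare values and masks without depending on the array class. The helper below is hypothetical (it is not part of Iris or NumPy); it relies only on `np.ma.getmaskarray` and `np.ma.filled`, which accept both plain and masked arrays.

```python
import numpy as np


def assert_same_data_and_mask(actual, expected):
    """Hypothetical test helper: compare masks and unmasked values only."""
    # getmaskarray returns an all-False mask for plain ndarrays.
    np.testing.assert_array_equal(
        np.ma.getmaskarray(actual), np.ma.getmaskarray(expected)
    )
    # Fill masked points with a common value so hidden data is ignored.
    np.testing.assert_array_equal(
        np.ma.filled(actual, 0), np.ma.filled(expected, 0)
    )


# A 0-d ndarray and a 0-d unmasked MaskedArray are treated as equal.
assert_same_data_and_mask(
    np.asarray(1.0), np.ma.masked_array(1.0, mask=False)
)
```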

rcomer commented 1 year ago

As discussed in @SciTools/peloton this morning, I think the fix for this would involve special-casing scalar slices. If the behaviour isn't causing downstream problems, I suggest it would be better to leave it as it is in the interest of keeping the code [relatively] clean.
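
To give an idea of what special-casing scalar slices might look like, here is a minimal NumPy-only sketch. The function name `index_keeping_mask` is hypothetical and this is not the Iris implementation; it only shows the extra branch such a fix would need.

```python
import numpy as np


def index_keeping_mask(array, keys):
    """Index `array`; re-wrap scalar results if the input was masked."""
    result = array[keys]
    if isinstance(array, np.ma.MaskedArray) and not isinstance(
        result, np.ma.MaskedArray
    ):
        # An unmasked scalar came back: restore the masked type as a
        # 0-d MaskedArray with nothing masked.
        result = np.ma.masked_array(result, mask=False)
    return result


source = np.ma.array([1.0, 2.0], mask=[0, 1])
unmasked_result = index_keeping_mask(source, 0)
masked_result = index_keeping_mask(source, 1)
print(type(unmasked_result), type(masked_result))
```

The additional type check on every indexing call is the kind of clutter the comment above weighs against the benefit of a consistent return type.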