ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

`annual_statistics` may change dtype #2370

Closed schlunma closed 4 months ago

schlunma commented 6 months ago

Describe the bug

Very similar to https://github.com/ESMValGroup/ESMValCore/issues/1236 to and https://github.com/ESMValGroup/ESMValCore/issues/1235: The annual_statistics my change the dtype of the data.

Since we have lots of custom code in our preprocessors (example), I think it would make sense to somehow sense to generalize this.

I am thinking about adding a decorator that does that. An alternative could be to add it to our preprocess function just like the rechunking. However, this won't be useful for people who use the preprocessor outside of ESMValTool, so I would prefer a decorator.

@ESMValGroup/technical-lead-development-team thoughts?

bouweandela commented 6 months ago

Where does the type change originate? Iris? Numpy? If it is one of those, then that would be the obvious place to fix this.

schlunma commented 6 months ago

I agree with you, but unfortunately I don't have the time to deep dive into the code at the moment and wait for other libraries to fix the problem. It would be much easier to write a common solution to that problem in ESMValCore (we already do that kind of fix in various preprocessors).

valeriupredoi commented 6 months ago

This sounds like my kind of problem - will investigate 👍

schlunma commented 5 months ago

Related PR: https://github.com/ESMValGroup/ESMValCore/pull/1240

schlunma commented 5 months ago

I opened a draft PR here: https://github.com/ESMValGroup/ESMValCore/pull/2393

One possible reasons dtypes may change is that numpy.ma.average changes float32 to float64 as described here.

valeriupredoi commented 5 months ago

indeed, Numpy's a dictator:

import numpy as np

c = np.float32(22)
d = np.float32(22)
print(type(np.ma.average(np.ma.array([c, d], mask=[False, True]))))

that returns <class 'numpy.float64'> because the masked element is float64 - this is not ideal to say the least. But:

import numpy as np

c = np.float32(22)
d = np.float32(22)
print(type(np.ma.average(np.ma.array([c, d], ))))

will happily return <class 'numpy.float32'>

valeriupredoi commented 5 months ago

we should see what Numpy 2.0 does about this, in fact, I'll have to get us a dev branch with Numpy 2.0RC see what barfs, am expecting barfing

bouweandela commented 5 months ago

It looks like the issue has been reported here https://github.com/numpy/numpy/issues/14462 and someone had a go at fixing it, but that work seems to have stalled a few years ago.

valeriupredoi commented 5 months ago

good catch @bouweandela - I plopped a q there, maybe the mythical Numpy 2.0 will sort it out? Regardless, tomorrow I'll be on to creating a dev branch with the RC https://anaconda.org/conda-forge/numpy/files https://github.com/numpy/numpy/releases