ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Unify dtype handling of preprocessors #2393

Closed schlunma closed 4 months ago

schlunma commented 4 months ago

Description

Many preprocessors have use some kind of custom code to make sure that dtypes are preserved. This PR adds a decorator @preserve_float_dtype unifies this task. Only float dtypes will be preserved, other dtypes are allowed to change. This ensures that int objects may be transformed to float objects when taking the mean for example (similar to what numpy does).

Closes #2370

Doc: https://esmvaltool--2393.org.readthedocs.build/projects/ESMValCore/en/2393/api/esmvalcore.preprocessor.html#preprocessor-functions


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the πŸ›  Technical or πŸ§ͺ Scientific review.


To help with the number pull requests:

valeriupredoi commented 4 months ago

Manu, I'm sorry, man - I promised I'd look into this, and it fell off my radar screen like a Russian Sukhoi over the Black Sea. I'll at least help out here :+1:

schlunma commented 4 months ago

Since inspect.getfullargspec apparently does not work with decorated functions, I changed this to inspect.signature instead, which provides very similar functionality and handles decorated functions properly.

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.31%. Comparing base (1e49077) to head (d8513db).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2393 +/- ## ========================================== + Coverage 94.30% 94.31% +0.01% ========================================== Files 246 246 Lines 13549 13580 +31 ========================================== + Hits 12777 12808 +31 Misses 772 772 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bouweandela commented 4 months ago

Since inspect.getfullargspec apparently does not work with decorated functions

I'm a bit puzzled by this comment. Why does the decorator that we use for indicating that a preprocessor function requires ancillary variables or cell measures not cause this issue? https://github.com/ESMValGroup/ESMValCore/blob/bbd307dc2202f28065807dda4e0f63ad122336c6/esmvalcore/preprocessor/_area.py#L366

schlunma commented 4 months ago

The register_supplementaries decorator does not need to consider the arguments or return value of the decorated function, so the wrapper can simply return the decorated function itself (with the original signature):

https://github.com/ESMValGroup/ESMValCore/blob/bbd307dc2202f28065807dda4e0f63ad122336c6/esmvalcore/preprocessor/_supplementary_vars.py#L35-L37

On the other hand, the preserve_float_dtype decorator needs to access the first argument and change the return value of the decorated function:

https://github.com/ESMValGroup/ESMValCore/blob/5b470db800a0137908ef1a920a8a6606aee204ad/esmvalcore/preprocessor/_shared.py#L257-L266

Thus, the signature of the decorated function changes to that of the wrapper. In theory, this can be solved with the functools.wraps decorator. However, as mentioned here, this is only considered by inspect.signature, not by inspect.getfullargspec.

valeriupredoi commented 4 months ago

this shall get reviewed today by yours truly - sorry, Manu, was busy all week with non-ESMValX suffs

schlunma commented 4 months ago

Thanks for reviewing V! I also think the decorator is the preferable way (that's why I implemented it πŸ˜„)

valeriupredoi commented 4 months ago

Thanks for reviewing V! I also think the decorator is the preferable way (that's why I implemented it πŸ˜„)

About docs: I'd think that we should mention that preproc X or Y preserves the dtype - like Numpy docs say - would you be OK to add that to each preproc's API docs? I can do that, I owe you a bit of help on this since I promised I will help :grin: Alternatively, we can provide a list of preproc functions that preserve dtypes, like irirs do with their lazy/realized data funcs

schlunma commented 4 months ago

I think it would make more sense to document those that do not preserve float dtypes (IMHO, our default should be to preserve them). It would be even better to fix them πŸ˜„

So maybe we can mention in the beginning that (hopefully) all of our preprocessors preserve float dtypes (ints may be changed)? How does that sound?

valeriupredoi commented 4 months ago

that too! I was debating this amongst my few working brain cells (given my new old age) just a bit earlier - a good point, Manu!

schlunma commented 4 months ago

Updated the docs: https://esmvaltool--2393.org.readthedocs.build/projects/ESMValCore/en/2393/api/esmvalcore.preprocessor.html#preprocessor-functions

valeriupredoi commented 4 months ago

@bouweandela would you be OK to give this a onesie and merge, plss, bud? :beer: