NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License
125 stars 19 forks source link

Unexpected arithmetic when multiplying field #96

Open dlrhodson opened 4 years ago

dlrhodson commented 4 years ago

Hi folks - first attempt to report features this way!

Have encountered an unexpected behaviour when combining fields - should this happen? Is it something to do with units? T is degC here, maybe we are expecting K? (see attached file) amv.nc.gz

>>> import cf
>>> cf.__version__
'3.5.1'
>>> data=cf.read('amv.nc')[0]
>>> data.array[0]
array([[20.51099598]])
>>> (data-data).array[0]
array([[0.]])
>>> (data-1*data).array[0]
array([[273.15]])
>>> (1*data-1*data).array[0]
array([[0.]])
>>> 
sadielbartholomew commented 4 years ago

Hi Dan, thanks for raising this & moreover for providing a code snippet which makes it effortless for us to investigate! The behaviour you've isolated is peculiar indeed. I've had a quick play around with your dataset & another one for diversity & as you have guessed, it does seem to be a bug whereby there is undesired unit conversion upon multiplication with a scalar.

In fact, it seems there is a similar effect with division & (perhaps other binary operations), e.g. continuing with the same dataset & names from your example, we can see the units changing with scalar operations:

>>> data.units
'degC'
>>> (1*data).units
'K'
>>> (data/1).units
'K'

It is important to note that the resulting data values, e.g: (data-1*data).array[0] being array([[273.15]]), are still strictly correct, as at the associated unit is 'K' after the operations, so the underlying issue is that the unit is being changed under the hood when that hasn't been requested & is unexpected (& by philosophy we don't want anything unexpected to happen!).

In general, cf-python is clever and will convert units appropriately for mathematical binary operations (see the fourth example under this section of the docs)

>>> (data*data).units
'K2'

But in the case as you raise of a binary operation of data with a plain number, by nature dimensionless, there should not (I believe) be any conversion.

I'll double check my thinking with David in case I have missed anything & get back to you. Thanks again for reporting.

davidhassell commented 4 years ago

I concur that this is a bug. I see that it is a "feature" of UDUNITS2 (outside of my immediate control) that when units that have an offset are multiplied by other units, the offset is lost:

In [7]: cf.Units('degC') * cf.Units('1')
Out[7]: <Units: K>

In [8]: cf.Units('degF') * cf.Units('1')
Out[8]: <Units: 0.555555555555556 K>

In [9]: cf.Units('m @ 10') * cf.Units('1')
Out[9]: <Units: m>

The solution is to update cfunits to account for this. The good news is that this can be done independently of cf-python, so, all being well, we'll have usable fix for this later today ...

davidhassell commented 4 years ago

https://www.unidata.ucar.edu/software/udunits/udunits-current/doc/udunits/udunits2lib.html#Examples

dlrhodson commented 4 years ago

Ah yes - Sadie, I see that makes sense - I didn't think to check the .units of the result! Thanks both for investigating!

davidhassell commented 4 years ago

Hi Dan - Sadie and I have discussed this further, and it's a tricky (20 celcius is not twice as warm as 10 celcius!). Your best bet is to convert to Kelvin first, then there is no problem. In a week or two we'll roll out a solution, which might just be to print a warning in these cases, we'll see .....

Thanks, David