ECCC-MSC / libecbufr

libecbufr is a general purpose, template-oriented BUFR encoding/decoding library
Other
10 stars 7 forks source link

unexpected numerical lossiness during decoding #19

Open vsouvan opened 4 years ago

vsouvan commented 4 years ago

Encoder:

double kelvin = 273.15; double drybulb_c = 0.0; / Celsius / double drybulb_k = drybulb_c + kelvin; / Kelvin / int desc = 12101; double scale = 2; double reference = 0. int BUFR_int = drybulb_k * pow(10,scale) - reference;

In the BUFR message, this is being coded as 27315. Which, I think, we can all agree is a sufficiently accurate representation of 0.0 C irrespective of any intermediate IEEE 754 magic.

In other words, the encoder looks good.

Decoder:

int BUFR_int = 27315; double drybulb_k = (BUFR_int + reference) / pow(10,scale); double drybulb_c = drybulb_k - kelvin;

Now, here's the thing... irrespective of the IEEE 754 encoding of the kelvin constant, we've supposedly kept all our transformations in the same 64-bit IEEE 754 form except where we convert to and from the integral BUFR encoding. And the value we're encoding, 273.15, is fully representable in the BUFR definition we're using. So, if the BUFR encoding/decoding sequence is stable, it should be perfectly safe to say:

assert( drybulb_c == 0.0 );

Sound reasonable?

Well, that's not what's happening. What's happening is that the double representation of the decoded kelvin_k is 273.149994 (which, even though 64-bit can't exactly represent 273.15, is less than half the precision it should be able to get), which gives me a drybulb_c of -0.000006.

So, point being is something in the decoding process appears to be lossy at the API level. What goes in is not what's coming out.

Now, there may be a perfectly logical reason; it could be something as trivial as the (BUFR_int + reference) / pow(10,scale) calculation being done in a 32-bit floating point space.


Imported from Launchpad using lp2gh.

vsouvan commented 4 years ago

(by chris-beauregard) Vanh confirms:

You are right, currently, libecbufr decode and store most of the values in FLOAT32.

And when you obtain a double by calling the function bufr_descriptor_get_dvalue() The returned value is simply a cast from float to double.

You can replicate the problem you mention here outside of libecbufr just by casting a float to double.

float   fval = 273.15;
double  dval =  273.15;
double    casted_val  =  fval;

double   drybulb_c =   dval - casted_val;

and you will end up with     drybulb_c = -0.000006

The solution may be to do all decoding and store using double only.

You can do that by changing functions bufr_datatype_to_valtype() and bufr_encoding_to_valtype()

  Replace all    VALTYPE_FLT32   by   VALTYPE_FLT64