brettviren / moo

ruminants on module oriented programming

Generated floats with more than expected precision #11

Closed: roland-sipos closed this issue 3 years ago

roland-sipos commented 3 years ago

Hi Brett,

When I'm using fake_app_confgen from minidaqapp, the floats generated from the defaults end up with weird precision:

"pop_limit_pct": 0.800000011920929, "pop_size_pct": 0.10000000149011612,

Can you please help us figure out what the culprit is?

roland-sipos commented 3 years ago

Sorry, this was a bit too vague. Please find extra context below:

The schema that triggered the issue is the following: https://github.com/DUNE-DAQ/readout/blob/develop/schema/readout-DataLinkHandler-schema.jsonnet

The script used for the generation is: https://github.com/DUNE-DAQ/minidaqapp/blob/develop/python/minidaqapp/fake_app_confgen.py

brettviren commented 3 years ago

I suspect this could be turned into a long journey....

You can reproduce this more simply with:

$ jsonnet -e '0.8'                                                                    
0.80000000000000004

There is discussion about this issue in:

https://github.com/google/jsonnet/issues/558

The issue, apparently, is that Jsonnet produces a string representation of the number which is actually closer to the FP value stored in memory than the string the user originally provided. I.e., "0.80000000000000004" is closer to the actual FP representation in memory than a literal reading of the user string "0.8".

I think we are used to less exact string representations of FP values, so we do not normally see these "weird" strings.
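
To see this with plain Python (no jsonnet or moo involved), here is the exact double nearest 0.8 and two common ways of printing it:

from decimal import Decimal
Decimal(0.8)
--> Decimal('0.8000000000000000444089209850062616169452667236328125')
format(0.8, '.17g')
--> '0.80000000000000004'
format(0.8, '.15g')
--> '0.8'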

I think this is a cosmetic issue, but in principle we can "fix" it by doing something inside the moo.otypes class for number schema to reduce the precision of the value. However, I worry that might introduce yet more corner cases.

I do note that the Python interpreter seems to keep the original number:

json.dumps(moo.jsonnet.loads('0.8'))
-->  '0.8'

But more play shows we are actually dodging an "f4" bullet somehow and gaining "f8" precision. The number values get filtered through Numpy, something like:

json.dumps(numpy.array('0.8', 'f4').item())
-->  '0.800000011920929'

So, we are somehow having our f4 FP value promoted to f8 (I guess). It may be due to the number originating in C++ inside the compiled _jsonnet Python module.

But then, something else must be truncating at f8:

json.dumps(numpy.array('0.8', 'f8').item())
--> '0.8'
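
A standalone check of both halves of that, using only numpy and json: the f4 -> f8 widening is exact, so the "weird" digits are just the f4 rounding error of 0.8 made visible, and the apparent truncation at f8 is Python itself, which since 3.1 prints the shortest decimal string that round-trips to the same double:

import json
import numpy as np
f4 = np.float32(0.8)
f8 = np.float64(f4)    # widening f4 -> f8 is exact
f4 == f8
--> True
json.dumps(float(f8))
--> '0.800000011920929'
json.dumps(0.8)
--> '0.8'
float('0.80000000000000004') == 0.8
--> True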

alessandrothea commented 3 years ago

Hi, I don't know if this is helpful (or just annoying), but I noticed this:

>>> numpy.array('0.8', 'f4').item()
0.800000011920929

which suggests that the conversion problem is between numpy and native Python floats. Interestingly, str seems to know how to handle the precision correctly:

>>> str(numpy.array('0.8', 'f4'))
'0.8'

Unfortunately, that doesn't help much when dealing with json, since the value then comes out as a JSON string:

>>> json.dumps(str(numpy.array('0.8', 'f4')))
'"0.8"'

Maybe this could be made to work using a custom json.JSONEncoder that exploits str (or rather numpy.array_str) behind the scenes?

alessandrothea commented 3 years ago

Example:

#!/usr/bin/env python

import json
import numpy as np

class NumpyArrayEncoder(json.JSONEncoder):
    def encode(self, obj):
        if isinstance(obj, np.ndarray):
            return np.array_str(obj)
        # Fall back to the base class encode method for everything else
        return json.JSONEncoder.encode(self, obj)

a = np.array('0.8', 'f4')   # e.g. the f4 value from this issue
print(json.dumps(a, cls=NumpyArrayEncoder))   # prints 0.8

brettviren commented 3 years ago

Right, this is what I meant by us being used to the truncation being done "to us" in low-precision string representations. If str(), printf(), or cout were more honest, we'd see "weird" FP values more often.

In any case, yes, if we want to force a precision truncation we can leverage moo.otypes metaprogramming to insert something into the number schema classes. I don't think it is a high priority but I'll see if I can find something reasonable to do.

brettviren commented 3 years ago

I don't know if this catches all cases, but there are two solutions:

1) use f8 in your schema

2) f4-precision values are now truncated using a string operation (quick check below):

float('%.6e'%val)
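
For the record, a quick check in plain Python of what that string operation does to the value from the original report:

import json
val = 0.800000011920929    # the f4-origin value from the report
'%.6e' % val
--> '8.000000e-01'
json.dumps(float('%.6e' % val))
--> '0.8'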

alessandrothea commented 3 years ago

Out of curiosity, given that otypes numbers are built around numpy arrays, why not let numpy.array_str do the heavy lifting instead of making a special case out of f4?

alessandrothea commented 3 years ago

Oh, I see. Because the conversion to pod is an intermediate step between an otypes.Number and JSON. Got it.