certik / yaml-cpp

Automatically exported from code.google.com/p/yaml-cpp
MIT License

allow for full precision floating point serialization #197

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Output a floating point number such as 1 + eps.

What version of the product are you using? On what operating system?
0.3.0, affects 0.5.0 too

What is the expected output? What do you see instead?
Expected: a number that, when parsed, yields the same value that was converted.
Instead, a different number is yielded.
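
A minimal sketch of the reproduction, assuming a stream-based emitter that sets
the output precision to digits10 (the helper names format_with_precision and
round_trips are hypothetical, not yaml-cpp API). It writes a double with a
given number of significant digits and reads it back. For 1 + eps, both
digits10 (15) and digits10 + 1 (16) print just "1"; only 17 digits bring the
original value back.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Formats `value` with `precision` significant digits using the default
// iostream float format (the style a stream-based emitter would produce).
std::string format_with_precision(double value, int precision) {
    std::ostringstream out;
    out.precision(precision);
    out << value;
    return out.str();
}

// Returns true if writing `value` with `precision` significant digits and
// reading the text back yields a double that compares equal to `value`.
bool round_trips(double value, int precision) {
    std::istringstream in(format_with_precision(value, precision));
    double parsed = 0.0;
    in >> parsed;
    return !in.fail() && parsed == value;
}
```

For example, with x = 1.0 + std::numeric_limits<double>::epsilon(),
format_with_precision(x, 15) gives "1" while format_with_precision(x, 17) gives
"1.0000000000000002".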

See this or many other threads for detail:
http://stackoverflow.com/questions/4738768/printing-double-without-losing-precision

The executive summary is that using digits10 as the output precision truncates
the significant digits a floating point number has, so the output-input round
trip will produce a different value in many cases. You should instead clamp at
digits10+1, and arguably make the defaults fully preserving as well to minimize
surprises. Patch attached that does this. Alternatively, David Gay's algorithm,
which converts a value to the shortest string that yields the same value, could
be used to keep YAML files small and readable while still preserving what the
computer actually had in memory at the time of serialization.

http://www.ampl.com/REFS/rounding.pdf
David Gay's implementation:
http://www.netlib.org/fp/dtoa.c
http://www.netlib.org/fp/g_fmt.c

Apparently it is used all over the place (Python, numerous web browsers and
popular databases), so it seems to be the de facto way to do it.

You may also want to use the C/C++ strtod on input, which again likely borrows
much of David's code, as iostreams don't handle numerous cases, in particular
denormalized values (you likely already handle nan/inf).

http://www.cplusplus.com/reference/cstdlib/strtod/
http://www.cplusplus.com/reference/string/stod/
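
A minimal sketch of strtod-based input, assuming the scalar text has already
been isolated (the helper name parse_double is hypothetical, not yaml-cpp API).
It rejects empty input, trailing junk and overflow, but accepts results that
underflow to a subnormal, because the C standard leaves it
implementation-defined whether strtod sets errno to ERANGE in that case.

```cpp
#include <cerrno>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <string>

// Parses `text` as a double with std::strtod.
// Returns false if no number was found, if characters follow the number,
// or if the value overflowed. Subnormal ("denormal") results are accepted
// even if strtod reported ERANGE for them.
bool parse_double(const std::string& text, double& result) {
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    const double value = std::strtod(begin, &end);
    if (end == begin || *end != '\0') {
        return false;  // nothing parsed, or trailing characters remain
    }
    if (errno == ERANGE && std::isinf(value)) {
        return false;  // overflow: the text is outside double's range
    }
    result = value;
    return true;
}
```

One design caveat: strtod uses the current C locale's decimal point, so an
emitter and parser built on it should run under the "C" locale (or an
equivalent) to agree on the format.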

Original issue reported on code.google.com by nev...@gmail.com on 10 Apr 2013 at 1:06


GoogleCodeExporter commented 9 years ago
Thanks - this looks awesome!

Original comment by jbe...@gmail.com on 11 Apr 2013 at 4:11

GoogleCodeExporter commented 9 years ago
Updated patch that applies cleanly to 0.5.0; also fixed compile errors.

Original comment by nev...@gmail.com on 12 Apr 2013 at 12:16


GoogleCodeExporter commented 9 years ago
Partially patched:

New API:
https://code.google.com/p/yaml-cpp/source/detail?r=52d304c5f5c8eea612051d36b9cbaca5fe453ff1

Old API:
https://code.google.com/p/yaml-cpp/source/detail?r=88b39ba2ff2037de42ffaaa37f8a61ada3b3808d&repo=old-api

I didn't use the scientific format, because I don't want the default floating 
point output to be scientific. I'll consider using the alternative 
implementation you provided, but I'm not sure.

Thanks for all the info!

Original comment by jbe...@gmail.com on 13 Apr 2013 at 5:23

GoogleCodeExporter commented 9 years ago
Well, scientific notation minimizes the average string length for the most
general workloads (I use yaml-cpp in a scientific/engineering setting), where
values span many orders of magnitude, but it does come at a readability cost
for, say, integers. This is where David Gay's work comes in, as it always
chooses the shortest representation (which is really the best answer, from
transmission cost to human eyes).
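
To illustrate the "shortest string that round-trips" idea without pulling in
dtoa.c, here is a brute-force sketch: it is NOT David Gay's algorithm, just a
search that tries 1, 2, ..., 17 significant digits with "%.*g" and keeps the
first string that strtod maps back to the same double (17 always suffices for
IEEE 754 binary64). The helper name shortest_round_trip is hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Returns the %g-formatted string with the fewest significant digits that
// parses back to exactly `value`. Brute force, not Gay's dtoa: it costs up
// to 17 format/parse pairs per value. Intended for finite values.
std::string shortest_round_trip(double value) {
    char buf[32];  // enough for "%.17g" of any double
    for (int digits = 1; digits <= 17; ++digits) {
        std::snprintf(buf, sizeof buf, "%.*g", digits, value);
        if (std::strtod(buf, nullptr) == value) {
            break;  // first (shortest) precision that round-trips
        }
    }
    return buf;
}
```

For example, 0.1 comes out as "0.1" and 1.5 as "1.5", while 1 + eps needs all
17 digits. Note that %g switches to exponent form when that is shorter, e.g.
100.0 becomes "1e+02", which is exactly the readability trade-off against
non-scientific output discussed in this thread.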

Original comment by nev...@gmail.com on 17 Apr 2013 at 10:54

GoogleCodeExporter commented 9 years ago
I probably will add that code, but in the meantime, I'll keep it non-scientific.

Original comment by jbe...@gmail.com on 3 May 2013 at 1:07

GoogleCodeExporter commented 9 years ago
This is not sufficient on 64-bit architectures: the double 6.1501039517150184
still gets truncated to 6.150103951715018, which yields a different double
when read back.

Patch with std::numeric_limits<type>::digits10 + 2 and a test attached.
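
The patch and its test are not reproduced in this thread. As a hypothetical
sketch of that kind of test (the name count_round_trip_failures, the fixed
seed, and the value range are all assumptions), the following writes
pseudo-random values with a given precision through iostreams and counts how
many fail to read back unchanged.

```cpp
#include <limits>
#include <random>
#include <sstream>

// Writes `count` pseudo-random values of floating-point type T with
// `precision` significant digits, reads each one back, and returns how
// many came back different from the original.
template <typename T>
int count_round_trip_failures(int precision, int count) {
    std::mt19937_64 rng(12345);  // fixed seed so runs are reproducible
    std::uniform_real_distribution<T> dist(T(-1e6), T(1e6));
    int failures = 0;
    for (int i = 0; i < count; ++i) {
        const T value = dist(rng);
        std::ostringstream out;
        out.precision(precision);
        out << value;
        std::istringstream in(out.str());
        T parsed{};
        in >> parsed;
        if (parsed != value) {
            ++failures;
        }
    }
    return failures;
}
```

On IEEE 754 platforms, digits10 + 2 for double and digits10 + 3 for float
should report zero failures, while digits10 + 1 for double typically does not.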

Original comment by todu...@gmail.com on 12 Mar 2015 at 11:01


GoogleCodeExporter commented 9 years ago
I don't see why 64-bit makes a difference: either way, a double is 64 bits as
specified by the IEEE 754 standard. Except when it's not, and this might be
what you're seeing:

http://en.wikipedia.org/wiki/Extended_precision

Original comment by nev...@gmail.com on 12 Mar 2015 at 11:37

GoogleCodeExporter commented 9 years ago
Extended precision sounds unlikely; however, whatever the case, I'm getting
wrong results with +1 and correct ones with +2. The patch also contains a
unit test for your reproductive pleasure.

Original comment by todu...@gmail.com on 12 Mar 2015 at 11:39

GoogleCodeExporter commented 9 years ago
You might find the following stackoverflow an interesting read:
http://stackoverflow.com/questions/3206101/extended-80-bit-double-floating-point-in-x87-not-sse2-we-dont-miss-it

Basically, you can expect this behavior unless you enforce IEEE semantics
strictly. One way to do that is to force the use of SSE; see this post:

http://stackoverflow.com/questions/7295861/enabling-strict-floating-point-mode-in-gcc
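
A quick way to check which situation a given build is in is the standard
FLT_EVAL_METHOD macro from <cfloat> (C99, C++11): a value of 2 means
intermediates are evaluated in long double, as with x87 code, while 0 is what
SSE2 code generation normally reports. The helper name below is hypothetical;
the GCC flags commonly cited for forcing SSE on 32-bit x86 are -msse2
-mfpmath=sse.

```cpp
#include <cfloat>
#include <string>

// Describes how this build evaluates floating-point expressions, based on
// the standard FLT_EVAL_METHOD macro.
std::string eval_method_description() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "each type evaluated in its own range and precision";
        case 1:  return "float and double evaluated as double";
        case 2:  return "all evaluated as long double (extended precision)";
        case -1: return "indeterminable";
        default: return "implementation-defined";
    }
}
```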

And finally, look at what I stumbled onto, courtesy of
http://en.wikipedia.org/wiki/Double-precision_floating-point_format :
This gives 15–17 significant decimal digits precision. If a decimal string 
with at most 15 significant digits is converted to IEEE 754 double precision 
representation and then converted back to a string with the same number of 
significant digits, then the final string should match the original. If an IEEE 
754 double precision is converted to a decimal string with at least 17 
significant digits and then converted back to double, then the final number 
must match the original.[1]

http://en.wikipedia.org/wiki/Single-precision_floating-point_format similarly
states 6-9 digits.
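
Both ranges follow from the significand width p (53 bits for binary64, 24 for
binary32). The lower bound is the digit count that survives string -> binary ->
string, and the upper bound the count needed for binary -> string -> binary;
these are the formulas the C standard uses for DBL_DIG/FLT_DIG and
DBL_DECIMAL_DIG/FLT_DECIMAL_DIG:

```latex
\text{digits10} = \left\lfloor (p-1)\log_{10} 2 \right\rfloor, \qquad
\text{max\_digits10} = \left\lceil p \log_{10} 2 \right\rceil + 1

\text{binary64 } (p = 53):\quad
\lfloor 52 \cdot 0.30103 \rfloor = \lfloor 15.65 \rfloor = 15, \qquad
\lceil 53 \cdot 0.30103 \rceil + 1 = \lceil 15.95 \rceil + 1 = 17

\text{binary32 } (p = 24):\quad
\lfloor 23 \cdot 0.30103 \rfloor = \lfloor 6.92 \rfloor = 6, \qquad
\lceil 24 \cdot 0.30103 \rceil + 1 = \lceil 7.22 \rceil + 1 = 9
```

The gap between the two is 2 for double and 3 for float.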

"
#include <limits>
#include <iostream>

int main(){
    std::cout<<std::numeric_limits<float>::digits10<<std::endl;
    std::cout<<std::numeric_limits<double>::digits10<<std::endl;
}
"
This prints:
6
15

Please bump double to digits10+2 and float to digits10+3. Or maybe we should
forget numeric traits and just hardcode 9 and 17? IEEE standard and all.
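
As a middle ground between trait arithmetic and hard-coded constants, C++11
added std::numeric_limits<T>::max_digits10, the number of decimal digits
required so that distinct values of T always print differently. A minimal
sketch of a formatter built on it (the helper name to_round_trip_string is
hypothetical, not part of yaml-cpp):

```cpp
#include <limits>
#include <sstream>
#include <string>
#include <type_traits>

// Formats `value` with max_digits10 significant digits, enough for the
// text to parse back to the same value on a conforming implementation.
template <typename T>
std::string to_round_trip_string(T value) {
    static_assert(std::is_floating_point<T>::value,
                  "T must be a floating-point type");
    std::ostringstream out;
    out.precision(std::numeric_limits<T>::max_digits10);
    out << value;
    return out.str();
}

// On IEEE 754 platforms the trait gives exactly the 9 and 17 above.
static_assert(!std::numeric_limits<float>::is_iec559 ||
                  std::numeric_limits<float>::max_digits10 == 9,
              "binary32 needs 9 significant digits");
static_assert(!std::numeric_limits<double>::is_iec559 ||
                  std::numeric_limits<double>::max_digits10 == 17,
              "binary64 needs 17 significant digits");
```

For example, 0.1 is written as "0.10000000000000001" and 0.1f as
"0.100000001": longer than the shortest form, but never lossy.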

I think there's still a valid point, though, that this won't cover every case
we should expect on desktops, but at a minimum we need to make it work for the
IEEE standard.

Original comment by nev...@gmail.com on 13 Mar 2015 at 12:06