Tencent / rapidjson

A fast JSON parser/generator for C++ with both SAX/DOM style API
http://rapidjson.org/
Other
14.18k stars 3.53k forks source link

rapidjson::Document::Parse( STRINGIFY( DBL_MAX ) ) == inf #710

Open mre4ce opened 8 years ago

mre4ce commented 8 years ago

The following code:

#define STRINGIFY_EXPANDED( a )    #a
#define STRINGIFY( a )             STRINGIFY_EXPANDED(a)

const char * string = STRINGIFY( DBL_MAX );
rapidjson::Document json;
json.Parse( string );

printf( "string     = %s\n", string );
printf( "rapidjson  = %.17g\n", json.GetDouble() );
printf( "strtod     = %.17g\n", strtod( string, NULL ) );

Compiled with VS2015 update 3 results in the following:

string     = 1.7976931348623158e+308
rapidjson  = inf
strtod     = 1.7976931348623157e+308

Arguably this is nit picky because "inf" is only one ulp off from "1.7976931348623157e+308" but because JSON does not support "inf" or "nan" I would consider it incorrect for a valid floating-point number (DBL_MAX) to turn into "inf". An argument can be made that any number > DBL_MAX should turn into DBL_MAX to avoid an "inf".

You can also argue that this is still correct because the JSON spec isn't specific about the ranges of numbers that should be supported. However, I would expect a C/C++ JSON implementation to support the full ranges of the basic C/C++ types.

It is actually surprising how many C/C++ JSON implementations do not support the full int32_t, uint32_t, int64_t, uint64_t, float and double ranges with reasonable accuracy. For instance, while sajson is technically cool it currently cannot represent the full int32_t range because it considers any value < -2147483559 or > 2147483559 to be a double and it won't cast a double to an integer.

sajson::parse( sajson::string( "[-2147483648]", 13 ) ).get_root().get_array_element(0).get_integer_value() == 0
sajson::parse( sajson::string( "[-2147483647]", 13 ) ).get_root().get_array_element(0).get_integer_value() == -4194304
sajson::parse( sajson::string( "[-2147483646]", 13 ) ).get_root().get_array_element(0).get_integer_value() == -8388608
sajson::parse( sajson::string( "[2147483645]", 12 ) ).get_root().get_array_element(0).get_integer_value() == -12582912
sajson::parse( sajson::string( "[2147483646]", 12 ) ).get_root().get_array_element(0).get_integer_value() == -8388608
sajson::parse( sajson::string( "[2147483647]", 12 ) ).get_root().get_array_element(0).get_integer_value() == -4194304
mre4ce commented 8 years ago

This problem goes away when using kParseFullPrecisionFlag, but with this flag I get the following:

"1.7976931348623159e+309" results in -5.5626846462680035e-308 "-1.7976931348623159e+309" results in 5.5626846462680035e-308

Which should be either inf/-inf or clamped to DBL_MAX and DBL_MIN.

mre4ce commented 8 years ago

These are the results with several different compilers.

Without kParseFullPrecisionFlag:

VS2015 - update 3

rapidjson::Document::Parse( "1.7976931348623158e+308" ).GetDouble() == inf
strtod( "1.7976931348623158e+308", NULL )                           == 1.7976931348623157e+308

Android GCC 4.9.0

rapidjson::Document::Parse( "1.7976931348623158e+308" ).GetDouble() == inf
strtod( "1.7976931348623158e+308", NULL )                           == 1.7976931348623157e+308

Android Clang 3.8.256229

rapidjson::Document::Parse( "1.7976931348623158e+308" ).GetDouble() == 1.7976931348623157e+308
strtod( "1.7976931348623158e+308", NULL )                           == 1.7976931348623157e+308

With kParseFullPrecisionFlag:

VS2015 - update 3

rapidjson::Document::Parse( "1.7976931348623159e+309" ).GetDouble() == -5.5626846462680035e-308
rapidjson::Document::Parse( "-1.7976931348623159e+309" ).GetDouble() == 5.5626846462680035e-308

Android GCC 4.9.0

rapidjson::Document::Parse( "1.7976931348623159e+309" ).GetDouble() == inf
rapidjson::Document::Parse( "-1.7976931348623159e+309" ).GetDouble() == -inf

Android Clang 3.8.256229

rapidjson::Document::Parse( "1.7976931348623159e+309" ).GetDouble() == inf
rapidjson::Document::Parse( "-1.7976931348623159e+309" ).GetDouble() == -inf
miloyip commented 8 years ago

I checked the max double is

(1 + (1 - 2^-52)) * 2^1023

1797693134862315708145274237317043567980705675258449965989174768031572\
6078002853876058955863276687817154045895351438246423432132688946418276\
8467546703537516986049910576551282076245490090389328944075868508455133\
9423045832369032229481658085593321233482747978262041447231687381771809\
19299881250404026184124858368

1.7976931348623159e+309 is larger than this value and the handling of it is arguable.

I think it should generate kParseErrorNumberTooBig for numbers that are over the range of double. I will check whether it can be handled for these cases.

I don't think treating them as inf is a good choice. RapidJSON does actually support inf/nan parsing (kParseNanAndInfFlag) and generation (kWriteNanAndInfFlag).

mre4ce commented 8 years ago

Either a parse error or returning DBL_MAX for values that are outside the double range would work. I would prefer clamping the value because values outside the [DBL_MIN, DBL_MAX] range are not JSON errors. Either way, the most important thing is to document the behavior.

1. Does the parser support the whole [DBL_MIN, DBL_MAX] range?
2. Are values clamped to the [DBL_MIN, DBL_MAX] range?

The answer to both questions would currently be no which is within the JSON spec and perfectly fine when documented.

miloyip commented 8 years ago

RapidJSON supports IEEE-754 double precision floating point range. Stringifying DBL_MAX may cause precision problem due to preprocessor (so that some compilers may generate numerical error).

mre4ce commented 8 years ago

Afaik DBL_MAX is always defined as 1.7976931348623158e+308 and will always stringify to "1.7976931348623158e+308".