Tencent / rapidjson

A fast JSON parser/generator for C++ with both SAX/DOM style API
http://rapidjson.org/
Other
14.22k stars 3.53k forks source link

Problem verifing schema with multipleOf == 0.1 #1089

Open m-wachter opened 7 years ago

m-wachter commented 7 years ago

Hello everyone,

first of all thank you for this great library.

I have a problem verifying my json file against a schema with a "multipleOf" set to 0.1. To reproduce I added the following test-case to schematest.cpp:

TEST(SchemaValidator, Number_MultipleOfPointOne) {
    Document sd;
    sd.Parse("{\"type\":\"number\",\"multipleOf\":0.1}");
    SchemaDocument s(sd);

    VALIDATE(s, "0.1", true);
    VALIDATE(s, "1", true);
    VALIDATE(s, "17.2", true);
    VALIDATE(s, "-17.2", true);
    INVALIDATE(s, "17.21", "", "multipleOf", "");
}

It seems that there is a precision related problem in the function CheckDoubleMultipleOf in shema.h .

bool CheckDoubleMultipleOf(Context& context, double d) const {
    double a = std::abs(d), b = std::abs(multipleOf_.GetDouble());
    double q = std::floor(a / b);
    double r = a - q * b;
    if (r > 0.0)
        RAPIDJSON_INVALID_KEYWORD_RETURN(GetMultipleOfString());
    return true;
}

Because 0.1 has no finite double representation, the value of b is slightly above 0.1. In the test-case with 17.2 the division a / b yields a value slightly below 172 which is then "floored" to 171. So q * b is too small and r is positive.

It would be nice if someone could have a look at this.

Thanks Michael.

miloyip commented 7 years ago

This floating point issue seems hard to be solved for all cases. In floating point representations, 17.2 is not exactly a multiple of 0.1. Adding epsilon comparison may put some cases into positive results, but it may also make false negatives.

I am open for suggestion.

m-wachter commented 7 years ago

That's exactly the problem.

I tried the following implementation:

bool CheckDoubleMultipleOf(Context& context, double d) const {
    double b = std::abs(multipleOf_.GetDouble());
    double limit = b * 0.0000000001;
    if (std::abs(remainderl(d, b)) > limit)
            RAPIDJSON_INVALID_KEYWORD_RETURN(GetMultipleOfString());
    return true;
}

It works for me because of the value range in my application, but i would not like to include it into a general library because of the calculation of the limit.

This works fine for the test-cases above and works also for more extreme cases like checking if 1E+100 is multiple of 3 (returns error) but fails for 1E+100 is multiple of 0.1 (remainderl(1E100,0.1) == 0.0047721682623838579).

Maybe someone has a better idea.

smhdfdl commented 2 years ago

@miloyip A user of my software product has hit this problem, seeing the incorrect validation error 'Number '7.770000' is not a multiple of the 'multipleOf' value '0.010000'. I'd like the issue re-opened please and a robust solution developed. If you recall, I had to change a unit test in #1837 to avoid the use of a multipleof value less than 0.

smhdfdl commented 2 years ago

@miloyip your thoughts please

smhdfdl commented 1 year ago

@miloyip please give me the authority to re-open issues, and also merge PRs.

miloyip commented 1 year ago

@smhdfdl What "robust" solution do you have in mind? I think there may be several solutions:

  1. absolute or relative threshold
  2. decimal representation
  3. rational representation For any one of these, user may need to configure some settings in order to make it work.
smhdfdl commented 1 year ago
  1. An option to use decimals when dealing with non-integer numbers, instead of using floats. The enterprise software product I work on that uses RapidJSON already has such an option for parsing (we use our own parser and only use RapidJSON for validation). For example, https://github.com/dnotq/decNumber.
smhdfdl commented 1 year ago

@miloyip Any further thoughts on this please?

smhdfdl commented 1 year ago

@miloyip Any further thoughts on this please?

miloyip commented 1 year ago

I think I would prefer threshold solution. The others are too complicated.

smhdfdl commented 1 year ago

@miloyip Please elaborate on how a threshold solution would work

miloyip commented 1 year ago

For example, adding a threshold user setting in SchemaValidator. The limitation is that it will be applying to all multipleOf. But I think in most case it should be good enough.

smhdfdl commented 1 year ago

@miloyip Sorry I don't understand what you mean. What do you mean by threshold and how would it work?

miloyip commented 1 year ago

To check whether a is approximately a multiple of b, we may:

quotient = floor(|a| / |b|);
remainder = |a| - quotient * |b|;
return remainder < |b| * relative_error_threshold;
smhdfdl commented 1 year ago

'Approximately' is no good. The point is that for many users the multipleOf calculation needs to be 100% accurate. You need to provide a reliable number implementation, hence why I prefer option 2. The ability to use Mike Cowlishaw's decimal code in RapidJSON for both parsing and validating would be awesome.

miloyip commented 1 year ago

'Approximately' is no good. The point is that for many users the multipleOf calculation needs to be 100% accurate. You need to provide a reliable number implementation, hence why I prefer option 2. The ability to use Mike Cowlishaw's decimal code in RapidJSON for both parsing and validating would be awesome.

That library seems not an arbitrary precision solution, which will have limitation in precision and cannot be 100% accurate. A possible solution is to parse the number as arbitrary precision integer and exponent, then convert both numbers to integers (as we only need the compute ratio of two numbers), finally doing a long division for arbitrary precision integers. But the performance may be really worse.

miloyip commented 1 year ago

Besides, I think it may need to use some dtoa, such as Grisu2() to convert double back to decimal.

smhdfdl commented 1 year ago

@miloyip That library is used all over the world as the de-facto decimal conversion library. The enterprise service product I work on has used it for 20 years without complaint. If added to RapidJSON a user would have a choice - use that or JSON doubles, so he can choose between accuracy and arbitrary precision.

miloyip commented 1 year ago

@miloyip That library is used all over the world as the de-facto decimal conversion library. The enterprise service product I work on has used it for 20 years without complaint. If added to RapidJSON a user would have a choice - use that or JSON doubles, so he can choose between accuracy and arbitrary precision.

But 100% accurate was what you would like to get. You may evaluate if that library can really do it correctly, for example, whether a very large number is a multiple of a very small number. And directly incorporating the library will lose rapidjson's header-only feature. A possible way is to provide a call back function and let user to do it with their own solutions.

smhdfdl commented 1 year ago

@miloyip I callback solution would work for us I think. My colleague is away at the moment, I will consult with him and get back to you.

adaxenberger commented 1 year ago

Solution using std::numeric_limits::epsilon:

...
#include <limits>
...
    bool CheckDoubleMultipleOf(Context& context, double d) const {
        double a = std::fabs(d), b = std::fabs(multipleOf_.GetDouble());
        double q = a / b;
        double qRounded = std::round(q);
        double scaledEpsilon = (q + qRounded) * std::numeric_limits<double>::epsilon();
        double difference = std::fabs(qRounded - q);
        bool isMultiple = (difference <= scaledEpsilon)
                            || (difference < std::numeric_limits<double>::min());

        if (!isMultiple)
            RAPIDJSON_INVALID_KEYWORD_RETURN(GetMultipleOfString());
        return true;
    }
...