turning on `serde_json`'s `arbitrary_precision` feature is infectious to importers

robjtede commented 1 week ago

What is the current bug behavior?

It's a well known rough edge that serde_json's arbitary precision feature changes the behavior for all users of the crate; affecting uses of 3rd party data types the most.

In our projects, we've gone to great lengths to avoid turning this feature on because it breaks some (or all) of our integration test code, instead opting for rust_decimal in all numeric cases which has support for using custom deserializers on each field to precisely control when string vs number types are expected in numeric fields.

While trying to upgrade our wrapper library from hurl/hurl_core v4.1 -> v4.3 (for isIsoDate support) we noticed that CI was failing in a non-obvious way (which is

Steps to reproduce

For a project with rust_decimal (with serde-with-float feature to make the custom deserializer available, it's not important for this bug), using this snippet:

#[test]
fn deserialize_float_to_rust_decimal() {
    #[derive(serde::Deserialize)]
    struct Data {
        #[serde(with = "rust_decimal::serde::float")]
        value: rust_decimal::Decimal,
    }

    let json = r#"
    {
        "value": 1.0
    }
    "#;

    serde_json::from_str::<Data>(json).unwrap();
}

... if we were to introduce hurl v4.3 dependency this test would spontaneously fail, without changes.

What is the expected correct behavior?

A minor library upgrade shouldn't significantly change behavior at runtime.
It should be possible to use hurl as a library without being forced to use the arbitrary_precision feature on serde_json.

Execution context

Crate importing hurl
hurl Version: 4.3.0

Possible fixes

Not sure yet.

It's possible that an optional crate feature could solve this, which is always used by the CLI binary but gives library importers control over their context. Though, I haven't yet scanned the codebase for exactly how or why this feature is being used, so it may be non-trivial to do this.

jcamiel commented 1 week ago

Hi @robjtede

The arbitrary_precision flag has been introduced in f3909c21cfe8a28750fde4138c17ce09d9b65bb4: the main issue was to address JSON number precision in JSONPath for instance:

Given a JSON response body like this:

{
  "integer": 1,
  "float": 1.0,
  "small_float1": 0.1,
  "small_float2": 0.100000000000000005,
  "big_float1": 1000000000000000000000.0,
  "big_float2": 1000000000000000000000.5,
  "big_integer": 1000000000000000000000
}

I want to be able to test this response correctly with this Hurl file:

GET http://localhost:8000/predicates-number
HTTP 200
[Asserts]
jsonpath "$.big_float1" isFloat
jsonpath "$.big_float1" == 1000000000000000000000.0
jsonpath "$.big_float1" == 1000000000000000000000.5
jsonpath "$.big_float2" == 1000000000000000000000.0
jsonpath "$.big_integer" == 1000000000000000000000
jsonpath "$.big_integer" isInteger

(you can see our integration test here https://github.com/Orange-OpenSource/hurl/blob/4a67bd5cb00c5bc942ac330138c0179e19191359/integration/hurl/tests_ok/predicates_number.hurl)

Because JSON body parsing is done with serde, we've activated serde's flag arbitrary_precision. This is for the why! I'm not at ease to have this behaviour to be activated /deactivated behind a flag: given a JSON response body and a Hurl file, we want to have the same reproductible result, whatever is the "runner". Plus, we don't want this kind of JSON error (bad precision in JSON) that you can find in other tools (see https://github.com/usebruno/bruno/issues/2438)

Regarding the binary compatibility break, we try to be better but it's still not perfect. I've noted that we should check and upgrade version when we activate dependencies flag.

RapidPencil commented 1 week ago

I'm not at ease to have this behaviour to be activated /deactivated behind a flag: given a JSON response body and a Hurl file, we want to have the same reproductible result, whatever is the "runner"

You could potentially make arbitrary_precision part of your default features. That still enables us to opt out of it.

Our situation is that we want to use Hurl to test an API that uses an HMAC authentication scheme. To do this, we use Hurl as a library so we can compute and inject the MAC header in each hurl entry (request). Unfortunately, we cannot allow serde_json's infectious arbitrary_precision feature because it alters the behavior of unrelated crates in the dependency tree. As such we are unable to upgrade to recent versions of Hurl.

jcamiel commented 1 week ago

@RapidPencil I understand your point but I'm really really reluctant to do this. I'm not at all happy with having a different Hurl's output for a given input. Our priority is the command line and I don't want to expose/support the crate as a liability for us. We need to discuss it with the other maintainer @fabricereix @lepapareil and see if there are others alternatives.

jcamiel commented 1 week ago

@robjtede @RapidPencil we have looked around this issue (we still want to have a minimal reproductible sample if you have something like that it would be cool). If we want to not activate arbitrary_precision serde_json flag so client using hurl crate have no regression, and support big integer in JSON (which we want), the only way we see is to parse / serialize JSON ourself in Hurl code (without serde):

parsing JSON response
serializing results to JSON (--json option)
serializing report for Hurl 5.0.0--report-jsonoption.

That implies:

potential perf drop (our parser vs serde_json parser)
potential regression in JSON parsing (especially in HTTP response)
not depending on serde_json (and maybe serde) improvements and fixes

The bad news is that it is not a small task. I'm particularly worried on regression on JSON parsing .

The good news is that we already have a JSON parser: we use it for request body because we're supporting "JSON-like" body with Hurl templates:

GET https://foo.bin
{
  "user": {{name}}
}

So we can use our own parser everywhere, provided:

we check that we're compliant with most of JSON possible bodies
we have performances near serde_json

Any bug we'll identify will be a bug that we could have for request body so this is rather virtuous. We can begin work on it, for a target version of 5.1.0 (5.0.0 has already a lot of changes and we don't want to delay it). This may lands in october/november given out track records of previous version.

Any idea, remarks, suggestions?

robjtede commented 1 week ago

Appreciate the look into it. For sure this is non-trivial to solve and I understand that lib consumers aren't your primary use case.

Really the problem lies in serde_json and it'd be far too much to ask you folks to implement a whole separate parser.

That said, I think we can work around it for now with some cargo patching.

jcamiel commented 4 days ago

For reference, benchmark for parsing a 25M JSON document with serde_json vs hurl_core:

https://github.com/jcamiel/jsonperf

There is currently a x10 diff in defavor of hurl_core, we should set a max x2 compared to serde_json (with arbitrary precision). First analysis: serde_json works on u8 minimizing allocation, while hurl_core works on a vec of char. We should create a specialized parser (no template support, works on u8 etc..).

Orange-OpenSource / hurl