aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0

Avoid parsing numbers when using RPCv2 protocol #5539

Closed sugmanue closed 2 months ago

sugmanue commented 2 months ago

Motivation and Context

The current unmarshalling process is split into two phases. In the first phase the input is parsed into a JsonNode instance that represents the whole JSON input. The second phase uses this JsonNode to unmarshall the nodes into SDK POJOs.

For regular JSON input, numeric values are represented as strings (e.g., 3.14, "17"), so we need to parse those strings to convert them into a numeric type. For Smithy RPCv2 we don't need this: the values in the CBOR payload are already encoded in a binary form that is much cheaper to convert than parsing a string.
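
As a rough illustration of the cost difference, the sketch below decodes the same double once from a CBOR double-precision item (initial byte 0xFB followed by 8 big-endian IEEE 754 bytes) and once from its JSON text. The class and method names are hypothetical, made up for this example. It is not how the SDK or its CBOR parser is implemented, and it handles only this one CBOR item type.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CborDoubleSketch {
    // CBOR initial byte for a double-precision float (major type 7, additional info 27).
    static final int CBOR_FLOAT64 = 0xFB;

    // Encodes a double as a 9-byte CBOR item: the initial byte plus 8 big-endian bytes.
    static byte[] encodeCborDouble(double value) {
        return ByteBuffer.allocate(9)
                         .put((byte) CBOR_FLOAT64)
                         .putDouble(value) // ByteBuffer is big-endian by default
                         .array();
    }

    // Decoding is a fixed-size read of the IEEE 754 bits; no text parsing is involved.
    static double decodeCborDouble(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        int initialByte = buffer.get() & 0xFF;
        if (initialByte != CBOR_FLOAT64) {
            throw new IllegalArgumentException("Not a CBOR float64 item: " + initialByte);
        }
        return buffer.getDouble();
    }

    // The JSON route: the digits arrive as text and have to be parsed.
    static double decodeJsonNumber(byte[] payload) {
        return Double.parseDouble(new String(payload, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] cbor = encodeCborDouble(3.14);
        byte[] json = "3.14".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decodeCborDouble(cbor));
        System.out.println(decodeJsonNumber(json));
    }
}
```

Both routes yield the same value; the difference is that the CBOR route never materializes or scans a string.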

This change makes it possible to read the encoded values from the CBOR input without having to convert them into strings and back into numbers. To do so, it changes both phases of the unmarshalling process:

  1. During parsing, instead of creating a regular JsonNode, we create an EmbeddedObjectJsonNode, which can carry any Object holding the numeric value. To do this we introduce a new abstraction, JsonValueNodeFactory, which is used to create the nodes for simple types. The default implementation creates plain JsonNode instances, while a different implementation creates EmbeddedObjectJsonNode instances for CBOR payloads.
  2. During unmarshalling we detect those values and use them directly instead of parsing their string values.
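
The two steps above can be sketched as follows. This is a minimal, self-contained illustration of the idea only: Node, TextNumberNode, EmbeddedNode, and ValueNodeFactory are simplified stand-ins invented for this example, not the SDK's JsonNode, EmbeddedObjectJsonNode, or JsonValueNodeFactory, whose actual signatures are not shown in this description.

```java
public class EmbeddedNumberSketch {
    /** Simplified stand-in for a parsed value node (hypothetical, not the SDK type). */
    interface Node {
    }

    /** Holds a number as text, the way a JSON parser would hand it over. */
    static final class TextNumberNode implements Node {
        final String text;

        TextNumberNode(String text) {
            this.text = text;
        }
    }

    /** Carries an already-decoded value, so no string is ever created for it. */
    static final class EmbeddedNode implements Node {
        final Object value;

        EmbeddedNode(Object value) {
            this.value = value;
        }
    }

    /** Phase 1 hook: the parser asks the factory to build the node for a simple value. */
    interface ValueNodeFactory {
        Node doubleNode(double decoded);
    }

    // Default behavior: keep the textual representation of the number.
    static final ValueNodeFactory TEXT_FACTORY =
        decoded -> new TextNumberNode(Double.toString(decoded));

    // Binary-payload behavior: embed the decoded value directly.
    static final ValueNodeFactory EMBEDDED_FACTORY =
        decoded -> new EmbeddedNode(Double.valueOf(decoded));

    /** Phase 2: use an embedded value directly, otherwise fall back to parsing the text. */
    static double unmarshallDouble(Node node) {
        if (node instanceof EmbeddedNode) {
            Object value = ((EmbeddedNode) node).value;
            if (value instanceof Double) {
                return (Double) value;
            }
            throw new IllegalArgumentException("Embedded value is not a Double: " + value);
        }
        if (node instanceof TextNumberNode) {
            return Double.parseDouble(((TextNumberNode) node).text);
        }
        throw new IllegalArgumentException("Unsupported node type: " + node);
    }

    public static void main(String[] args) {
        System.out.println(unmarshallDouble(TEXT_FACTORY.doubleNode(3.14)));
        System.out.println(unmarshallDouble(EMBEDDED_FACTORY.doubleNode(3.14)));
    }
}
```

Keeping the choice behind a factory means the parser code stays the same for both protocols; only the factory passed to it decides whether simple values end up as text or as embedded objects.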

Benchmarks

The benchmark results shown below come from unmarshalling a single value of a shape similar to GetMetricData (see here), which resembles a set of datapoints from a time series (each datapoint is a pair of a timestamp and a floating-point value). For the test we labeled three samples "small", "medium", and "big"; they differ in the number of datapoints. The "small" sample has 17 datapoints, the "medium" sample has 37, and the "big" sample has 157.

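The full benchmark results are shown below (unmarshalling only; the marshalling results did not change). Computed from those scores, the Smithy RPCv2 implementation compares with AWS-JSON as follows:

After: Smithy RPCv2 is about 1.9x (small), 2.3x (medium), and 3.0x (big) faster than AWS-JSON.

Before: Smithy RPCv2 was about 23% (small), 6% (medium), and 23% (big) slower than AWS-JSON.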
After this change

Benchmark                              (protocol)  (size)  Mode  Cnt       Score     Error  Units
JsonMarshallerBenchmark.unmarshall  smithy-rpc-v2   small  avgt    5    8,021.875 ± 199.474  ns/op
JsonMarshallerBenchmark.unmarshall       aws-json   small  avgt    5   15,630.801 ± 113.576  ns/op

JsonMarshallerBenchmark.unmarshall  smithy-rpc-v2  medium  avgt    5   12,655.069 ± 122.454  ns/op
JsonMarshallerBenchmark.unmarshall       aws-json  medium  avgt    5   29,624.410 ± 913.256  ns/op

JsonMarshallerBenchmark.unmarshall  smithy-rpc-v2     big  avgt    5   37,135.590 ± 531.297  ns/op
JsonMarshallerBenchmark.unmarshall       aws-json     big  avgt    5  110,714.788 ± 389.752  ns/op

Before this change

Benchmark                              (protocol)  (size)  Mode  Cnt       Score     Error  Units
JsonMarshallerBenchmark.unmarshall  smithy-rpc-v2   small  avgt    5   20,204.149 ±  152.628  ns/op
JsonMarshallerBenchmark.unmarshall       aws-json   small  avgt    5   16,424.807 ±  378.576  ns/op

JsonMarshallerBenchmark.unmarshall  smithy-rpc-v2  medium  avgt    5   35,466.582 ±  172.311  ns/op
JsonMarshallerBenchmark.unmarshall       aws-json  medium  avgt    5   33,548.364 ±  799.947  ns/op

JsonMarshallerBenchmark.unmarshall  smithy-rpc-v2     big  avgt    5  137,709.264 ± 1920.922  ns/op
JsonMarshallerBenchmark.unmarshall       aws-json     big  avgt    5  112,142.916 ±  306.063  ns/op

sonarcloud[bot] commented 2 months ago

Quality Gate failed

Failed conditions
41.6% Coverage on New Code (required ≥ 80%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud