Open anonrig opened 7 months ago
Is JSON parsing a significant cause of latency in our consumers or endpoints? Also, I would check Sentry as you roll out; with replays we noticed a bunch of errors associated with switching JSON implementations.
@JoshFerge JSON parsing is a bottleneck in a couple of places, and I think it has more impact than we assume it should have. Here's an example: https://sentry-st.sentry.io/performance/summary/?display=trend&end=2024-04-16T03%3A59%3A59&project=1513938&query=&referrer=performance-transaction-summary&start=2024-04-10T04%3A00%3A00&transaction=ingest_consumer.process_event&trendFunction=p95&unselectedSeries=p100%28%29%2Cavg%28%29
The only change that caused this improvement was the switch from rapidjson to orjson.
Here's an example of how this improved the time spent in one of the processing phases of the indexer:
This is the phase where JSON deserialization is done.
One difference between json and orjson is that orjson dumps to bytes, not str. Decoding that back to a str might cost us some performance, let's see how it shakes out.
@loewenheim Some functions such as Snuba ones take both bytes and str as parameters. We might not need to encode/decode in some places.
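To make the bytes vs. str point concrete, here is a minimal sketch of a wrapper that only decodes when the caller actually needs a str. The helper names `dumps_bytes` and `dumps_str` are hypothetical and not taken from the Sentry codebase, and the fallback to the standard library when orjson is not installed is only there so the snippet runs anywhere.

```python
import json

try:
    import orjson  # third-party; optional for this sketch
except ImportError:  # fall back to the standard library
    orjson = None


def dumps_bytes(obj) -> bytes:
    """Serialize to UTF-8 bytes, which is what orjson.dumps returns natively."""
    if orjson is not None:
        return orjson.dumps(obj)
    # Compact separators and ensure_ascii=False mimic orjson's output shape.
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def dumps_str(obj) -> str:
    """Serialize to str; with orjson this costs one extra decode step."""
    if orjson is not None:
        return orjson.dumps(obj).decode("utf-8")
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


# A consumer that accepts bytes can skip the decode entirely.
payload = dumps_bytes({"project_id": 1, "name": "indexer"})
```

Call sites that hand the payload to something accepting bytes can use `dumps_bytes` directly, so the decode cost is only paid where a str is really required.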
In all of the cases where this change is being made, is the parsed JSON fully utilized? If there are cases where the JSON is only parsed for a routing key for example, this can be sped up typically by another order of magnitude.
There are quite a few differences between how rapidjson and orjson parse JSON. This doesn't mean either is wrong, just that they might differ and introduce behaviour changes. A good place to start comparing the two is:
Specifically, if you're depending on very accurate float or integer parsing/serialization, there will be quirks. For example, with rapidjson you would have been ingesting NaN and +/-Infinity as valid numbers, but with orjson you no longer do. Although yyjson is now the basis for the parser in orjson, orjson doesn't use its support for arbitrary integer parsing, so the maximum size of an integer will also be different from Python's json.
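As a rough illustration of the NaN/Infinity quirk, the sketch below uses only the standard library json module, which accepts these tokens by default, and shows one way to check ahead of a migration whether existing payloads rely on them. The helper names `strict_loads` and `strict_dumps` are hypothetical; this is not how orjson or rapidjson implement their behaviour, only a way to surface affected data.

```python
import json
import math


def _reject_constant(name: str):
    # json calls this hook for the tokens 'NaN', 'Infinity' and '-Infinity'.
    raise ValueError(f"non-finite number not allowed in JSON: {name}")


def strict_loads(data):
    """Parse JSON but refuse NaN and +/-Infinity, as a strict parser would."""
    return json.loads(data, parse_constant=_reject_constant)


def strict_dumps(obj) -> str:
    """Serialize JSON but raise ValueError on non-finite floats."""
    return json.dumps(obj, allow_nan=False)


# The default stdlib parser is lenient and accepts the token.
lenient = json.loads('{"value": NaN}')
assert math.isnan(lenient["value"])

# The strict variant surfaces the payloads that would change behaviour.
try:
    strict_loads('{"value": NaN}')
except ValueError as exc:
    print("rejected:", exc)
```

Running stored or sampled payloads through something like `strict_loads` before the switch gives an idea of how much data depends on non-finite numbers.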
Why is the title of this issue "replace orjson with existing json usages"? It's the other way around, isn't it?
In the description it's fine: "... simply replacing rapidjson with orjson results ..."
Changed the title.
Max duration of reconstruct_messages.build_new_payload.json_step has been impacted by these changes as well.
Referencing @ayirr7
This gave us a 50% reduction in time spent, on average, on JSON serialization in the indexer and about 25% reduction in the time spent, on average, by the indexer in building a new payload which it sends along to Snuba. Cool!
After replacing middlewares and integrations to use orjson, here are the changes:
@anonrig Could you please show the memory usage plot with/without orjson? I am not familiar with the orjson.dumps() internals, but I have heard from colleagues about some known problems with memory usage or memory leaks (if you want, you can scroll through the orjson issues closed by a Stale Bot). I am not saying that Sentry will definitely use more memory, because it depends on what is being dumped and other things, but if it does, it is better to know about it now than to debug OOMKilled errors on a server.
We currently use rapidjson and the default json library to parse/stringify JSON. The alternative library orjson is 6 times faster than json and 2-3 times faster than rapidjson.
We (me and @fpacifici) validated that simply replacing rapidjson with orjson results in a 25% improvement (from 12ms to 9ms) on a particular task.
For example, here's a graph to visualize the difference between orjson and rapidjson:
The drill
The drill is always this:
Todos
sentry/utils/json.py
Task list