
Replace existing json usages with `orjson` #68903

Open · anonrig opened 7 months ago

anonrig commented 7 months ago

We currently use rapidjson and the default json library to parse/stringify JSON. The alternative library orjson is 6 times faster than json and 2-3 times faster than rapidjson.

We (me and @fpacifici) validated that simply replacing rapidjson with orjson results in 25% improvement (from 12ms to 9ms) on a particular task.

For example, here's a graph visualizing the difference between orjson and rapidjson:

[Screenshot (2024-04-15): graph comparing orjson and rapidjson]
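For anyone who wants to reproduce the rough ratios locally, here is a minimal micro-benchmark sketch. The payload is a made-up, event-shaped dict rather than a real Sentry event, so the absolute numbers will differ, but the ordering should hold:

```python
# Minimal micro-benchmark sketch; the payload below is invented for illustration,
# not an actual Sentry event, and absolute timings depend heavily on payload shape.
import json
import timeit

import orjson
import rapidjson  # provided by the python-rapidjson package

payload = {
    "event_id": "a" * 32,
    "tags": [["browser", "Chrome"], ["level", "error"]] * 50,
    "extra": {f"key_{i}": i * 1.5 for i in range(200)},
}
encoded = json.dumps(payload)


def bench(name: str, fn, number: int = 10_000) -> None:
    elapsed = timeit.timeit(fn, number=number)
    print(f"{name:>10}: {elapsed:.3f}s for {number} iterations")


# Serialization (note that orjson returns bytes, not str)
bench("json", lambda: json.dumps(payload))
bench("rapidjson", lambda: rapidjson.dumps(payload))
bench("orjson", lambda: orjson.dumps(payload))

# Deserialization
bench("json", lambda: json.loads(encoded))
bench("rapidjson", lambda: rapidjson.loads(encoded))
bench("orjson", lambda: orjson.loads(encoded))
```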

The drill

The drill is always this:

  1. Pick a consumer, a component, or a class
  2. Change the JSON implementation, guarded by a feature flag (see the sketch after this list)
  3. Test with the feature flag enabled
  4. Roll out, i.e. use orjson without the feature flag
  5. Clean up by removing the feature flag
  6. Done
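A minimal sketch of what step 2 could look like; the flag argument and helper names below are illustrative placeholders, not the actual Sentry feature flag or utility module used in the PRs:

```python
# Illustrative sketch only: `use_orjson` stands in for whatever feature-flag check
# the real rollout uses; the helper names are hypothetical.
import json
from typing import Any, Union

import orjson


def dumps_with_flag(value: Any, use_orjson: bool) -> str:
    """Serialize `value`, switching implementations behind a feature flag."""
    if use_orjson:
        # orjson returns bytes; decode to preserve the existing str contract
        # until callers have been migrated to accept bytes.
        return orjson.dumps(value).decode("utf-8")
    return json.dumps(value)


def loads_with_flag(data: Union[bytes, str], use_orjson: bool) -> Any:
    """Deserialize `data` with either implementation behind the same flag."""
    if use_orjson:
        return orjson.loads(data)
    return json.loads(data)
```

Once the rollout reaches step 4, the flag parameter and the stdlib branch can be deleted, which is exactly the cleanup in step 5.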

Todos

Task list

| Component | Status | Author | Pull Request | Notes |
| --- | --- | --- | --- | --- |
| Relay | 4 | @anonrig | https://github.com/getsentry/sentry/pull/69613 | - |
| Indexer consumer | 6 | @anonrig | https://github.com/getsentry/sentry/pull/68802 | 20-25% overall reduction |
| Ingest processor | 6 | @anonrig | https://github.com/getsentry/sentry/pull/68811 | - |
| Snuba | 6 | @anonrig | https://github.com/getsentry/sentry/pull/68938 | - |
| Slack integration | 4 | @loewenheim | https://github.com/getsentry/sentry/pull/69409 | - |
| reconstruct messages | 4 | @ayirr7 | https://github.com/getsentry/sentry/pull/69378 | 50% reduction in JSON serialization and 25% reduction on average across all indexer payloads |
| outcomes_consumer | 2 | @vbro | https://github.com/getsentry/getsentry/pull/13779 | - |
| event_manager | 4 | @loewenheim | https://github.com/getsentry/sentry/pull/69653 | - |
| eventstore | 4 | @loewenheim | https://github.com/getsentry/sentry/pull/69653 | - |
| auth | 4 | @loewenheim | https://github.com/getsentry/sentry/pull/69653 | - |
| integrations | 4 | @loewenheim | https://github.com/getsentry/sentry/pull/69653 | - |
| backup | 4 | @loewenheim | https://github.com/getsentry/sentry/pull/69653 | - |

JoshFerge commented 7 months ago

Is JSON parsing a significant cause of latency in our consumers or endpoints? Also, I would check Sentry as you roll out; with Replays we noticed a bunch of errors associated with switching JSON implementations.

anonrig commented 7 months ago

> Is JSON parsing a significant cause of latency in our consumers or endpoints? Also, I would check Sentry as you roll out; with Replays we noticed a bunch of errors associated with switching JSON implementations.

@JoshFerge JSON parsing is a bottleneck in a couple of places, and I think it has more impact than we assume. Here's an example: https://sentry-st.sentry.io/performance/summary/?display=trend&end=2024-04-16T03%3A59%3A59&project=1513938&query=&referrer=performance-transaction-summary&start=2024-04-10T04%3A00%3A00&transaction=ingest_consumer.process_event&trendFunction=p95&unselectedSeries=p100%28%29%2Cavg%28%29

The only change behind this improvement was the switch from rapidjson to orjson.

[Screenshot (2024-04-15): p95 trend for ingest_consumer.process_event]
ayirr7 commented 6 months ago

Here's an example of how this improved the time spent in one of the processing phases of the indexer:

[Screenshot (2024-04-22): indexer processing-phase timing before and after the switch]

This is the phase where JSON deserialization is done.

loewenheim commented 6 months ago

One difference between json and orjson is that orjson dumps to bytes, not str. Decoding that back to a str might cost us some performance, let's see how it shakes out.

anonrig commented 6 months ago

> One difference between json and orjson is that orjson dumps to bytes, not str. Decoding that back to a str might cost us some performance, let's see how it shakes out.

@loewenheim Some functions, such as the Snuba ones, accept both bytes and str as parameters, so we might not need to encode/decode in some places.
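For illustration, a small sketch of the bytes/str difference under discussion (standard library json vs. orjson):

```python
import json

import orjson

payload = {"message": "hello", "level": "error"}

as_str = json.dumps(payload)      # str
as_bytes = orjson.dumps(payload)  # bytes

assert isinstance(as_str, str)
assert isinstance(as_bytes, bytes)

# If a caller still requires str, an extra decode is needed,
# which is the potential cost mentioned above.
decoded = as_bytes.decode("utf-8")

# Both loaders accept bytes and str, so consumers that already hold
# bytes (e.g. a Kafka message value) can skip the decode entirely.
assert orjson.loads(as_bytes) == json.loads(decoded) == payload
```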

TkTech commented 6 months ago

In all of the cases where this change is being made, is the parsed JSON fully utilized? If there are cases where the JSON is only parsed to extract, say, a routing key, that can typically be sped up by another order of magnitude.

There are quite a few differences between how rapidjson and orjson parse JSON. This doesn't mean either is wrong, just that they might differ and introduce behaviour changes. A good place to start is to compare the two.

Specifically, if you depend on very accurate float or integer parsing/serialization there will be quirks. For example, with rapidjson you would have been ingesting NaN and +/-Infinity as valid numbers, but with orjson you no longer do. Although yyjson is now the basis for the parser in orjson, orjson doesn't use its support for arbitrary integer parsing, so the maximum size of an integer will also differ from Python's json.
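Some of these differences are easy to check directly. The snippet below reflects the behaviour of recent orjson releases as I understand it, and is worth re-running against the pinned version before relying on it:

```python
# Sketch of behavioural differences; verify against the orjson version actually pinned.
import json
import math

import orjson

# NaN / Infinity: the stdlib accepts (and emits) them by default, orjson does not.
assert math.isnan(json.loads("NaN"))
assert json.dumps(float("nan")) == "NaN"
assert orjson.dumps(float("nan")) == b"null"  # serialized as null instead
try:
    orjson.loads("NaN")
except orjson.JSONDecodeError:
    print("orjson rejects NaN on input")

# Integers: the stdlib round-trips arbitrary precision, while orjson.dumps
# raises for integers outside the 64-bit range (loads is similarly limited).
big = 2**70
assert json.loads(json.dumps(big)) == big
try:
    orjson.dumps(big)
except TypeError:
    print("orjson refuses integers outside the 64-bit range")
```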

Narretz commented 6 months ago

Why is the title of this issue "replace orjson with existing json usages"? It's the other way around, isn't it?

In the description it's fine: "... simply replacing rapidjson with orjson results ..."

loewenheim commented 6 months ago

Changed the title.

anonrig commented 6 months ago

The max duration of reconstruct_messages.build_new_payload.json_step has been impacted by these changes as well.

[Screenshot (2024-05-07): max duration of reconstruct_messages.build_new_payload.json_step]

Quoting @ayirr7:

> This gave us a 50% reduction in time spent, on average, on JSON serialization in the indexer and about a 25% reduction in the time spent, on average, by the indexer in building the new payload it sends along to Snuba. Cool!

[Screenshot (2024-05-10): indexer JSON serialization and payload-build timings]
anonrig commented 6 months ago

After switching the middlewares and integrations to orjson, here are the changes:

[Screenshot (2024-05-09): middleware and integration timings after the switch]
simon-liebehenschel commented 3 months ago

@anonrig Could you please show the memory usage plot with/without orjson? I am not familiar with the orjson.dumps() internals, but I have heard from colleagues about some known problems with memory usage or memory leaks (if you want, you can scroll through the orjson issues closed by a Stale Bot). I am not saying that Sentry will definitely use more memory, because it depends on what is being dumped and other things, but if it does, it is better to know about it now than to debug OOMKilled errors on a server.
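Not a substitute for production numbers, but a rough way to compare resident memory is to run the same serialization workload once per library in separate processes and read the process high-water mark. The script and payload below are made up for illustration:

```python
# bench_mem.py -- rough, illustrative memory comparison; run each implementation
# in a fresh process, e.g. `python bench_mem.py json` then `python bench_mem.py orjson`,
# because ru_maxrss is a per-process high-water mark. The payload is invented.
import json
import resource
import sys

import orjson

IMPLS = {"json": json.dumps, "orjson": orjson.dumps}


def main() -> None:
    name = sys.argv[1]
    dumps = IMPLS[name]
    payload = {f"key_{i}": ["x" * 64, i, i * 1.5] for i in range(10_000)}
    # Keep the outputs alive so allocation pressure is comparable between runs.
    outputs = [dumps(payload) for _ in range(50)]
    # Note: ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{name}: held {len(outputs)} dumps, peak RSS ~{peak}")


if __name__ == "__main__":
    main()
```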