dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.63k stars 4.57k forks source link

[Performance Proposal] Use SIMDJson in Corehost #59642

Open deeprobin opened 2 years ago

deeprobin commented 2 years ago

Description

Replace rapidjson with simdjson

Data

https://github.com/simdjson/simdjson#performance-results

Analysis

SIMDJSON is 4 times faster than RapidJSON

ghost commented 2 years ago

Tagging subscribers to this area: @eiriktsarpalis, @layomia See info in area-owners.md if you want to be subscribed.

Issue Details
### Description Replace [rapidjson](https://github.com/dotnet/runtime/tree/main/src/native/corehost/json/rapidjson) with [simdjson](https://github.com/simdjson/simdjson) ### Data https://github.com/simdjson/simdjson#performance-results ### Analysis SIMDJSON is 4 times faster than RapidJSON
Author: deeprobin
Assignees: -
Labels: `area-System.Text.Json`, `tenet-performance`, `untriaged`
Milestone: -
EgorBo commented 2 years ago

Is json parsing in corehost a bottleneck for you? If so please share some numbers. It's supposed to parse a relatively small json once, isn't? while I love simdjson library it sounds a bit overkill for corehost - it will increase binary size.

ghost commented 2 years ago

Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov See info in area-owners.md if you want to be subscribed.

Issue Details
### Description Replace [rapidjson](https://github.com/dotnet/runtime/tree/main/src/native/corehost/json/rapidjson) with [simdjson](https://github.com/simdjson/simdjson) ### Data https://github.com/simdjson/simdjson#performance-results ### Analysis SIMDJSON is 4 times faster than RapidJSON
Author: deeprobin
Assignees: -
Labels: `tenet-performance`, `area-Host`, `untriaged`
Milestone: -
am11 commented 2 years ago

Generally, I agree with @EgorBo. However, it won't be the first time we are improving performance of apps startup by replacing JSON engine in corehost: https://github.com/dotnet/core-setup/pull/7708. Could be worth exploring?

EgorBo commented 2 years ago

Generally, I agree with @EgorBo. However, it won't be the first time we are improving performance of apps startup by replacing JSON engine in corehost: dotnet/core-setup#7708. Could be worth exploring?

nice! that pr even reduced binary size. However, I fear simdjson will increase by e.g. half-a-megabyte or so. Still indeed worth exploring then. I personally would love SimdJSON to power System.Text.Json instead (JsonDocument)

deeprobin commented 2 years ago

Well, performance-wise it is not a bottleneck for me, however, such an optimization does not hurt in my opinion. In my opinion we shouldn't care about the binary size in the end - or do you want to run the corehost on a floppy disk?

nice! that pr even reduced binary size. However, I fear simdjson will increase by e.g. half-a-megabyte or so. Still indeed worth exploring then.

@EgorBo

am11 commented 2 years ago

performance-wise it is not a bottleneck for me

Empirically speaking, when it comes to performance vs. size in runtime, normally performance takes precedence. We would still need to show numbers to discuss/understand the tradeoffs (I guess it would be a non-trivial work to get there).

do you want to run the corehost on a floppy disk?

Impact on file sizes in .NET is typically frowned upon, there is even a tag for it: size-reduction. In case of host, it will impact size of all hosts including singlefilehost where slimmer binaries are preferred: https://github.com/dotnet/runtime/issues/12629.

deeprobin commented 2 years ago

Then maybe an Optimization Level option would be useful. On the one hand the type optimized for binary size and on the other hand the one optimized for performance. The result can be achieved with preprocessor variables.

In Rust for example you can define this in the file Cargo.toml¹, in our case maybe in the Project.csproj file.

References

[1]: See profiles - https://doc.rust-lang.org/cargo/reference/profiles.html

vitek-karas commented 2 years ago

As @am11 mentions size is pretty important, specifically for the host, since it ships with basically every .NET app (so even small increase adds up in some cases).

I definitely think it's worth to see what the perf gain would be versus the size regression.

Regarding "optimize for size" - we have the ability to do that in managed code (Trimming), but currently we don't have that ability in native code. This is mainly due to the fact that the SDK doesn't actually produce native executables directly, it uses precompiled executables. So we would have to ship two versions of the host, one for perf, one for size. We already have two (normal, single-file) so this would double the number of hosts... All of this is doable, just relatively expensive.

In the case of the host there's additional tradeoff to think about - this code runs only during startup. So it's not just about CPU, it's also about IO and memory. All of these metrics should be as low as possible. The current hosts are definitely not perfect in this regard, but any changes will have to consider these aspects as well.

hez2010 commented 2 years ago

Related: https://github.com/dotnet/runtime/issues/28937

TkTech commented 2 years ago

If you're planning on including simdjson verbatim, rather than just the algorithms described in the paper, the size can be tweaked a decent bit with just build-time flags and minimal effort. You're likely only going to use the OnDemand or the DOM-based API, so the other can be removed. The runtime dispatch can be narrowed if Core requires, say, Haswell at a minimum then Westmere can be discarded. If at least SSE4 is required by other parts of Core, then the fallback parser can be removed.

I'm sure @lemire can offer some input here. Ultimately someone will need to try it to get realistic numbers since so much can be eliminated based off of how Core will use it.

lemire commented 2 years ago

@TkTech For something like a runtime, it would make sense to build a trimmed down kernel version and it would not be hard.

am11 commented 2 years ago

If at least SSE4 is required by other parts of Core, then the fallback parser can be removed.

Yes, this is the case.

Note that this issue is about corehost, the hosting layer where we use RapidJSON to parse two/three kinds of json files (deps, runtimeconfig, rid graph etc.). #28937 is tracking the actual runtime libraries part (wider audience).