NuGet / Home

Repo for NuGet Client issues
Other
1.49k stars 250 forks source link

Improve the no-op hashing performance #10068

Open nkolev92 opened 3 years ago

nkolev92 commented 3 years ago

Currently the no-op hash is calculated by creating a json string of the dependency graph spec for said project. This involves a lot of duplicate + using jsonwriter for hash is not ideal.

This issue tracks the investigation for improving that hashing performance.

I discussed with @cristinamanum and @zivkan

The suggestion is move to a normal hashing function instead of using the package specwriter. The challenge is that we don't know if the package spec is mutated during restore.

Given that here are the options.

This is expected to be a long investigation, because while perf is one goal, we need to ensure this doesn't get regressed in the future.

I did prototype these changes and it showed a significant improvement for large slns with deep project graphs, up to 2-4x faster hash generation. The hash generation right now is about ~30% of the time spent in the restore part (not dg spec generation).

So cmd could see numbers like 7s to 6.8s, but VS will relatively better perf, example, 1.1s -> 900ms.

nkolev92 commented 3 years ago

Setting the estimate to 30-ish, but likely I will only spend 5 on it in S177.

nkolev92 commented 3 years ago

Moving to S178.

nkolev92 commented 3 years ago

deprioritizing for now.

KirillOsenkov commented 9 months ago

I've done research on string hashing for MSBuild and we settled on a very decent hash function: https://github.com/KirillOsenkov/MSBuildStructuredLog/wiki/String-Hashing

This is what MSBuild uses for string hashing: https://github.com/dotnet/msbuild/blob/8da5fc70969354a2849e4dcec20cf59b98f4d2c3/src/Build/Logging/BinaryLogger/BuildEventArgsWriter.cs#L1319

It's also combineable: https://github.com/dotnet/msbuild/blob/8da5fc70969354a2849e4dcec20cf59b98f4d2c3/src/Build/Logging/BinaryLogger/BuildEventArgsWriter.cs#L1335

KirillOsenkov commented 9 months ago

I'd love to see this prioritized because it does seem like we could speed up restore significantly.

nkolev92 commented 6 months ago

In this branch, I'm caching the hash generation for the package spec https://github.com/NuGet/NuGet.Client/compare/dev-nkolev92-storePackageSpecHashes.

There's another change for the algorithm itself: https://github.com/NuGet/NuGet.Client/pull/5655.

KirillOsenkov commented 6 months ago

@jeffkl

nkolev92 commented 6 months ago

I rebased my branch to get https://github.com/NuGet/NuGet.Client/commit/89f12511aaf17de441f303744da60b48553b8480.

I'll probably need @jeffkl or @genellem to rerun the no-op tests with my change to see how much of a dent they're making.