NuGet / Home

Repo for NuGet Client issues

Apply deduplication of content files #12909

Open msedi opened 1 year ago

msedi commented 1 year ago

NuGet Product(s) Affected

Other/NA

Current Behavior

We are delivering large files with our nuget packages, similar to this question on stackoverflow: https://stackoverflow.com/questions/47468941/prevent-duplicating-files-in-nuget-content-and-contentfiles-folders.

To my knowledge, for compatibility reasons nuget packs the content into both content and contentFiles, but this means the resulting archive is double the size it needs to be.

For example, we have a nuget package of 85MB. If I unpack it, the result is 220MB, of which the content alone is 109MB. Currently both content and contentFiles contain those same 109MB.

If I simply repack the extracted files with 7zip, I get 72MB, so as a first observation the zipper nuget uses is apparently not well optimized. But that is a separate issue.

If I instead remove content, keep only contentFiles, and pack with 7zip again, I get a package of 36MB, which is less than half the original size (even accounting for 7zip's better compression).
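The duplication described above is easy to measure, since a .nupkg is a plain ZIP archive. A minimal sketch (the package path in the usage comment is hypothetical) that groups archive entries by content hash and reports how many bytes a duplicate-aware layout would save:

```python
import hashlib
import zipfile
from collections import defaultdict

def duplicate_report(nupkg):
    """Group ZIP entries by SHA-256 of their content and return the
    number of bytes taken up by redundant copies."""
    groups = defaultdict(list)
    with zipfile.ZipFile(nupkg) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha256(zf.read(info.filename)).hexdigest()
            groups[digest].append(info)
    # Every copy past the first of an identical blob is wasted space.
    return sum(
        sum(i.file_size for i in infos[1:])
        for infos in groups.values()
        if len(infos) > 1
    )

# Usage (hypothetical package name):
# print(duplicate_report("MyLargePackage.1.0.0.nupkg"))
```

This counts uncompressed bytes; the on-disk saving depends on how well the duplicates compress, but for identical content/contentFiles trees it is roughly half the content payload, matching the numbers above.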

Desired Behavior

Some tools, for example Inno Setup, can deduplicate files: the same file is stored only once but placed in multiple target locations. Those tools have to compare the binary content, though, since Inno Setup cannot know in advance which files are duplicates.

In the case of the nuget content and contentFiles folders we know exactly which files are duplicates. Deduplicating them would shrink the package a lot, which would in turn reduce download size and improve overall performance.

Additional Context

Of course, the deduplication could be extended to all files in the package.
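To make the request concrete: one way a deduplicating layout could work (purely a hypothetical sketch, not anything NuGet implements or has proposed) is to store each unique blob once, keyed by content hash, together with a manifest that maps every logical path to its blob:

```python
import hashlib
import json
import zipfile

def pack_deduplicated(out, files):
    """Hypothetical deduplicating packer.

    files: dict mapping logical archive path -> bytes content.
    Each unique content blob is stored once under blobs/<sha256>;
    manifest.json maps every logical path to its blob's hash.
    """
    manifest = {}
    blobs = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        blobs[digest] = data          # identical content collapses to one entry
        manifest[path] = digest
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for digest, data in blobs.items():
            zf.writestr(f"blobs/{digest}", data)
        zf.writestr("manifest.json", json.dumps(manifest))
```

An extractor would then materialize each manifest path from its blob, so the unpacked result looks identical to today's layout; the compatibility problem raised below is that existing tooling knows nothing about such a manifest.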

nkolev92 commented 1 year ago

Thanks for filing this @msedi!

There's no deduplication in any part of the package definition.

Do you expect your packages to be used by PackageReference projects or by packages.config? If it's only PackageReference, or only packages.config, then you can simply use the recommendation from the Stack Overflow question.
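For reference, that recommendation amounts to listing the files explicitly in the nuspec `<files>` section so each lands in only one folder, e.g. only under contentFiles for PackageReference consumers. A sketch (the data file name is made up for illustration):

```xml
<?xml version="1.0"?>
<package>
  <metadata>
    <!-- id, version, etc. omitted -->
    <contentFiles>
      <files include="any/any/data.bin" buildAction="None" copyToOutput="true" />
    </contentFiles>
  </metadata>
  <files>
    <!-- Pack only into contentFiles (PackageReference); omit the content\ copy -->
    <file src="data\data.bin" target="contentFiles\any\any\data.bin" />
  </files>
</package>
```

Such a package halves its size but the file is no longer delivered to packages.config consumers, which is why the question about your consumers matters.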

nkolev92 commented 1 year ago

A few thoughts on this: deduplication would make the package not backwards compatible, so I'm not sure how feasible it is. It would effectively make the package unusable by earlier tooling versions.

msedi commented 1 year ago

@nkolev92 Sure, the stackoverflow approach would work. Regarding compatibility I agree with you, and that is exactly why I asked: I don't know which decompressor is used internally. If that decompressor could already resolve links so that the extracted result looks identical to today's layout, it would not be a breaking change; but as said, I don't know what is used internally.