dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.06k stars 4.69k forks source link

Plan to migrate to zlib-ng #101465

Closed carlossanlop closed 2 weeks ago

carlossanlop commented 5 months ago

For .NET 9, we are planning to replace our dependency of madler/zlib and zlib-intel with a dependency to zlib-ng.

The zlib-ng repo offers an alternative implementation of the zlib algorithm that:

Acceptance criteria

Tests

Threat modeling

Elapsed time

Output size

Managed memory

Unmanaged memory

Servicing plan

Execution plan

Milestone 1: Migration

Milestone 2: Testing

Milestone 3: Performance

Milestone 4: Documentation

dotnet-policy-service[bot] commented 5 months ago

Tagging subscribers to this area: @dotnet/area-system-io-compression See info in area-owners.md if you want to be subscribed.

jkotas commented 5 months ago

Discuss if we want to add a compat switch to allow users to consume their own zlib DLL

I do not think we should be adding compat switches like this proactively.

omajid commented 5 months ago

Fedora moved recently from zlib to zlib-ng: https://fedoraproject.org/wiki/Changes/ZlibNGTransition

.NET (VMR) builds in Fedora are already using this (since it's the system zlib now) instead of the original zlib project. And we have used the self-built-VMR (release/8.0) that uses zlib-ng to build itself a few times now. We have not noticed any functional bugs/issues on Linux on arm64, ppc64le, s390x or x64.

ericstj commented 5 months ago

I do not think we should be adding compat switches like this proactively.

@jkotas we did this for ICU. We had a folks broken when we previously moved from managed implementation > zlib, and zlib > zlib-intel. I was imagining we'd need something as an escape hatch here. We wouldn't be carrying those alternative implementations but we could let folks bring their own. What worries you about providing a compat switch? Locking down the ABI between S.C.Compression and native?

stephentoub commented 5 months ago

I do not think we should be adding compat switches like this proactively.

@jkotas we did this for ICU. We had a folks broken when we previously moved from managed implementation > zlib, and zlib > zlib-intel. I was imagining we'd need something as an escape hatch here. We wouldn't be carrying those alternative implementations but we could let folks bring their own. What worries you about providing a compat switch? Locking down the ABI between S.C.Compression and native?

But there are actual and significant functional differences between both NLS and ICU and between ICU versions (with different data sets). I would not expect that to be the case with zlib; what specifically are we concerned about?

carlossanlop commented 5 months ago

what specifically are we concerned about?

If something stops working, something that we didn't catch in testing, we would have a workaround to give to customers: enable the compat switch, use your own DLL (it can be the old zlib DLL), while we fix the issue.

stephentoub commented 5 months ago

what specifically are we concerned about?

If something stops working, something that we didn't catch in testing, we would have a workaround to give to customers: enable the compat switch, use your own DLL (it can be the old zlib DLL), while we fix the issue.

That doesn't seem different from any other improvement we make. If we have blocking bugs, we service them.

hopperpl commented 5 months ago

Can we add support for compression/decompression via Span? This was always missing; zlib required using a stream.

This shouldn't be much of a problem as the native library already has exported functions for it.

stephentoub commented 5 months ago

Can we add support for compression/decompression via Span? This was always missing; zlib required using a stream.

This shouldn't be much of a problem as the native library already has exported functions for it.

This is unrelated to which zlib implementation is used. It is tracked by https://github.com/dotnet/runtime/issues/39327.

ericstj commented 5 months ago

And https://github.com/dotnet/runtime/issues/44793.

This issue is only about swapping the implementation of zlib.

But there are actual and significant functional differences between both NLS and ICU and between ICU versions (with different data sets). I would not expect that to be the case with zlib; what specifically are we concerned about?

Observable behavior. Size and content compressed payload. Maybe we'd want such a switch if we serviced this change in-place in 8.0 - but since we're doing in a major release we just accept this change as part of the churn.

stephentoub commented 5 months ago

since we're doing in a major release we just accept this change as part of the churn.

+1. We have such considerations any time we patch our existing zlibs, and we had one commensurate with the move to zlib-ng when we moved to zlib-intel.

carlossanlop commented 5 months ago

Thanks @jkotas @stephentoub and @ericstj for your input. Based on the feedback, I will remove the section for compat switch.

Maybe we'd want such a switch if we serviced this change in-place in 8.0

Speaking of which - I did not add a section to backport the zlib-ng migration to servicing branches. Do we need to consider this too?

bartonjs commented 5 months ago

I did not add a section to backport the zlib-ng migration to servicing branches. Do we need to consider this too?

I believe the current plan is "no", and that by virtue of having built/redisted we're just on the hook for dealing with any problems in zlib-intel before .NET 8 goes out of support. After we see what the cost of the change is in main then tactics/servicing/etc might want to have a discussion... but we have no plan (to my knowledge) to do it proactively.

carlossanlop commented 5 months ago

@lewing @steveisok @agocke @directhex do we also want to switch mono's dependency to zlib-ng, or do we want mono to keep consuming the old zlib copy?

I ask because we have, under src/native/external, one copy of zlib, and one copy of zlib-intel, and we currently reuse these copies for coreclr and mono.

Do you see any reasons why we should keep using zlib and zlib-intel in mono, or can we proceed to replace them with zlib-ng as well?

jkotas commented 5 months ago

We should switch to zlib-ng everywhere.

lambdageek commented 5 months ago

I don't think we do anything particularly fancy with zlib. We should use whatever the rest of the repo uses. (I think we mostly use it to unpack PPDBs for the debugger). /cc @thaystg

carlossanlop commented 4 months ago

We should switch to zlib-ng everywhere.

@jkotas When you say "everywhere", do you also mean we should stop looking for an installed zlib-ng package in unix distros and always consume the native copy that we will be carrying?

carlossanlop commented 4 months ago

I was also made aware that as part of the mobile changes in runtime, we will need to update xamarin/xamarin-macios and xamarin/xamarin-android once they start targeting .NET 9. I'll add this to the work items.

jkotas commented 4 months ago

do you also mean we should stop looking for an installed zlib-ng package in unix distros and always consume the native copy that we will be carrying?

Yes.

We may want to have switch to use system zlib-ng for distro builds, similar to how we have CLR_CMAKE_USE_SYSTEM_BROTLI for Brotli. Introduced by https://github.com/dotnet/runtime/pull/66462 . cc @omajid

carlossanlop commented 4 months ago

Ok.

It seems zlib-ng is not available among the default repositories in distros like Ubuntu. So I see two options:

jkotas commented 4 months ago

It seems zlib-ng is not available among the default repositories in distros like Ubuntu.

@omajid and @dotnet/source-build-contrib should provide the guidance for this.

It is ok to do this part in a follow up change. It does not need to be part of your initial enablement PR.

omajid commented 4 months ago

AFAIK, the system copy of zlib is used by default on Linux (so no need for a CLR_CMAKE_USE_SYSTEM_ZLIB on Linux yet). On some distros like Fedora, the system version of zlib is actually zlib-ng. And zlibng is used transparently in those environments by default. On Fedora 40, it looks like this:

$ ldd /usr/lib64/dotnet/shared/Microsoft.NETCore.App/8.0.4/libSystem.IO.Compression.Native.so
        linux-vdso.so.1 (0x00007fffa36dd000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f5505c64000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5505b81000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5505994000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5505d6d000)
$ rpm -qf /lib64/libz.so.1
zlib-ng-compat-2.1.6-2.fc40.x86_64

So the scenario that we want to handle is where the system has both zlib and zlib-ng and then we want the runtime to pick zlib-ng over zlib?

carlossanlop commented 4 months ago

AFAIK, the system copy of zlib is used by default on Linux (so no need for a CLR_CMAKE_USE_SYSTEM_ZLIB on Linux yet). On some distros like Fedora, the system version of zlib is actually zlib-ng. And zlibng is used transparently in those environments by default. On Fedora 40, it looks like this:

$ ldd /usr/lib64/dotnet/shared/Microsoft.NETCore.App/8.0.4/libSystem.IO.Compression.Native.so
        linux-vdso.so.1 (0x00007fffa36dd000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f5505c64000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5505b81000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5505994000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5505d6d000)
$ rpm -qf /lib64/libz.so.1
zlib-ng-compat-2.1.6-2.fc40.x86_64

So the scenario that we want to handle is where the system has both zlib and zlib-ng and then we want the runtime to pick zlib-ng over zlib?

I had not thought about the scenario where zlib-ng is actually zlib, but yes, that's something we should consider too. And yes, if there's an installed version of zlib-ng, we want to use that one.

carlossanlop commented 3 months ago

We should switch to zlib-ng everywhere.

@jkotas @steveisok @akoeplinger @jkoritzinsky @lambdageek I wanted to ask for some additional feedback from you all regarding making zlib-ng available everywhere, as I'm having ongoing conversations with different (and sometimes opposed) opinions, and I would like to make sure we're all on the same page.

In platforms where we have a concern, we could continue consuming the system-provided zlib (like in Android and iOS). That way we don't carry a copy that could have a security issue. It's my understanding that we already do something similar with our cryptographic features.

jkotas commented 3 months ago

if a security issue is found in zlib, it would take time to update the zlib-ng copy that we carry, whereas the system zlib would eventually get updated (probably much faster) by the platform owners (Android, Apple) via a system update.

I do not think that this is a strong argument. If system zlibs security benefits outweighed better performance and consistency from carrying our own copy of zlib-ng, we would want to use system zlibs everywhere that it is possible. We are not doing that. We have convinced ourselves that we prefer the performance and consistency of zlib-ng.

I was told there's concern that in Android/iOS, third-party libraries could also get build problems if they depend on zlib and they start seeing our new copy of zlib-ng.

This problem is not limited to mobile. It exists with native AOT on all platforms. I am not particularly worried about it. It affects only a very small fraction of our customers and there are ways to deal with it.

I can buy the argument that it is not worth the trouble to deal with zlib-ng build issues and conflicts with system zlib on mobile and browser. I would be ok with staying with system zlib on these platforms.

rolfbjarne commented 3 months ago

if a security issue is found in zlib, it would take time to update the zlib-ng copy that we carry, whereas the system zlib would eventually get updated (probably much faster) by the platform owners (Android, Apple) via a system update.

I do not think that this is a strong argument. If system zlibs security benefits outweighed better performance and consistency from carrying our own copy of zlib-ng, we would want to use system zlibs everywhere that it is possible. We are not doing that. We have convinced ourselves that we prefer the performance and consistency of zlib-ng.

One difference on mobile platforms is that size matters much more than on other platforms.

Using system zlib is basically free size-wise.

I was told there's concern that in Android/iOS, third-party libraries could also get build problems if they depend on zlib and they start seeing our new copy of zlib-ng.

This problem is not limited to mobile. It exists with native AOT on all platforms. I am not particularly worried about it. It affects only a very small fraction of our customers and there are ways to deal with it.

I can buy the argument that it is not worth the trouble to deal with zlib-ng build issues and conflicts with system zlib on mobile and browser. I would be ok with staying with system zlib on these platforms.

The way we've dealt with this in the past, has been to rename symbols exposed by Mono (prefixed them with something like _mono_). Note that making symbols private isn't necessarily enough, because we link Mono statically, and then if a third-party static library happens to use the same symbol names, we'll still get symbol clashes.

rolfbjarne commented 3 months ago

if a security issue is found in zlib, it would take time to update the zlib-ng copy that we carry, whereas the system zlib would eventually get updated (probably much faster) by the platform owners (Android, Apple) via a system update.

I do not think that this is a strong argument. If system zlibs security benefits outweighed better performance and consistency from carrying our own copy of zlib-ng, we would want to use system zlibs everywhere that it is possible. We are not doing that. We have convinced ourselves that we prefer the performance and consistency of zlib-ng.

Note that this is a bit worse on mobile than on desktop, because Mono is shipped with the apps:

  1. Security issue is found in zlib-ng
  2. .NET runtime (and maybe workloads) publishes an update with the security fix
  3. Every one of our customers (app developers) have to update, rebuild their apps, and republish them.
  4. Every one of our customers' customers (app consumers) would have to update the app on their device(s).

Compare to:

  1. Security issue is found in zlib
  2. Apple/Google ships a system update, fixing it for everybody.
stephentoub commented 3 months ago

How is that different from any other security issue found and fixed anywhere in the runtime, System.* libraries, etc?

jkotas commented 3 months ago

One difference on mobile platforms is that size matters much more than on other platforms. Using system zlib is basically free size-wise.

Yes, system zlib is free size-wise. On the other hand, zlib is fairly small (~100kB) so the binary footprint is not a big deal either.

Note that this is a bit worse on mobile than on desktop, because Mono is shipped with the apps

This concern is not mobile specific. It exists for self-contained applications everywhere - mobile, browser, desktop, server.

carlossanlop commented 2 months ago

FYI all - Here's the perf report showing improvements in Deflate, ZLib and GZip in arm64: https://github.com/dotnet/perf-autofiling-issues/issues/38054

prayaas-a commented 1 month ago

FYI all - Here's the perf report showing improvements in Deflate, ZLib and GZip in arm64: dotnet/perf-autofiling-issues#38054

Thanks @carlossanlop. I would love to see the perf comparison on Windows. Is a similar report for Windows in the works?

ericstj commented 2 weeks ago

FYI all - Here's the perf report showing improvements in Deflate, ZLib and GZip in arm64: dotnet/perf-autofiling-issues#38054

Thanks @carlossanlop. I would love to see the perf comparison on Windows. Is a similar report for Windows in the works?

Have a look at this query https://github.com/dotnet/perf-autofiling-issues/issues?q=is%3Aissue+is%3Aopen+Decompress+Compress+label%3Aperf-improvement+

@carlossanlop are we tracking any remaining work with this issue?

carlossanlop commented 2 weeks ago

All items have been resolved, we can close this.