dotnet / runtime


Increase in unmanaged memory growth when upgrading from .NET 6 to 8 #109933

Open seanamorosoamtote opened 1 week ago

seanamorosoamtote commented 1 week ago

Description

The issue is very similar to https://github.com/dotnet/runtime/issues/95922, except that we have upgraded all the way to the current release of 8.0 as of 11/18/2024, which is 8.0.10. We noticed that our deployments started getting OOMKilled on linux/amd64 using the base image mcr.microsoft.com/dotnet/sdk:8.0.11. We had defaulted our memory limits to 512Mi, which was sufficient for these services on dotnet 6. Now we have to raise the memory limit to 1Gi, which seems to be sufficient, but we still can't understand where the extra memory is going.

Analysis seems to show that a large amount of unmanaged memory is being reserved and never garbage collected or freed.
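
To separate managed growth from native growth at runtime, a quick check like the one below can be dropped into the service (a generic sketch, not code from our application); if the working set is far above what the GC reports, the growth is happening outside the managed heap:

```csharp
using System;

// Rough comparison of what the GC accounts for vs. the whole process.
// A large gap between the working set and the GC numbers points at native allocations.
GCMemoryInfo gcInfo = GC.GetGCMemoryInfo();

static double ToMB(long bytes) => bytes / (1024.0 * 1024.0);

Console.WriteLine($"GC heap size        : {ToMB(gcInfo.HeapSizeBytes):F1} MB");
Console.WriteLine($"GC committed        : {ToMB(gcInfo.TotalCommittedBytes):F1} MB");
Console.WriteLine($"Process working set : {ToMB(Environment.WorkingSet):F1} MB");
```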

Configuration

Deployed in Google Cloud Kubernetes using base image of mcr.microsoft.com/dotnet/sdk:8.0.11
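
For illustration, the limit in question is the standard Kubernetes container memory limit; a fragment like the one below (placeholder names, not our actual manifest) shows where the 512Mi value sits:

```yaml
# Hypothetical container spec fragment; the 512Mi limit was sufficient
# on dotnet 6 but now has to be raised to 1Gi to avoid OOMKills.
containers:
  - name: example-service        # placeholder name
    image: example-service:8.0   # app image built on mcr.microsoft.com/dotnet/sdk:8.0.11
    resources:
      requests:
        memory: 512Mi
      limits:
        memory: 512Mi
```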

Regression?

Maybe? If nothing else, our application requires more baseline memory than it did in dotnet 6, and being able to understand why would be helpful.

Data

Prior to this analysis we would deploy with 512Mi, and once the application entered the integration test phase, where it is effectively "live", it would get OOMKilled. Again, raising the limit to 1Gi seems to resolve the issue, but it seems odd that we would have to increase it just for dotnet 8.

Heaptrack analysis (notice the peak RSS is 1.1 GB): [image attached]

There is some leakage here, but nothing that seems to account for 1 GB: [image attached]

I'm having an issue attaching the heaptrack log itself; I'll try again once this issue is created.

WinDbg shows a large amount of committed memory with PAGE_READWRITE protection:

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unknown>                              4081     7fff`f560ee00 ( 128.000 TB) 100.00%  100.00%
Image                                   465        0`0a902200 ( 169.008 MB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
                                        938     7fff`a5346000 ( 127.999 TB)          100.00%
MEM_PRIVATE                            3608        0`5abcb000 (   1.418 GB)   0.00%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
                                        938     7fff`a5346000 ( 127.999 TB) 100.00%  100.00%
MEM_COMMIT                             3608        0`5abcb000 (   1.418 GB)   0.00%    0.00%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                         1741        0`4b21a000 (   1.174 GB)   0.00%    0.00%
PAGE_READONLY                           550        0`08c63000 ( 140.387 MB)   0.00%    0.00%
PAGE_EXECUTE_READ                      1317        0`06d4e000 ( 109.305 MB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown>                                 0`00000000     55c4`13519000 (  85.766 TB)
Image                                  7c86`6c522000        0`01dcf000 (  29.809 MB)

In WinDbg, I also looked at the largest items on the managed heap; the managed total is only about 70 MB, so this seems to point more toward a native memory issue?

55c41473fb20     964  1,267,128 Free
7c87a090bf68   7,117  1,457,672 System.Object[]
7c87a24aa700  20,236  1,618,880 System.Signature
7c87a0909db8  58,446  2,337,840 System.RuntimeType
7c87a228b390  26,386  2,744,144 System.Reflection.RuntimeMethodInfo
7c87a902cee8  17,787  2,988,216 Google.Protobuf.Reflection.FieldDescriptorProto
7c87a18e1210  22,855  4,996,152 System.Byte[]
7c87a09bd7c8 181,517 16,487,958 System.String
Total 970,322 objects, 70,404,912 bytes
dotnet-policy-service[bot] commented 11 hours ago

Tagging subscribers to this area: @mangod9. See info in area-owners.md if you want to be subscribed.

mangod9 commented 10 hours ago

To check whether it's related to the new GC in 8.0, can you try running with DOTNET_GCName=libclrgc.so to see whether memory usage stays closer to what you saw on 6?
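
(For anyone applying this in a containerized setup: DOTNET_GCName=libclrgc.so loads the older segments-based GC that still ships alongside the default regions-based GC in .NET 8. A minimal sketch of setting it in a Kubernetes container spec, placeholder context:)

```yaml
# Sketch: select the segments-based GC for comparison with the default.
env:
  - name: DOTNET_GCName
    value: libclrgc.so
```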