dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

allow custom memory allocator - mimalloc (linux,musl,glibc) #60467

Open dufkaf opened 2 years ago

dufkaf commented 2 years ago

We are running a lot of microservices in an AKS cluster and tried the Alpine Linux .NET Core Docker images instead of the Debian ones because of their smaller size and lower runtime memory requirements. We see better RAM utilization: a typical small ASP.NET microservice that consumes 200-400 MB with Debian/glibc consumes 100-250 MB with Alpine/musl. However, we also noticed performance degradation with multithreaded, memory-heavy workloads, with 30-40% slowdowns compared to Debian/glibc. We noticed it on a service with a 4-5 GB in-memory data set that is queried via a REST API while the data set is updated in the background.

This looks like a known issue of the musl libc memory allocator, affecting both the old allocator and the new one introduced recently.

https://www.linkedin.com/pulse/testing-alternative-c-memory-allocators-pt-2-musl-mystery-gomes
https://news.ycombinator.com/item?id=23081071 (discussion of its design, notably the lack of per-thread heaps)

As musl is also used in memory-constrained and embedded environments, the allocator's behavior under concurrency is more a design choice than a bug.

There is a custom memory allocator, https://github.com/microsoft/mimalloc, with quite positive reviews overall, that aims to solve such issues.

Would it make sense to support mimalloc on Alpine Linux (or even Debian/Ubuntu or Windows) as part of the .NET runtime build, replacing the default OS allocator?

Since the CLR has a garbage collector and its own memory handling that may be tuned at build time for musl/glibc, is it even worth trying to blindly LD_PRELOAD mimalloc on Alpine or Debian when starting a .NET Core app, without any support in the runtime?

Alpine Linux is one of the three Linux distributions for which .NET Docker images are published and supported, and members of the .NET team promote it as a better choice (e.g. here https://devblogs.microsoft.com/dotnet/staying-safe-with-dotnet-containers/ "We also recommend Alpine because ..."). However, at least for us, Alpine's current performance is lacking compared to the heavier Linux Docker images.

Feel free to change the type of this issue; "performance" felt like the best fit.

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/gc. See info in area-owners.md if you want to be subscribed.

Issue Details
Author: dufkaf
Assignees: -
Labels: `tenet-performance`, `area-GC-coreclr`, `untriaged`
Milestone: -
Maoni0 commented 2 years ago

The GC does not use a malloc allocator to acquire memory. I chatted with @janvorli a bit about this, and he can comment on the malloc side of things.

janvorli commented 2 years ago

malloc is used only for native allocations in the runtime and in the third-party libraries we use (like OpenSSL). These allocations can happen on multiple threads concurrently, so it seems worth trying to preload mimalloc to see if it makes things better. GC memory is unrelated, because the GC uses mmap directly to allocate memory.
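If you try the preload, it helps to confirm that mimalloc was actually mapped into the running process, because a wrong path or an incompatible build may be silently ignored. The sketch below makes a few assumptions. It expects a Linux `/proc` filesystem. It also expects the mimalloc shared object's file name to contain "mimalloc" (for example a hypothetical `/usr/lib/libmimalloc.so.2` installed in the image). It parses `/proc/<pid>/maps` and lists the shared objects that are mapped in.

```python
"""List the shared libraries mapped into a Linux process, to check whether an
LD_PRELOAD-ed allocator such as mimalloc was actually loaded (a sketch)."""
import os
import sys


def mapped_libraries(maps_text: str) -> set[str]:
    """Return the distinct file paths of shared-object mappings in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) == 6:
            path = parts[5].strip()
            # Keep only shared objects; skips [heap], [stack] and anonymous mappings.
            if ".so" in os.path.basename(path):
                libs.add(path)
    return libs


def has_library(maps_text: str, name: str) -> bool:
    """True if any mapped shared object's file name contains `name`."""
    return any(name in os.path.basename(p) for p in mapped_libraries(maps_text))


if __name__ == "__main__":
    # Pass the PID of the dotnet process; "self" would inspect this script instead.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/maps") as f:
        text = f.read()
    for lib in sorted(mapped_libraries(text)):
        print(lib)
    print("mimalloc loaded:", has_library(text, "mimalloc"))
```

For example, you could start the service with `LD_PRELOAD=/usr/lib/libmimalloc.so.2 dotnet MyService.dll` and then run the script with that process's PID. Both the library path and `MyService.dll` are placeholders that depend on your image.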

dufkaf commented 2 years ago

Thank you for confirming that GC memory does not come from the OS-provided malloc. However, that raises more questions for me than it answers. Our typical small C# ASP.NET Core Web API service is all C# code. Once started, it listens for requests and keeps some data cached in memory, aggregated from other services via Web API calls. The total memory allocated is e.g. 130 MB with the Alpine/musl-based ASP.NET Core runtime vs. 250 MB with the Debian/glibc-based one. You say the GC does not depend on the OS malloc, and the difference here is 120 MB out of 250 MB. Does that mean that the majority of memory (150-200 MB) is actually native allocations? If so, what would these be?

The same question applies to the big service with 5 GB of RAM in total. It holds deep hierarchical tree structures of C# objects in memory, which background threads update while clients query them via Web API calls. We see a 40% slowdown for musl vs. glibc, in both background jobs and client response times. Is that slowdown again caused by native allocations outside of those C# objects? That would be quite surprising.

So I guess we need to take some memory dumps to see where the memory goes, and do some CPU profiling. Anyway, thank you. If the difference really comes from native allocations, preloading mimalloc could help with those.

jkotas commented 2 years ago

> I have here difference of 120MB out of 250 - that would mean that majority of memory (150-200MB) is actually native allocations?

This may or may not be the case. You may be seeing some second-order effects of the differences between glibc and musl. As you have suggested, the best way to figure this out is to do some memory and CPU profiling. Checking performance counters with the dotnet-counters tool (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-counters) is a quick and easy way to start.
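As a first rough check alongside dotnet-counters, you could compare the process's resident memory with the GC heap size that dotnet-counters reports. The sketch below assumes a Linux kernel that exposes `RssAnon`/`RssFile` in `/proc/<pid>/status`. It also assumes the GC heap size is typed in by hand from dotnet-counters output. The subtraction is only an approximation and is not an exact accounting. Anonymous RSS also includes thread stacks, JIT-compiled code and runtime data structures, and the GC heap counter need not match resident pages exactly.

```python
"""Rough split of a Linux process's resident memory, to compare against the
GC heap size reported by dotnet-counters (a sketch, not an exact accounting)."""
import sys

FIELDS = ("VmRSS", "RssAnon", "RssFile", "RssShmem")


def parse_status(status_text: str) -> dict[str, int]:
    """Extract memory fields (in kB) from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in FIELDS:
            # Lines look like "VmRSS:    123456 kB"
            values[key] = int(rest.split()[0])
    return values


def non_gc_anon_mb(status: dict[str, int], gc_heap_mb: float) -> float:
    """Anonymous resident memory not accounted for by the GC heap, in MB.

    Approximation only: RssAnon also holds thread stacks, JIT-compiled code,
    runtime data and native (malloc) allocations.
    """
    return status["RssAnon"] / 1024 - gc_heap_mb


if __name__ == "__main__":
    pid = sys.argv[1]
    gc_heap_mb = float(sys.argv[2])  # GC heap size in MB, read from dotnet-counters
    with open(f"/proc/{pid}/status") as f:
        status = parse_status(f.read())
    for key in FIELDS:
        if key in status:
            print(f"{key}: {status[key] / 1024:.1f} MB")
    print(f"Anonymous RSS outside the GC heap: {non_gc_anon_mb(status, gc_heap_mb):.1f} MB")
```

Running this against the same service on Alpine and on Debian would show whether the gap sits mostly in file-backed mappings (loaded libraries) or in anonymous memory outside the GC heap. That would help decide where to look next with memory dumps.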