dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Pauseless Garbage Collector (Question) #96213

Open jogibear9988 opened 9 months ago

jogibear9988 commented 9 months ago

Is there any reason why something like the Pauseless Garbage Collector, which Azul offers for Java, was never implemented for .NET?

https://www.azul.com/products/components/pgc/ https://www.artima.com/articles/azuls-pauseless-garbage-collector

ghost commented 9 months ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details
Author: jogibear9988
Assignees: -
Labels: `question`, `area-GC-coreclr`, `untriaged`
Milestone: -

AlgorithmsAreCool commented 9 months ago

I'll add the generic response that the .NET GC supports functionality that Java's GCs do not and this can complicate or invalidate optimizations that other GCs can take advantage of.

For example, Java's GC doesn't support interior pointers, while the .NET GC does.
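
To make the interior-pointer point concrete, here is a minimal C# sketch. The array contents and the `Increment` helper are made up for illustration. It shows two ordinary ways user code ends up holding a managed reference into the middle of an object, which the GC then has to trace back to the containing array and keep valid if that array is moved.

```csharp
using System;

// Hypothetical helper for illustration: writes through a managed reference.
static void Increment(ref int slot) => slot++;

int[] numbers = { 10, 20, 30 };

// `ref numbers[1]` is an interior pointer: a managed reference to an
// element inside the array object rather than to the object itself.
ref int middle = ref numbers[1];
Increment(ref middle);

// A Span over a sub-range also carries an interior reference.
Span<int> tail = numbers.AsSpan(1);
tail[1] = 99; // writes to numbers[2]

Console.WriteLine(string.Join(", ", numbers)); // 10, 21, 99
```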

huoyaoyuan commented 9 months ago

I'd like to see the GC team write up some insight into the benefits and the challenges/drawbacks of these options. What is theoretically possible but just complex or low priority to implement? Which features or goals fundamentally conflict?

jkotas commented 9 months ago

These types of GCs typically trade throughput for shorter pause times. For example, they often use GC read barriers that make accessing object reference fields significantly slower. If you would like to understand the problem space, read The Garbage Collection Handbook. It has a full chapter dedicated to real-time garbage collectors.

There is nothing fundamental preventing building these types of garbage collectors for .NET. It is just a lot of work to build a production quality garbage collector. We do not see significant demand for these types of garbage collectors in .NET. Building alternative garbage collectors with very different performance tradeoffs has not been at the top of the core .NET team priority list.

I would love to see the .NET community experimenting with alternative garbage collectors with very different performance tradeoffs. That is how Azul came to be: the Azul garbage collector that you linked to was not built by the core Java team.

MichalPetryka commented 9 months ago

> Building alternative garbage collectors with very different performance tradeoffs has not been at the top of the core .NET team priority list.

I guess that having more and more official, simultaneously supported GCs would also increase the amount of work needed to maintain and improve them all, putting even more burden on the GC team, which would mean that the existing GCs would be improved more slowly.

jogibear9988 commented 9 months ago

Yeah, but to me a pauseless collector seems to open up a whole new area where .NET could be used. Even if it is slower, being deterministic, without pauses every now and then, could be a huge benefit for some applications.

jkotas commented 9 months ago

> I guess that having more and more official, simultaneously supported GCs would also increase the amount of work needed to maintain and improve them all, putting even more burden on the GC team, which would mean that the existing GCs would be improved more slowly.

Right. If it were to follow the Azul model, it would not impact the core GC team much. I believe that the core Java GC team does not spend any cycles on the Azul GC. The Azul GC is maintained by Azul, a company with a closed-source business model.

> Yeah, but a pauseless collector for me seems to open a whole new area where .NET could be used.

It comes down to numbers and opportunity costs. For example, how many new developers can pauseless GC bring to .NET? It is hard to make the numbers work.

filipnavara commented 9 months ago

> If you would like to understand the problem space, read The Garbage Collection Handbook.

For what it's worth, beware that buying the book as an eBook from the official publisher only gives access to it through the VitalSource service. There is no way to download the book except through the DRM-encumbered software, and they managed to block my account before I was even able to read a single page (no explanation given; the service just responds with a 401 error and logs me out).

If you want to get the book, get it as a physical book or through Amazon Kindle and save yourself the trouble.

fabianoliver commented 5 months ago

> We do not see significant demand for these types of garbage collectors in .NET

I am rather surprised to hear that!

I'd love to see an experimental GC, so I'm certainly quite biased. But I'd imagine predictability of latency is a very significant concern for a number of large user bases. Game development (Unity) comes to mind of course. As do many areas in finance & algorithmic trading. In my experience, current GC characteristics are often a dealbreaker for these users. So unlike many of the recent improvements to C# features and the runtime, which without a doubt greatly enhance the experience for existing .NET users (not that I'm complaining! :) ), I would imagine something like pauseless GC has much higher potential to bring in new developers that so far couldn't realistically choose .NET at all.

Sorry, that's it for my sales pitch; but in short, I'd definitely love to see experimentation in this area.

tannergooding commented 5 months ago

Latency of a GC can be a concern in many of the same ways that latency of RAII can be a concern.

Having a GC, including a GC that can "stop the world", is not itself strictly a blocker and it may be of interest to note that many of the broader/well known game engines do themselves use GCs (many, but not all, of which are incremental rather than pauseless).

Most people's experience with .NET and a GC in environments like game dev, up until this point, has been with either the legacy Mono GC or the Unity GC, neither of which can really be compared with the performance, throughput, latency, or various other metrics of the precise GC that ships with RyuJIT.

Having some form of incremental GC is likely still interesting, especially if it can be coordinated to run more so in places where the CPU isn't doing "important" work (such as when you're awaiting a dispatched GPU task to finish executing), but it's hardly a requirement with an advanced modern GC, especially if you're appropriately taking memory management into consideration by utilizing pools, spans/views, and other similar techniques (just as you'd have to use in C++ to limit RAII or free overhead).
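
As a rough illustration of the "pools and spans" technique, here is a minimal sketch using `ArrayPool<T>.Shared` from `System.Buffers`. The `SumFrame` function and its workload are hypothetical and stand in for any per-frame scratch work. The point is that the scratch buffer is rented and returned instead of being allocated on every call, so steady-state calls should add little or no garbage for the GC to collect.

```csharp
using System;
using System.Buffers;

// Hypothetical per-frame work that needs a scratch buffer of `length` ints.
static int SumFrame(int frameIndex, int length)
{
    // Rent may hand back a larger array than requested, so the work
    // below only touches a span of exactly `length` elements.
    int[] buffer = ArrayPool<int>.Shared.Rent(length);
    try
    {
        Span<int> scratch = buffer.AsSpan(0, length);
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = frameIndex + i;

        int sum = 0;
        foreach (int value in scratch)
            sum += value;
        return sum;
    }
    finally
    {
        // Give the array back so the next caller can reuse it.
        ArrayPool<int>.Shared.Return(buffer);
    }
}

Console.WriteLine(SumFrame(0, 4)); // 0 + 1 + 2 + 3 = 6
```

The cost is the one raised elsewhere in this thread: the caller is now responsible for returning the buffer and for not using it afterwards.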

sgf commented 4 months ago

In essence, using a pool is no different from manually allocating memory: it does not reduce the mental burden of manually allocating and reclaiming memory. Although the .NET GC can still cope, improvement is imperative as memory usage keeps growing. At present, in terms of low-latency control, .NET's GC has no advantage over the JVM's GCs for large heaps, and it is also not as good as Go's GC. Throughput is important, but in latency-sensitive cases stop-the-world (STW) pauses directly limit the areas where .NET can be used, such as the server side of large-scale online games.

Of course, it is also necessary to explore safe programming methods similar to Rust's. If memory usage is too high, the GC will be overwhelmed.

This is a popular article about GC: Go vs C#, part 2: Garbage Collection

jogibear9988 commented 4 months ago

It would be nice to see how this changes in newer versions of .NET, and also how Java compares (with both its default collector and the pauseless collector mentioned here).

hez2010 commented 4 months ago

Maybe an option like the incremental GC being adopted by Unity is feasible here: it breaks up a "full GC" into a sequence of several "partial GCs" (i.e. it collects incrementally), so that although the total pause time doesn't change, each individual pause can be shortened until it is nearly pauseless. This should fit the needs of applications that require soft real-time behavior. GC pauses would still be allowed, and it need not be truly "pauseless", but for apps like games and real-time services we could make sure that each individual pause is short enough.
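
A toy sketch of the slicing idea, not of how Unity or the .NET GC actually implement it: the object graph, the `MarkSlice` function, and the budget of two nodes per slice are all invented for illustration. Marking is driven from a worklist and stops once the budget is used up, so the same total work is spread over several short steps. A real incremental collector would also need a write barrier to stay correct while the program mutates the graph between slices, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;

// Toy object graph (hypothetical data): node -> nodes it references.
var graph = new Dictionary<int, int[]>
{
    [0] = new[] { 1, 2 },
    [1] = new[] { 3 },
    [2] = new[] { 3 },
    [3] = Array.Empty<int>(),
    [4] = new[] { 0 }, // not reachable from the root, so never marked
};

var marked = new HashSet<int> { 0 }; // node 0 is the only root
var worklist = new Stack<int>();
worklist.Push(0);

// One slice: visit at most `budget` nodes, then hand control back.
// Returns true once there is no marking work left.
bool MarkSlice(int budget)
{
    while (budget-- > 0 && worklist.Count > 0)
    {
        int node = worklist.Pop();
        foreach (int child in graph[node])
        {
            if (marked.Add(child))
                worklist.Push(child);
        }
    }
    return worklist.Count == 0;
}

int slices = 0;
bool done;
do
{
    done = MarkSlice(budget: 2); // the application would run between slices
    slices++;
} while (!done);

Console.WriteLine($"{slices} slices, {marked.Count} reachable nodes"); // 2 slices, 4 reachable nodes
```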

cc: @Maoni0

smoogipoo commented 3 months ago

I'm a game/engine dev on osu!. We've used C# throughout all of .NET 3.5 to .NET 8, and have fully rewritten the game over the years which has brought new challenges in terms of balancing features that wouldn't have been possible prior and what works best with the .NET GC. By far our greatest fight has been with the GC - it is definitely a felt presence and at the forefront of everything we do.

I've personally gone pretty deep in minimising pauses with issues such as https://github.com/dotnet/runtime/issues/48937, https://github.com/dotnet/runtime/issues/12717, and https://github.com/dotnet/runtime/issues/76290, but as a team we've always been very conscious about allocations because our main loop is running at potentially 1000Hz, or historically even more than that.
We'll regularly profile the game to find any and all areas where we can reduce allocations, where even issues like https://github.com/dotnet/runtime/issues/40009 and https://github.com/dotnet/runtime/issues/33747 have caused problems because even a tiny hitch of ~5ms could be noticeable in today's world of 240+Hz monitors.

What we've found works best for us is turning on LowLatency GC mode during our core gameplay session. This is two-part:

Where it breaks down, however, is areas that require allocs such as menus. This GC mode will cause terrible stutters when doing anything remotely intensive, meaning that we have to very carefully switch GC modes at opportune moments to get the best of both worlds, and sometimes those worlds are intertwined.
The best we've found in menus is SustainedLowLatency - even the default Interactive mode is a little bit too heavy-handed - at the cost of a single large stutter every once in a while.
There's some further nuance here because of the cascading failure where more pressure leads to more GCs, leads to more promotions to Gen1, leads to longer GCs, leads to more data being promoted to Gen2 and longer BGCs, which is likely how https://github.com/dotnet/runtime/issues/65850 and the suggestion of a "generationless GC" came about, but nevertheless...
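
The mode switching described above can be sketched with `GCSettings.LatencyMode` from `System.Runtime`. This is only an assumption about the general shape, not osu!'s actual code: `RunWithLatencyMode` is a hypothetical helper, and the two lambdas stand in for a gameplay session and a menu. As far as I know the runtime can silently ignore a mode that the current GC configuration does not support (the documentation says `LowLatency` is not available with server GC), so the sketch prints the effective mode instead of assuming it.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: run `work` under the requested GC latency mode
// and always restore the previous mode afterwards.
static void RunWithLatencyMode(GCLatencyMode mode, Action work)
{
    GCLatencyMode previous = GCSettings.LatencyMode;
    try
    {
        GCSettings.LatencyMode = mode;
        work();
    }
    finally
    {
        GCSettings.LatencyMode = previous;
    }
}

// Core gameplay: allocation-free hot loop, so suppress gen2 collections.
RunWithLatencyMode(GCLatencyMode.LowLatency,
    () => Console.WriteLine($"gameplay runs with {GCSettings.LatencyMode}"));

// Menus: allocations are unavoidable, so use a less aggressive mode.
RunWithLatencyMode(GCLatencyMode.SustainedLowLatency,
    () => Console.WriteLine($"menus run with {GCSettings.LatencyMode}"));
```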

julealgon commented 3 months ago

@smoogipoo it seems to me that you should be working directly with MS folks on this. Your expertise on gamedev would help so many people out.

Stuttering in Unity, for example, has been a blemish on C# for a very long time. It gives people the impression that C# is just a bad language, which is absolutely disastrous for the community as a whole, as more people move away from these tools and end up using other languages.

georgiuk commented 3 weeks ago

Industrial automation and real-time data fusion are fast-growing sectors that would require more deterministic behavior/latency.

fawdlstty commented 1 week ago

I hope C# can enter the field of industrial robot control, but this field requires hard real-time performance, and long interruptions may lead to safety accidents. It would be great if users could create real-time threads that are not affected by the GC.

fawdlstty commented 1 week ago

Linus has merged the final real-time code (PREEMPT_RT) into the Linux mainline, and for programming languages, supporting hard real-time will be one of the most outstanding features of the language!