dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.15k stars 4.71k forks source link

GC memory commit issue with currently not supported OS (Nintendo Switch) #41675

Open RalfKornmannEnvision opened 4 years ago

RalfKornmannEnvision commented 4 years ago

I am in the process of getting CoreRT running on the Nintendo Switch. Most thins are already working. While getting the background GC working I run in a little issue with the way the GC commits the memory for the background GC mark array. The mark array is reserved together with other management data but the actual commit is postponed until the first background GC is running. This triggers a second commit on the same reserve. But as the mark array is not aligned to a page boundary the first page of this second commit overlaps with the last page that was already committed. The GC code now expect that any additional page is zeroed but the first one should not be touched.

Unfortunately the virtual memory manager I can use is not smart enough for this. Therefore it either let additional pages uninitialized or clears everything.

For performances reasons I would prefer to not implement a tracking for the already committed pages on top of the virtual memory manager. 

A possible fix that would help me could be if the GC avoid this overlapping commit for the background gc mark array. This could be done very easily in the gc_heap::commit_mark_array_by_range function. I already prototyped it on my system and it allows me to get the background GC running even with a virtual memory manager that doesn't behave that nice when it comes to multi commits of the same page.

ghost commented 4 years ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

mangod9 commented 4 years ago

@RalfKornmannEnvision would you mind linking the PR of the prototype? Assume you are asking for a fix in CoreRT, not .net core?

jkotas commented 4 years ago

CoreRT/NativeAOT is using the exact same GC as CoreCLR. We do not want the implementation to diverge. Also, we want all GC fixes to be submitted via dotnet/runtime repo.

The root cause of the issue is that GC commits same region of memory multiple times today. The Windows and Linux memory managers deal with it gracefully, but the Switch memory manager does not.

Can the GC be fixed so that it always commits given region exactly once?

mangod9 commented 4 years ago

CoreRT/NativeAOT is using the exact same GC as CoreCLR. We do not want the implementation to diverge. Also, we want all GC fixes to be submitted via dotnet/runtime repo.

Ok, thanks for the clarification.

RalfKornmannEnvision commented 4 years ago

Created a draft PR (https://github.com/dotnet/runtime/pull/41728) with the small change that would be needed to fix this issue.

All it does is changing the start page for the commit of the mark array. So far it always committed starting with the page that contains at least a part of the mark array. The change ensure that it starts with the first page that only contains mark array data. The page that contains although other data has always been already committed at this point as part of the initial commit that happens directly after the memory has been reserved. This initial reserve only exclude the full mark array. There will be nothing behind the mark array therefore we don't need to considerate that the last page of the mark array might be already committed, too. The edge case here is when the mark array actually starts on a page boundary. In this case there will be no page that contains  mixed data. The new align_upper_page helper function ensures that the address will not change if it is already on the start of a page. Therefore the whole mark array will be committed in the edge case, too.

After the commit the function runs a check (in the debug version) to see if the whole mark array has been zeroed. This check will although validate that all pages of the mark array have been committed. 

RalfKornmannEnvision commented 4 years ago

I take it back. It seems I have overlooked something here. Beside of a full commit of the whole mark array there seems to be a use case were it is only partially committed. In this case my whole approach to fix this issue break down.

Maoni0 commented 4 years ago

even without partial commit it doesn't work - you could have a new segment that's fully in range and is close enough to an existing fully in range segment that it shares the first page of its mark array with the mark array for the existing segment.

you can make this work by looking to see if there's a seg that would share such a page with the range you want to commit and if that page has been committed based on the old BGC range.

what kind of apps do you run on the switch that uses corert? do you actually need BGC? if these apps have very small heaps you can just disable BGC altogether.

RalfKornmannEnvision commented 4 years ago

I am aware that I can disable the BGC. Actually I just have activated it by implementing FEATURE_MANUALLY_MANAGED_CARD_BUNDLES and FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP for CoreRT on ARM64. This was when I noticed the issue. 

We might get away without the BGC but as the long term plan is to get other Switch developer access to the sources, too there might be people who are more in need.  But for now it might be the best to disable the BGC until there is a better solution

Sorry for wasting your time.

mangod9 commented 4 years ago

Thanks @RalfKornmannEnvision for raising this issue. We will leave this issue on the backlog, but given that you have a workaround for now, it might be a lower priority. If BGC need arises again please feel free to contribute a change and we will evaluate it for .net 6. Thx!

Maoni0 commented 4 years ago

@RalfKornmannEnvision no need to apologize! and thanks for bringing to this to our attention.