dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

GC picks wrong L3 cache size on Linux #76290

Open smoogipoo opened 1 year ago

smoogipoo commented 1 year ago

Description

In https://github.com/dotnet/runtime/issues/48937 it was found that my gen0 budget was 32MiB. Investigating further, I believe it may even be as high as 64MiB, which causes the erratic Gen0 collection times I'm seeing.

System configuration:

- AMD Ryzen 3950x

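I wrote a simple app following the implementations the GC uses on x86 (as far as I can tell) for both Linux and Windows:

- Linux: https://github.com/filipnavara/runtime/blob/1955928833e178392f3a40ac1509f0d4a6ca7632/src/coreclr/gc/unix/gcenv.unix.cpp#L880-L899
- Windows: https://github.com/filipnavara/runtime/blob/1955928833e178392f3a40ac1509f0d4a6ca7632/src/coreclr/gc/windows/gcenv.windows.cpp#L405-L435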

...and found that it reports a cache size of ~16MiB on Windows but ~64MiB on Linux.

I believe the 64MiB value is incorrect for this system, given its CPU topology:

[Image: CPU topology screenshot]

This makes sense, since _SC_LEVEL3_CACHE_SIZE returns the total L3 size across all CCXs (4 × 16MiB on the 3950x) rather than the size of the L3 shared within a single CCX.
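To make the arithmetic concrete, here is a minimal Python sketch. It reads the same sysconf value and splits a package-wide total evenly across CCXs. It assumes a glibc system, where _SC_LEVEL3_CACHE_SIZE is a glibc extension, and assumes that this Python build exposes the name through `os.sysconf_names`; if the name is missing, the lookup returns None instead of failing. `per_ccx_l3` is a hypothetical helper and assumes equally sized L3 slices. Neither function is the runtime's code.

```python
import os

MIB = 1024 * 1024


def sysconf_l3_size():
    """Return sysconf(_SC_LEVEL3_CACHE_SIZE) in bytes, or None if unavailable."""
    # Python only exposes the sysconf names it was built with (and none on
    # Windows), so look the constant up instead of assuming it exists.
    if "SC_LEVEL3_CACHE_SIZE" not in getattr(os, "sysconf_names", {}):
        return None
    try:
        value = os.sysconf("SC_LEVEL3_CACHE_SIZE")
    except (ValueError, OSError):
        return None
    # glibc reports 0 or -1 when it cannot determine the size.
    return value if value > 0 else None


def per_ccx_l3(total_l3_bytes, ccx_count):
    """Hypothetical helper: split a package-wide L3 total evenly across CCXs."""
    return total_l3_bytes // ccx_count


if __name__ == "__main__":
    size = sysconf_l3_size()
    print("sysconf L3:", "unavailable" if size is None else f"{size // MIB} MiB")
    # 3950x layout: 4 CCXs sharing a 64MiB total, i.e. 16MiB per CCX.
    print("per-CCX slice:", per_ccx_l3(64 * MIB, 4) // MIB, "MiB")
```

On a 3950x the first line should show the 64MiB total if sysconf behaves as described above. The second line prints the 16MiB slice that matches the Windows result.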

On other platforms, the GC queries /sys/devices/system/cpu/cpu0/cache/index*/size to determine the cache size: https://github.com/filipnavara/runtime/blob/1955928833e178392f3a40ac1509f0d4a6ca7632/src/coreclr/gc/unix/gcenv.unix.cpp#L901-L935

...which yields more reasonable values:

```
$ cat /sys/devices/system/cpu/cpu0/cache/index0/size
32K
$ cat /sys/devices/system/cpu/cpu0/cache/index1/size
32K
$ cat /sys/devices/system/cpu/cpu0/cache/index2/size
512K
$ cat /sys/devices/system/cpu/cpu0/cache/index3/size
16384K
```
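Reading these files programmatically amounts to parsing the unit suffix and keeping the largest entry. The Python sketch below is one way to do that. It assumes the GC effectively keeps the largest size across the `index*` entries, which is what makes index3 (16384K) the result here. This is an approximation of the linked C++ code, not a transcription of it, and the function names are hypothetical.

```python
import glob
import os

# Multipliers for the unit suffixes used in sysfs cache "size" files.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_cache_size(text):
    """Convert a sysfs cache size such as '32K' or '16384K' to bytes."""
    text = text.strip()
    if text and text[-1].upper() in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1].upper()]
    return int(text)


def largest_cache_from_sysfs(cpu=0, root="/sys/devices/system/cpu"):
    """Return the largest cache size listed for one CPU, or 0 if none is found."""
    pattern = os.path.join(root, f"cpu{cpu}", "cache", "index*", "size")
    largest = 0
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                largest = max(largest, parse_cache_size(f.read()))
        except (OSError, ValueError):
            # Missing or unexpected entries are skipped rather than fatal.
            continue
    return largest


if __name__ == "__main__":
    # With the values shown above this would print 16 (index3 = 16384K).
    print(largest_cache_from_sysfs() // (1024 * 1024), "MiB")
```

Returning 0 when nothing is found leaves it to the caller to fall back to another source. This matters because sysfs cache entries are not guaranteed to be present on every system.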

Reproduction Steps

I'm not sure how to extract the Gen0 budget from the GC, so I wrote an app that uses the same method as the GC to determine cache size: https://github.com/smoogipoo/CacheSizeTest

It can be run on Windows and Linux, but it must be run on a multi-CCX CPU such as the Ryzen 3950x to reproduce the problem.

Expected behavior

The cache size on Linux should be 16MiB.

Actual behavior

The cache size on Linux is 64MiB.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Author: smoogipoo
Assignees: -
Labels: `area-GC-coreclr`
Milestone: -
EgorBo commented 1 year ago

Yes, this is a known problem with the current design: we query the L3 size, but we don't check its structure, i.e. whether it's shared across all CPUs, across groups of CPUs, or is even per-core. Related: https://github.com/dotnet/runtime/pull/75881
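Checking the L3's structure could look roughly like the hypothetical Python sketch below. It is not the runtime's code and not necessarily what PR 75881 does. Linux lists the CPUs sharing each cache in `/sys/devices/system/cpu/cpuN/cache/indexM/shared_cpu_list`, as a cpulist such as `0-3,16-19`. Scaling a package-wide total by (CPUs sharing this L3) / (online CPUs) gives an estimate of the slice one CPU group actually uses. The estimate assumes that all L3 instances are the same size and that the total covers exactly the online CPUs, which may not hold on multi-socket or asymmetric systems. If the structure cannot be read, the sketch keeps the reported total.

```python
MIB = 1024 * 1024


def count_cpus_in_list(cpu_list):
    """Count the CPUs in a Linux cpulist string such as '0-3,16-19'."""
    count = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            count += int(last) - int(first) + 1
        else:
            count += 1
    return count


def read_l3_sharing(cpu=0, index=3, root="/sys/devices/system/cpu"):
    """Return how many CPUs share cpu<N>'s cache at index<M>, or None if unknown."""
    path = f"{root}/cpu{cpu}/cache/index{index}/shared_cpu_list"
    try:
        with open(path) as f:
            return count_cpus_in_list(f.read())
    except (OSError, ValueError):
        return None


def estimate_l3_slice(total_l3_bytes, sharing_cpus, online_cpus):
    """Scale a package-wide L3 total down to the share one CPU group sees."""
    if not sharing_cpus or not online_cpus:
        return total_l3_bytes  # structure unknown: keep the reported total
    return total_l3_bytes * sharing_cpus // online_cpus


if __name__ == "__main__":
    # Hypothetical 3950x-like numbers: 32 logical CPUs, 8 sharing each L3.
    print(estimate_l3_slice(64 * MIB, count_cpus_in_list("0-3,16-19"), 32) // MIB, "MiB")
```

With these illustrative inputs the estimate is 16MiB, matching what sysfs and Windows report for this CPU. As noted in the thread, the difficulty is that these sysfs files are not reliably present everywhere.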

The problem is that /sys/devices/system/cpu/cpu0/cache/ is fairly unreliable depending on the architecture, CPU, and OS version (e.g. we had an issue on arm64 where it wasn't reported at all), so it's difficult to come up with something stable that avoids regressions.

mangod9 commented 1 year ago

@janvorli

mangod9 commented 2 weeks ago

Hey @janvorli, I assume this can be moved to 10?

EgorBo commented 2 weeks ago

Isn't this no longer relevant with DATAS enabled by default?