avolmat / linux

Kernel does not expose L2$ topology #3

Open waby38b opened 1 year ago

waby38b commented 1 year ago

Using this commit from my pull request, the L1$ topology is exposed by the kernel.

Now, we cannot see anything related to L2$. This needs to be tuned in the device tree.

HWLOC is a project which can display such a topology:

$ /usr/bin/hwloc-ls
Machine (1986MB total)
  Package L#0
    NUMANode L#0 (P#0 1986MB)
    L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
    L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
    L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
  Block "mmcblk0"
  Net "eth0"

avolmat commented 1 year ago

This commit has never been merged into the upstream kernel. Instead, drivers/base/cacheinfo already supports pulling the cache topology from the devicetree for L1, L2, etc. As an example, arch/arm/boot/dts/bcm2711.dtsi describes the topology of all L1 and L2 caches. Wouldn't it be enough to do it in a similar way?

avolmat commented 1 year ago

Another example of such cache topology settings:

https://github.com/avolmat/linux/commit/b2d5025e129289d9b914c696646e64495a7453c0

waby38b commented 1 year ago

For L1$, here is what I can get from /sys/devices/system/cpu/:

$ for i in $(find cpu*/cache/index* -type f); do echo $i=$(cat $i); done

# Data
cpu1-4/ways_of_associativity=4
             /allocation_policy=ReadWriteAllocate
             /shared_cpu_list=0-3
             /shared_cpu_map=1-2-4-8
             /type=Data
             /write_policy=WriteBack
             /size=32K
             /coherency_line_size=32
             /level=1
             /number_of_sets=256

# Instructions
cpu1-4/ways_of_associativity=4
             /allocation_policy=ReadAllocate
             /shared_cpu_list=0-3
             /shared_cpu_map=1-2-4-8
             /type=Instruction
             /size=32K
             /coherency_line_size=32
             /level=1
             /number_of_sets=256
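
The same attributes can be collected per CPU and per cache leaf with a short script. This is only a sketch, assuming the usual sysfs layout `/sys/devices/system/cpu/cpuN/cache/indexM/`; the function name `read_cache_info` and its `root` parameter are illustrative and not part of any existing tool.

```python
from pathlib import Path


def read_cache_info(root="/sys/devices/system/cpu"):
    """Map 'cpuN/indexM' -> {attribute: value} for every cache leaf found."""
    info = {}
    for index_dir in sorted(Path(root).glob("cpu[0-9]*/cache/index[0-9]*")):
        attrs = {}
        for entry in index_dir.iterdir():
            if not entry.is_file():
                continue
            try:
                attrs[entry.name] = entry.read_text().strip()
            except (OSError, UnicodeDecodeError):
                # Some attributes may be unreadable on a given kernel; skip them.
                continue
        # index_dir is .../cpuN/cache/indexM, so parent.parent is cpuN.
        key = f"{index_dir.parent.parent.name}/{index_dir.name}"
        info[key] = attrs
    return info


if __name__ == "__main__":
    for leaf, attrs in read_cache_info().items():
        print(leaf, attrs.get("level"), attrs.get("type"), attrs.get("size"))
```

If an L2 leaf were exposed, it would show up as an additional `indexM` entry with `level` set to 2; on the system described here only the two level 1 leaves appear.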

And for L2$, for now we just have:

[    0.000000] L2C-310 cache controller enabled, 8 ways, 1024 kB

So, the STi418 SoC has a 32KB Icache, a 32KB Dcache and a 1MB L2 cache.

32KB (Dcache) / 32 (fixed line length of 32 bytes) = 1024 lines; 1024 lines / 4 (Dcache is 4-way set associative) = 256 sets.

32KB (Icache) / 32 (fixed line length of 32 bytes) = 1024 lines; 1024 lines / 4 (Icache is 4-way set associative) = 256 sets.

i-cache-size = <0x8000>;
i-cache-line-size = <32>;
i-cache-sets = <256>;

d-cache-size = <0x8000>;
d-cache-line-size = <32>;
d-cache-sets = <256>;

And for the L2$ part:

1024KB (L2C-310 cache controller) / 32 (fixed line length of 32 bytes) = 32768 lines; 32768 lines / 8 (L2C-310 cache controller is 8-way set associative) = 4096 sets.

cache-level = <2>;
cache-size = <0x100000>;
cache-line-size = <32>;
cache-sets = <4096>;
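
The set counts used in these properties follow from sets = size / (line size x ways). A minimal Python sketch of that arithmetic, assuming the sizes, line length and associativity quoted in this comment (the helper name `cache_sets` is only illustrative):

```python
def cache_sets(size_bytes: int, line_bytes: int, ways: int) -> int:
    """Return the number of sets of a set-associative cache."""
    lines, rem = divmod(size_bytes, line_bytes)
    if rem:
        raise ValueError("cache size is not a multiple of the line size")
    sets, rem = divmod(lines, ways)
    if rem:
        raise ValueError("line count is not a multiple of the associativity")
    return sets


l1_sets = cache_sets(0x8000, 32, 4)      # L1 Icache and Dcache: 32KB, 4-way
l2_sets = cache_sets(0x100000, 32, 8)    # L2C-310: 1MB, 8-way
print(l1_sets, l2_sets)                  # prints "256 4096"
```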

@avolmat, any comments?

waby38b commented 1 year ago

First patch (broken?)

Now the kernel logs show something different (but OK):

[    0.000000] L2C: platform modifies aux control register: 0x02080000 -> 0x30480000
[    0.000000] L2C OF: override cache size: 1048576 bytes (1024KB)
[    0.000000] L2C OF: override line size: 32 bytes
[    0.000000] L2C OF: override way size: 131072 bytes (128KB)
[    0.000000] L2C OF: override associativity: 8
[    0.000000] L2C: DT/platform modifies aux control register: 0x02080000 -> 0x30480000
[    0.000000] L2C: DT/platform tries to modify or specify cache size
[    0.000000] L2C-310 erratum 769419 enabled
[    0.000000] L2C-310 enabling early BRESP for Cortex-A9
[    0.000000] L2C-310 full line of zeros enabled for Cortex-A9
[    0.000000] L2C-310 dynamic clock gating enabled, standby mode enabled
[    0.000000] L2C-310 cache controller enabled, 8 ways, 1024 kB
[    0.000000] L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x44480001
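
The values in the "L2C OF: override" lines can be cross-checked against each other with a little arithmetic. The sketch below uses only the numbers printed in the log above; the variable names are illustrative.

```python
cache_size = 1048576   # "override cache size", in bytes
line_size = 32         # "override line size", in bytes
associativity = 8      # "override associativity"

# One way holds an equal share of the cache.
way_size = cache_size // associativity

# Each way holds one line per set, so sets = way size / line size.
sets = way_size // line_size

print(way_size, sets)  # prints "131072 4096"
```

The computed way size matches the 128KB reported by the kernel, and the set count matches the `cache-sets = <4096>` property.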

But still no L2$ information... sniff...

$ /usr/bin/hwloc-ls
Machine (1972MB total)
  Package L#0
    NUMANode L#0 (P#0 1972MB)
    L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
    L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
    L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
  Block "mmcblk0"
  Net "wlan0"
  Net "eth0"

Update 1: Hmm, in fact this is worse. Previously, I forgot to remove the L1$ driver ("ARM: kernel: add support for cpu cache information"). Now, I just get this:

$ /usr/bin/hwloc-ls
Machine (1972MB total)
  Package L#0
    NUMANode L#0 (P#0 1972MB)
    Core L#0 + PU L#0 (P#0)
    Core L#1 + PU L#1 (P#1)
    Core L#2 + PU L#2 (P#2)
    Core L#3 + PU L#3 (P#3)
  Net "wlan0"
  Net "eth0"

What am I missing?

waby38b commented 1 year ago

After code analysis:

L1$ topic

i-cache-size = <0x8000>;
i-cache-line-size = <32>;
i-cache-sets = <256>;
d-cache-size = <0x8000>;
d-cache-line-size = <32>;
d-cache-sets = <256>;

These lines are never parsed unless arch/arm/kernel/cacheinfo.c is present (i.e. the L1$ patch).

=> ./drivers/base/cacheinfo.c (cache framework)

  int __weak init_cache_level(unsigned int cpu)
  int __weak populate_cache_leaves(unsigned int cpu)

=> implemented in arch-specific files

./arch/mips/kernel/cacheinfo.c
./arch/powerpc/kernel/cacheinfo.c
./arch/arm64/kernel/cacheinfo.c
./arch/x86/kernel/cpu/cacheinfo.c
./arch/riscv/kernel/cacheinfo.c
./arch/loongarch/kernel/cacheinfo.c

So, for ARM without the L1$ patch, these properties are simply unused.
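
The list of architectures that provide their own versions of the two hooks can be regenerated from any kernel checkout. This is a sketch only: `find_arch_cacheinfo` is a hypothetical helper, the kernel source path is whatever you pass in, and the match is a plain substring search rather than real C parsing.

```python
from pathlib import Path

# Hook names taken from the weak declarations in drivers/base/cacheinfo.c.
HOOKS = ("init_cache_level", "populate_cache_leaves")


def find_arch_cacheinfo(kernel_root):
    """Return paths (relative to kernel_root) of arch cacheinfo.c files
    that mention at least one of the cacheinfo hooks."""
    root = Path(kernel_root)
    found = []
    for path in sorted((root / "arch").glob("**/cacheinfo.c")):
        text = path.read_text(errors="replace")
        if any(hook in text for hook in HOOKS):
            found.append(path.relative_to(root).as_posix())
    return found
```

Calling `find_arch_cacheinfo("/path/to/linux")` on a tree without the L1$ patch should return no entry under `arch/arm/`, which is the situation described here.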

L2$ topic

Not really clear for now...