RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[BUG] Topology of Fujitsu A64FX (Fugaku, FX1000) parsed incorrectly due to inactive HWThreads #565

Status: Closed (ficstamas closed this issue 10 months ago)

ficstamas commented 10 months ago

Describe the bug

The number of PUs does not match the actual number of PUs in the last NUMA region because of the offset caused by inactive PUs. The actual layout is included at the end of the issue, but in general the problem is that the FX1000 has 2+48 (=50) online PUs, where the first 2 threads are just assistant threads. The actual working threads are indexed 12-59, but all of the counting and indexing variables are set to the number of online PUs (50). As a result, LIKWID does not list or collect anything after PU #49.

In my opinion, the problematic code segment is the one introduced in #447 (if I'm not mistaken).

I made a temporary fix by hard-coding cpuid_topology.numHWThreads = 60; after line 354, which (seemingly) solves the problem. A more elegant solution might be for parse_cpuinfo to set count to the highest processor ID + 1 instead of the number of entries. Then this part could simply be deleted and replaced with cpuid_topology.numHWThreads = count;

To Reproduce with a LIKWID command

Output of the command with -V 3 added:

DEBUG - [hwloc_init_cpuInfo:375] HWLOC CpuInfo Family 8 Model 1 Stepping 0 Vendor 0x46 Part 0x1 isIntel 0 numHWThreads 50 activeHWThreads 50
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 0 Thread 0 Core 12 Die 0 Socket 0 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 1 Thread 0 Core 12 Die 0 Socket 1 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 2 Thread 0 Core 0 Die 0 Socket 2 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 3 Thread 0 Core 0 Die 0 Socket 3 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 4 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 5 Thread 0 Core 1 Die 0 Socket 1 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 6 Thread 0 Core 1 Die 0 Socket 2 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 7 Thread 0 Core 1 Die 0 Socket 3 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 8 Thread 0 Core 2 Die 0 Socket 0 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 9 Thread 0 Core 2 Die 0 Socket 1 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 10 Thread 0 Core 2 Die 0 Socket 2 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 11 Thread 0 Core 2 Die 0 Socket 3 inCpuSet 0
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 12 Thread 0 Core 0 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 13 Thread 0 Core 1 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 14 Thread 0 Core 2 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 15 Thread 0 Core 3 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 16 Thread 0 Core 4 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 17 Thread 0 Core 5 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 18 Thread 0 Core 6 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 19 Thread 0 Core 7 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 20 Thread 0 Core 8 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 21 Thread 0 Core 9 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 22 Thread 0 Core 10 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 23 Thread 0 Core 11 Die 0 Socket 4 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 24 Thread 0 Core 0 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 25 Thread 0 Core 1 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 26 Thread 0 Core 2 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 27 Thread 0 Core 3 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 28 Thread 0 Core 4 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 29 Thread 0 Core 5 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 30 Thread 0 Core 6 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 31 Thread 0 Core 7 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 32 Thread 0 Core 8 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 33 Thread 0 Core 9 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 34 Thread 0 Core 10 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 35 Thread 0 Core 11 Die 0 Socket 5 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 36 Thread 0 Core 0 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 37 Thread 0 Core 1 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 38 Thread 0 Core 2 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 39 Thread 0 Core 3 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 40 Thread 0 Core 4 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 41 Thread 0 Core 5 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 42 Thread 0 Core 6 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 43 Thread 0 Core 7 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 44 Thread 0 Core 8 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 45 Thread 0 Core 9 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 46 Thread 0 Core 10 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 47 Thread 0 Core 11 Die 0 Socket 6 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 48 Thread 0 Core 0 Die 0 Socket 7 inCpuSet 1
DEBUG - [proc_init_nodeTopology:713] PROC Thread Pool PU 49 Thread 0 Core 1 Die 0 Socket 7 inCpuSet 1
DEBUG - [affinity_init:539] Affinity: Socket domains 8
DEBUG - [affinity_init:541] Affinity: CPU die domains 8
DEBUG - [affinity_init:546] Affinity: CPU cores per LLC 12
DEBUG - [affinity_init:549] Affinity: Cache domains 8
DEBUG - [affinity_init:553] Affinity: NUMA domains 8
DEBUG - [affinity_init:554] Affinity: All domains 33
DEBUG - [affinity_addNodeDomain:370] Affinity domain N: 38 HW threads on 38 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S0: 0 HW threads on 0 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S1: 0 HW threads on 0 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S2: 0 HW threads on 0 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S3: 0 HW threads on 0 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S4: 12 HW threads on 12 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S5: 12 HW threads on 12 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S6: 12 HW threads on 12 cores
DEBUG - [affinity_addSocketDomain:401] Affinity domain S7: 2 HW threads on 2 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D0: 0 HW threads on 0 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D1: 0 HW threads on 0 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D2: 0 HW threads on 0 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D3: 0 HW threads on 0 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D4: 12 HW threads on 12 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D5: 12 HW threads on 12 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D6: 12 HW threads on 12 cores
DEBUG - [affinity_addDieDomain:438] Affinity domain D7: 2 HW threads on 2 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C0: 0 HW threads on 0 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C1: 0 HW threads on 0 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C2: 0 HW threads on 0 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C3: 0 HW threads on 0 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C4: 12 HW threads on 12 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C5: 12 HW threads on 12 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C6: 12 HW threads on 12 cores
DEBUG - [affinity_addCacheDomain:474] Affinity domain C7: 2 HW threads on 2 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M0: 0 HW threads on 0 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M1: 0 HW threads on 0 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M2: 0 HW threads on 0 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M3: 0 HW threads on 0 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M4: 12 HW threads on 12 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M5: 12 HW threads on 12 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M6: 12 HW threads on 12 cores
DEBUG - [affinity_addMemoryDomain:504] Affinity domain M7: 2 HW threads on 2 cores
DEBUG - [create_lookups:295] T 0 T2C 12 T2S 0 T2D 0 T2LLC 1 T2M 0
DEBUG - [create_lookups:295] T 1 T2C 12 T2S 1 T2D 1 T2LLC 1 T2M 0
DEBUG - [create_lookups:295] T 2 T2C 0 T2S 2 T2D 2 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 3 T2C 0 T2S 3 T2D 3 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 4 T2C 1 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 5 T2C 1 T2S 1 T2D 1 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 6 T2C 1 T2S 2 T2D 2 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 7 T2C 1 T2S 3 T2D 3 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 8 T2C 2 T2S 0 T2D 0 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 9 T2C 2 T2S 1 T2D 1 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 10 T2C 2 T2S 2 T2D 2 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 11 T2C 2 T2S 3 T2D 3 T2LLC 0 T2M 0
DEBUG - [create_lookups:295] T 12 T2C 0 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 13 T2C 1 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 14 T2C 2 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 15 T2C 3 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 16 T2C 4 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 17 T2C 5 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 18 T2C 6 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 19 T2C 7 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 20 T2C 8 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 21 T2C 9 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 22 T2C 10 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 23 T2C 11 T2S 4 T2D 4 T2LLC 0 T2M 4
DEBUG - [create_lookups:295] T 24 T2C 0 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 25 T2C 1 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 26 T2C 2 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 27 T2C 3 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 28 T2C 4 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 29 T2C 5 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 30 T2C 6 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 31 T2C 7 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 32 T2C 8 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 33 T2C 9 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 34 T2C 10 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 35 T2C 11 T2S 5 T2D 5 T2LLC 0 T2M 5
DEBUG - [create_lookups:295] T 36 T2C 0 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 37 T2C 1 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 38 T2C 2 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 39 T2C 3 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 40 T2C 4 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 41 T2C 5 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 42 T2C 6 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 43 T2C 7 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 44 T2C 8 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 45 T2C 9 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 46 T2C 10 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 47 T2C 11 T2S 6 T2D 6 T2LLC 0 T2M 6
DEBUG - [create_lookups:295] T 48 T2C 0 T2S 7 T2D 7 T2LLC 0 T2M 7
DEBUG - [create_lookups:295] T 49 T2C 1 T2S 7 T2D 7 T2LLC 0 T2M 7
--------------------------------------------------------------------------------
CPU name:   
CPU type:   Fujitsu A64FX
CPU stepping:   0
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:        8
Cores per socket:   12
Threads per core:   1
--------------------------------------------------------------------------------
HWThread        Thread        Core        Die        Socket        Available
0               0             12          0          0                              
1               0             12          0          1                              
2               0             0           0          2                              
3               0             0           0          3                              
4               0             1           0          0                              
5               0             1           0          1                              
6               0             1           0          2                              
7               0             1           0          3                              
8               0             2           0          0                              
9               0             2           0          1                              
10              0             2           0          2                              
11              0             2           0          3                              
12              0             0           0          4             *                
13              0             1           0          4             *                
14              0             2           0          4             *                
15              0             3           0          4             *                
16              0             4           0          4             *                
17              0             5           0          4             *                
18              0             6           0          4             *                
19              0             7           0          4             *                
20              0             8           0          4             *                
21              0             9           0          4             *                
22              0             10          0          4             *                
23              0             11          0          4             *                
24              0             0           0          5             *                
25              0             1           0          5             *                
26              0             2           0          5             *                
27              0             3           0          5             *                
28              0             4           0          5             *                
29              0             5           0          5             *                
30              0             6           0          5             *                
31              0             7           0          5             *                
32              0             8           0          5             *                
33              0             9           0          5             *                
34              0             10          0          5             *                
35              0             11          0          5             *                
36              0             0           0          6             *                
37              0             1           0          6             *                
38              0             2           0          6             *                
39              0             3           0          6             *                
40              0             4           0          6             *                
41              0             5           0          6             *                
42              0             6           0          6             *                
43              0             7           0          6             *                
44              0             8           0          6             *                
45              0             9           0          6             *                
46              0             10          0          6             *                
47              0             11          0          6             *                
48              0             0           0          7             *                
49              0             1           0          7             *                
--------------------------------------------------------------------------------
Socket 0:       ( 4 8 0 )
Socket 1:       ( 5 9 1 )
Socket 2:       ( 2 6 10 )
Socket 3:       ( 3 7 11 )
Socket 4:       ( 12 13 14 15 16 17 18 19 20 21 22 23 )
Socket 5:       ( 24 25 26 27 28 29 30 31 32 33 34 35 )
Socket 6:       ( 36 37 38 39 40 41 42 43 44 45 46 47 )
Socket 7:       ( 48 49 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:          1
Size:           64 kB
Cache groups:       ( 4 ) ( 8 ) ( 0 ) ( 5 ) ( 9 ) ( 1 ) ( 2 ) ( 6 ) ( 10 ) ( 3 ) ( 7 ) ( 11 ) ( 12 ) ( 13 ) ( 14 ) ( 15 ) ( 16 ) ( 17 ) ( 18 ) ( 19 ) ( 20 ) ( 21 ) ( 22 ) ( 23 ) ( 24 ) ( 25 ) ( 26 ) ( 27 ) ( 28 ) ( 29 ) ( 30 ) ( 31 ) ( 32 ) ( 33 ) ( 34 ) ( 35 ) ( 36 ) ( 37 ) ( 38 ) ( 39 ) ( 40 ) ( 41 ) ( 42 ) ( 43 ) ( 44 ) ( 45 ) ( 46 ) ( 47 ) ( 48 ) ( 49 )
--------------------------------------------------------------------------------
Level:          2
Size:           8 MB
Cache groups:       ( 4 8 0 5 9 1 2 6 10 3 7 11 ) ( 12 13 14 15 16 17 18 19 20 21 22 23 ) ( 24 25 26 27 28 29 30 31 32 33 34 35 ) ( 36 37 38 39 40 41 42 43 44 45 46 47 ) ( 48 49 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:       8
--------------------------------------------------------------------------------
Domain:         0
Processors:     ( )
Distances:      10 20 30 30 40 50 60 60
Free memory:        22.875 MB
Total memory:       715.625 MB
--------------------------------------------------------------------------------
Domain:         1
Processors:     ( )
Distances:      20 10 30 30 50 40 60 60
Free memory:        206.188 MB
Total memory:       765.062 MB
--------------------------------------------------------------------------------
Domain:         2
Processors:     ( )
Distances:      30 30 10 20 60 60 40 50
Free memory:        743.75 MB
Total memory:       765.062 MB
--------------------------------------------------------------------------------
Domain:         3
Processors:     ( )
Distances:      30 30 20 10 60 60 50 40
Free memory:        733.438 MB
Total memory:       759.375 MB
--------------------------------------------------------------------------------
Domain:         4
Processors:     ( 12 13 14 15 16 17 18 19 20 21 22 23 )
Distances:      40 50 60 60 10 20 30 30
Free memory:        7117.75 MB
Total memory:       7345.19 MB
--------------------------------------------------------------------------------
Domain:         5
Processors:     ( 24 25 26 27 28 29 30 31 32 33 34 35 )
Distances:      50 40 60 60 20 10 30 30
Free memory:        7221.75 MB
Total memory:       7409.44 MB
--------------------------------------------------------------------------------
Domain:         6
Processors:     ( 36 37 38 39 40 41 42 43 44 45 46 47 )
Distances:      60 60 40 50 30 30 10 20
Free memory:        7242.62 MB
Total memory:       7409.44 MB
--------------------------------------------------------------------------------
Domain:         7
Processors:     ( 48 49 )
Distances:      60 60 50 40 30 30 20 10
Free memory:        7013.31 MB
Total memory:       7403.38 MB
--------------------------------------------------------------------------------

Additional context

Contents of /proc/cpuinfo:

processor : 1 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 12 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 13 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 14 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 15 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 16 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 17 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 18 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 19 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 20 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 21 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 22 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 23 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 24 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 25 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 26 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 27 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 28 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 29 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 30 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 31 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 32 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 33 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 34 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 35 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 36 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 37 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 38 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 39 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 40 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 41 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 42 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 43 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 44 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 45 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 46 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 47 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0

processor : 48
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 49
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 50
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 51
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 52
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 53
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 54
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 55
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 56
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 57
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 58
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

processor : 59
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0

TomTheBear commented 10 months ago

Thanks for reporting the issue. Unfortunately, I no longer have access to FX1000 nodes to test it, so I will follow your proposed fix and set numHWThreads = max_id + 1 in parse_cpuinfo. This function is only used on ARM chips.

Can you run hwloc_gather_topology on one of the FX1000 nodes and attach the tarball here? Then I can test the topology stuff remotely in the future.
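The difference between "number of processor entries" and "highest processor ID + 1" can be checked directly on a node. The following is a minimal sketch, not LIKWID's actual parse_cpuinfo code; the function name cpuinfo_counts is hypothetical. On the FX1000 described in this issue it should report 50 entries but a highest ID + 1 of 60, assuming /proc/cpuinfo lists exactly the online IDs 0-1 and 12-59.

```shell
#!/bin/sh
# Hypothetical helper (not part of LIKWID): reads /proc/cpuinfo-formatted
# text on stdin and prints "<entries> <max_id_plus_1>".
# If the two numbers differ, sizing arrays by the entry count would drop
# the highest-numbered HW threads, as described in this issue.
cpuinfo_counts() {
    awk -F':' '
        /^processor[ \t]*:/ {
            id = $2 + 0                  # numeric value after the ":"
            n++                          # one more processor entry
            if (id + 1 > max) max = id + 1
        }
        END { print n + 0, max + 0 }
    '
}

# Example run on the local machine (Linux only).
if [ -r /proc/cpuinfo ]; then
    cpuinfo_counts < /proc/cpuinfo
fi
```

Using the highest ID + 1 rather than the entry count keeps every processor ID a valid array index, at the cost of some unused slots (here IDs 2-11).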

TomTheBear commented 10 months ago

Are the "management cores" marked online in /sys/devices/system/cpu/online?

Can you please test whether the linked PR fixes it for you?

vatai commented 10 months ago

Ironically, @ficstamas has also lost access to A64FX machines 😅, so I'll try to help out:

-> % cat /sys/devices/system/cpu/online 
0-1,12-59

I'm not 100% up to speed yet, but will slowly figure things out. @ficstamas already started explaining things to me.
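The online list above uses the kernel's cpulist range syntax. A minimal sketch of expanding it follows; the function name cpulist_counts is hypothetical and this is not LIKWID code. For "0-1,12-59" it yields 50 online HW threads and a highest ID + 1 of 60, which matches the mismatch reported in this issue.

```shell
#!/bin/sh
# Hypothetical helper (not part of LIKWID): reads a kernel cpulist such as
# "0-1,12-59" on stdin and prints "<online_count> <max_id_plus_1>".
cpulist_counts() {
    awk -F',' '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "") continue           # tolerate empty fields
                n_parts = split($i, r, "-")      # "12-59" -> r[1]=12, r[2]=59
                lo = r[1] + 0
                hi = (n_parts == 2) ? r[2] + 0 : lo   # single ID like "5"
                count += hi - lo + 1
                if (hi + 1 > max) max = hi + 1
            }
        }
        END { print count + 0, max + 0 }
    '
}

# Example run on the local machine (Linux only).
if [ -r /sys/devices/system/cpu/online ]; then
    cpulist_counts < /sys/devices/system/cpu/online
fi
```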

TomTheBear commented 10 months ago

Many thanks for stepping in and providing the output. Thanks @ficstamas for your efforts.

Then the proposed fix in PR #568 should do it.

Testing:

$ git clone -b fix_a64fx_fx1000_detection git@github.com:RRZE-HPC/likwid.git likwid-fixed
$ cd likwid-fixed
$ make COMPILER=GCCARMv8 PREFIX=/tmp/likwid-install
$ make COMPILER=GCCARMv8 PREFIX=/tmp/likwid-install install
$ export PATH=/tmp/likwid-install/bin:$PATH
$ export LD_LIBRARY_PATH=/tmp/likwid-install/lib:$LD_LIBRARY_PATH
$ likwid-topology  # The last 4 NUMA domains should contain 12 HW threads each

ficstamas commented 10 months ago

Oh, one more possible difference: I compiled the project with the Fujitsu compiler. @vatai, try the above example first; if that does not work, use FCC.

TomTheBear commented 10 months ago

It shouldn't make a difference whether GCC or FCC is used. Did it work for you with COMPILER=FCC or were adjustments required?

ficstamas commented 10 months ago

It worked without issues, but I never trust FCC 😄 Let's just wait for @vatai.

TomTheBear commented 10 months ago

I want to merge the PR. @vatai, it would be good if you could test it soon.

vatai commented 10 months ago

TL;DR: Merge!

Terribly sorry for the slow reply. I didn't get around to checking GitHub and missed your tag! :(

With @ficstamas's help, we ran the fix_a64fx branch and it looks good (see attachments).

I'm including the hwloc output as well.

AAAaaand you can't upload .tgz files to GitHub, so I'm sharing the files via box.com instead of as an attachment (let me know if I messed up the sharing): https://riken-share.box.com/s/16fmvlpnkjwpqjsi3cc07cmc5j27ao5p

TomTheBear commented 10 months ago

Perfect, thank you both for finding, fixing and testing.

I downloaded the hwloc tarball from the box.com page for future testing. I know, GitHub does not like archives as attachments.