RRZE-HPC / kerncraft

Loop Kernel Analysis and Performance Modeling Toolkit
GNU Affero General Public License v3.0

Error detecting Cores per NUMA domain in likwid_bench_auto #67

Closed: sguera closed this issue 6 years ago

sguera commented 6 years ago

On a Broadwell architecture, likwid_bench_auto reports:

cores per NUMA domain: 0.1

On a Bulldozer architecture, it reports:

cores per NUMA domain: 0.125

cod3monk commented 6 years ago

Can you provide the output of likwid-topology on both architectures?

sguera commented 6 years ago

Here it is for the Broadwell:

[guerrera@dmi-cl-login ~]$ likwid-topology 
--------------------------------------------------------------------------------
CPU name:   Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
CPU type:   Intel Xeon Broadwell EN/EP/EX processor
CPU stepping:   1
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:        2
Cores per socket:   10
Threads per core:   1
--------------------------------------------------------------------------------
HWThread    Thread      Core        Socket      Available
0       0       0       0       *
1       0       1       0       *
2       0       2       0       *
3       0       3       0       *
4       0       4       0       *
5       0       5       0       *
6       0       6       0       *
7       0       7       0       *
8       0       8       0       *
9       0       9       0       *
10      0       10      1       *
11      0       11      1       *
12      0       12      1       *
13      0       13      1       *
14      0       14      1       *
15      0       15      1       *
16      0       16      1       *
17      0       17      1       *
18      0       18      1       *
19      0       19      1       *
--------------------------------------------------------------------------------
Socket 0:       ( 0 1 2 3 4 5 6 7 8 9 )
Socket 1:       ( 10 11 12 13 14 15 16 17 18 19 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:          1
Size:           32 kB
Cache groups:       ( 0 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 ) ( 8 ) ( 9 ) ( 10 ) ( 11 ) ( 12 ) ( 13 ) ( 14 ) ( 15 ) ( 16 ) ( 17 ) ( 18 ) ( 19 )
--------------------------------------------------------------------------------
Level:          2
Size:           256 kB
Cache groups:       ( 0 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 ) ( 8 ) ( 9 ) ( 10 ) ( 11 ) ( 12 ) ( 13 ) ( 14 ) ( 15 ) ( 16 ) ( 17 ) ( 18 ) ( 19 )
--------------------------------------------------------------------------------
Level:          3
Size:           25 MB
Cache groups:       ( 0 1 2 3 4 5 6 7 8 9 ) ( 10 11 12 13 14 15 16 17 18 19 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:       2
--------------------------------------------------------------------------------
Domain:         0
Processors:     ( 0 1 2 3 4 5 6 7 8 9 )
Distances:      10 21
Free memory:        3702.58 MB
Total memory:       32671.7 MB
--------------------------------------------------------------------------------
Domain:         1
Processors:     ( 10 11 12 13 14 15 16 17 18 19 )
Distances:      21 10
Free memory:        2110.73 MB
Total memory:       32768 MB
--------------------------------------------------------------------------------

and for the Bulldozer:

--------------------------------------------------------------------------------
CPU name:   AMD Opteron(TM) Processor 6274
CPU type:   AMD Interlagos processor
CPU stepping:   2
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:        2
Cores per socket:   16
Threads per core:   1
--------------------------------------------------------------------------------
HWThread    Thread      Core        Socket      Available
0       0       0       0       *
1       0       1       0       *
2       0       2       0       *
3       0       3       0       *
4       0       4       0       *
5       0       5       0       *
6       0       6       0       *
7       0       7       0       *
8       0       8       0       *
9       0       9       0       *
10      0       10      0       *
11      0       11      0       *
12      0       12      0       *
13      0       13      0       *
14      0       14      0       *
15      0       15      0       *
16      0       16      1       *
17      0       17      1       *
18      0       18      1       *
19      0       19      1       *
20      0       20      1       *
21      0       21      1       *
22      0       22      1       *
23      0       23      1       *
24      0       24      1       *
25      0       25      1       *
26      0       26      1       *
27      0       27      1       *
28      0       28      1       *
29      0       29      1       *
30      0       30      1       *
31      0       31      1       *
--------------------------------------------------------------------------------
Socket 0:       ( 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 )
Socket 1:       ( 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:          1
Size:           16 kB
Cache groups:       ( 0 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 ) ( 8 ) ( 9 ) ( 10 ) ( 11 ) ( 12 ) ( 13 ) ( 14 ) ( 15 ) ( 16 ) ( 17 ) ( 18 ) ( 19 ) ( 20 ) ( 21 ) ( 22 ) ( 23 ) ( 24 ) ( 25 ) ( 26 ) ( 27 ) ( 28 ) ( 29 ) ( 30 ) ( 31 )
--------------------------------------------------------------------------------
Level:          2
Size:           2 MB
Cache groups:       ( 0 1 ) ( 2 3 ) ( 4 5 ) ( 6 7 ) ( 8 9 ) ( 10 11 ) ( 12 13 ) ( 14 15 ) ( 16 17 ) ( 18 19 ) ( 20 21 ) ( 22 23 ) ( 24 25 ) ( 26 27 ) ( 28 29 ) ( 30 31 )
--------------------------------------------------------------------------------
Level:          3
Size:           6 MB
Cache groups:       ( 0 1 2 3 4 5 6 7 ) ( 8 9 10 11 12 13 14 15 ) ( 16 17 18 19 20 21 22 23 ) ( 24 25 26 27 28 29 30 31 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:       4
--------------------------------------------------------------------------------
Domain:         0
Processors:     ( 0 1 2 3 4 5 6 7 )
Distances:      10 16 16 16
Free memory:        63414.8 MB
Total memory:       64431.7 MB
--------------------------------------------------------------------------------
Domain:         1
Processors:     ( 8 9 10 11 12 13 14 15 )
Distances:      16 10 16 16
Free memory:        62528.9 MB
Total memory:       64511 MB
--------------------------------------------------------------------------------
Domain:         2
Processors:     ( 16 17 18 19 20 21 22 23 )
Distances:      16 16 10 16
Free memory:        63737 MB
Total memory:       64511 MB
--------------------------------------------------------------------------------
Domain:         3
Processors:     ( 24 25 26 27 28 29 30 31 )
Distances:      16 16 16 10
Free memory:        61490.1 MB
Total memory:       64495 MB
--------------------------------------------------------------------------------
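
For reference, the two topologies above list 10 processors per NUMA domain on the Broadwell machine and 8 per domain on the Bulldozer machine, both with one thread per core. The reported values 0.1 and 0.125 equal 1/10 and 1/8, the reciprocals of those counts; the thread does not state the cause. The following is a minimal sketch of how the expected value could be read from likwid-topology text output. It is not kerncraft's actual implementation: the function name `cores_per_numa_domain`, the `threads_per_core` parameter, and the shortened `sample` string are assumptions made for illustration.

```python
import re


def cores_per_numa_domain(topology_text, threads_per_core=1):
    """Return one core count per NUMA domain (hypothetical helper).

    Assumes the plain-text layout shown in the likwid-topology output
    above: a "NUMA Topology" heading followed by one
    "Processors: ( ... )" line per domain.
    """
    # Only look at the text after the "NUMA Topology" heading.
    _, _, numa_section = topology_text.partition("NUMA Topology")
    counts = []
    for match in re.finditer(r"Processors:\s*\(([^)]*)\)", numa_section):
        hw_threads = match.group(1).split()
        # Integer division keeps the result a whole number of cores.
        counts.append(len(hw_threads) // threads_per_core)
    return counts


# Shortened excerpt of the Bulldozer output above (illustrative only).
sample = """\
NUMA Topology
NUMA domains:       4
Domain:         0
Processors:     ( 0 1 2 3 4 5 6 7 )
Domain:         1
Processors:     ( 8 9 10 11 12 13 14 15 )
Domain:         2
Processors:     ( 16 17 18 19 20 21 22 23 )
Domain:         3
Processors:     ( 24 25 26 27 28 29 30 31 )
"""

print(cores_per_numa_domain(sample))  # prints [8, 8, 8, 8]
```

Counting the entries on each `Processors:` line and dividing by the threads per core avoids relying on a separate ratio of totals, where swapping numerator and denominator would yield a fraction such as 0.125 instead of 8.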