apc-llc / likwid

Lightweight performance tools
https://code.google.com/p/likwid/
GNU General Public License v3.0

likwid-perfctr -C0 segfaults on startup if running in a cpuset #164

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm seeing an occasional segfault on startup with certain command lines. I 
think it is related to the fact that the Haswell i7-4770 I'm running on has 
hyperthreading disabled, so there is confusion about whether it has 4 or 8 
cores available. Whether you actually get a crash depends on many small 
details, so this may be difficult to reproduce. But I think I can show where 
the problem is well enough that you might be able to debug it blind.

What steps will reproduce the problem?

You probably want to recompile with '-g' to have line numbers available for 
debugging.  It's possible (probable?) that this should be the default for all 
builds, since there are few downsides.

$ diff -u make/include_GCC.mk~ make/include_GCC.mk
--- make/include_GCC.mk~    2014-05-20 08:54:52.000000000 -0400
+++ make/include_GCC.mk 2014-08-31 15:07:57.731995592 -0400
@@ -12,7 +12,7 @@
 #ANSI_CFLAGS += -Wextra
 #ANSI_CFLAGS += -fWall

-CFLAGS   =  -O2  -Wno-format -std=c99
+CFLAGS   =  -O2  -Wno-format -std=c99 -g
 FCFLAGS  = -module ./  # ifort
 #FCFLAGS  = -J ./  -fsyntax-only  #gfortran
 PASFLAGS  = x86-64

Then use valgrind to look for memory errors:

nate@haswell:~/likwid/likwid-3.1.2$ valgrind likwid-perfctr -C0 ls
==6548== Memcheck, a memory error detector
==6548== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==6548== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==6548== Command: likwid-perfctr -C0 ls
==6548==
==6548== Invalid write of size 4
==6548==    at 0x403A89: treeFillNextEntries (affinity.c:132)
==6548==    by 0x403E49: affinity_init (affinity.c:247)
==6548==    by 0x401B66: main (likwid-perfctr.c:135)
==6548==  Address 0x5732f70 is 0 bytes after a block of size 16 alloc'd
==6548==    at 0x4C2A2DB: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6548==    by 0x403E05: affinity_init (affinity.c:244)
==6548==    by 0x401B66: main (likwid-perfctr.c:135)
==6548==
ERROR - [./src/strUtil.c:437] You are running inside a cpuset. In cpusets only logical pinning inside set is allowed!

What is the expected output? What do you see instead?

Somehow the counter at treeFillNextEntries():131 is going negative, which 
writes beyond the allocated area.   Whether this causes a segfault or not 
depends on exact memory layout and command line.  For me, I get a crash if I 
use '-C0', but not if I use '-CL:0', although the overrun occurs in both cases.

What version of the product are you using?

3.1.2

Please provide any additional information below.

Changing to check if "counter > 0" seems to solve the problem, but may be 
masking an underlying issue.   Possibly it should be an assert() instead, and 
the logic corrected so this does not happen.

$ diff -u src/affinity.c~ src/affinity.c
--- src/affinity.c~ 2014-08-31 14:15:20.643916086 -0400
+++ src/affinity.c  2014-08-31 15:20:49.308015023 -0400
@@ -126,9 +126,8 @@

     thread = tree_getChildNode(node);

-    while ( thread != NULL )
+    while ( thread != NULL && counter > 0 )
     {
-        fprintf(stderr, "counter: %d\n", counter);
       processorIds[numberOfEntries-counter] = thread->id;
       thread = tree_getNextNode(thread);
       counter--;

In particular, I don't know whether I'm actually running inside a cpuset, or 
whether this is an artifact of turning off hyperthreading in the BIOS. Or maybe 
that's how hyperthreading is turned off? Someone else administers this 
machine, and I don't know enough about cpusets to know how things are set up. 
Nothing is mounted as '/dev/cpuset', but perhaps there are alternative ways of 
configuring them. I'm not knowledgeable about how this works internally.
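
For reference, here is a minimal sketch of the "counter > 0" guard and the assert() alternative mentioned above. It is not LIKWID's code: struct node, fill_ids and fill_ids_checked are hypothetical names, and the sibling list merely stands in for the topology tree that treeFillNextEntries() walks. The idea is that the copy loop never writes past the buffer, while the return value still reveals that the list was longer than expected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a topology tree node; not LIKWID's real type. */
struct node {
    int id;
    struct node *next;
};

/* Copies at most `capacity` ids from the sibling list into `out`.
 * Returns how many nodes the list actually holds, so the caller can
 * tell when the list was longer than the buffer. */
size_t fill_ids(const struct node *first, int *out, size_t capacity)
{
    size_t seen = 0;
    const struct node *n;

    for (n = first; n != NULL; n = n->next) {
        if (seen < capacity)
            out[seen] = n->id;   /* in bounds by construction */
        seen++;
    }
    return seen;
}

/* Variant that treats a length mismatch as a logic error instead of
 * silently truncating. The write itself is still bounded. */
void fill_ids_checked(const struct node *first, int *out, size_t capacity)
{
    size_t seen = fill_ids(first, out, capacity);
    assert(seen <= capacity);
}
```

With a list of 8 nodes and a buffer of 4 entries, fill_ids() fills the 4 entries and returns 8, which is the kind of mismatch that a guard alone would hide.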

Original issue reported on code.google.com by n...@verse.com on 31 Aug 2014 at 7:24

GoogleCodeExporter commented 9 years ago
Hi Nate,

I need to get deeper insight into your problem. Your assumption about disabled 
Hyperthreading may be right. The 3.1 versions perform the topology lookup using 
the CPUID instruction. I have to check whether there is any "hyperthread lookup 
code". The affinity module does not retrieve any values from the system; it 
uses the values gathered by the cpuid and numa modules.

Whether you are inside a cpuset can be checked in procfs (file 
/proc/<pid>/status) with the line Cpus_allowed_list. But this error message may 
be a result of faulty hyperthreading-lookup code that sees all HW threads while 
you are limited (similar to a cpuset) to using half of them.
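
As a rough illustration of the procfs check described above, the following sketch reads the Cpus_allowed_list line of /proc/self/status and counts the CPUs it names. The function names count_cpus_in_list and allowed_cpu_count are made up for this example and are not part of LIKWID. It assumes Linux and the usual list syntax of comma-separated CPU numbers and ranges such as "0-7" or "0,2-3".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts the CPUs in a list string such as "0-7" or "0,2-3".
 * Leading whitespace and a trailing newline are tolerated.
 * Returns -1 on malformed input. */
int count_cpus_in_list(const char *list)
{
    const char *p = list;
    int count = 0;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi;

        if (end == p || lo < 0)
            return -1;
        hi = lo;
        p = end;
        if (*p == '-') {               /* range "lo-hi" */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            p = end;
        }
        count += (int)(hi - lo + 1);
        if (*p == ',')
            p++;
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return count;
}

/* Returns the number of CPUs the calling process may run on according
 * to /proc/self/status, or -1 if the information is unavailable. */
int allowed_cpu_count(void)
{
    const char *key = "Cpus_allowed_list:";
    char line[512];
    int result = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, key, strlen(key)) == 0) {
            result = count_cpus_in_list(line + strlen(key));
            break;
        }
    }
    fclose(f);
    return result;
}
```

Comparing the result of allowed_cpu_count() with the number of hardware threads the tool believes exist is one way to notice a restriction, although a mismatch alone does not prove that a cpuset is in use.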

Original comment by Thomas.R...@googlemail.com on 1 Sep 2014 at 12:08

GoogleCodeExporter commented 9 years ago
Hi Nate,

I checked your issue and cannot reproduce it. My office desktop is also a 
Haswell i7-4770 and I disabled HyperThreading in the BIOS.

I attached a patch that prints out the topology values of your machine. I 
assume the operation sysconf(_SC_NPROCESSORS_CONF) returns the wrong number of 
hardware threads. Can you please send me the output from your machine after 
applying the patch?

Greetings,
Thomas

Original comment by Thomas.R...@googlemail.com on 5 Sep 2014 at 9:59


GoogleCodeExporter commented 9 years ago
Thanks for looking into this.   Here's what I see with the patch applied to a 
clean download:

nate@haswell:~/likwid/likwid-3.1.2$ ./likwid-perfctr -C1 ls
Found achritectural data:
numHWThreads 8
numSockets 1
numCoresPerSocket 4
numThreadsPerCore 1
numCacheLevels 4

Values determined to create affinity groups:
numberOfSocketDomains 1
numberOfNumaDomains 1
numberOfProcessorsPerSocket 4
numberOfCoresPerCache 4
numberOfProcessorsPerCache 4
numberOfCacheDomains 1
numberOfDomains 4
Segmentation fault (core dumped)

Here's some other diagnostics that might offer clues:

nate@haswell:~/likwid/likwid-3.1.2$ cat /proc/self/status | grep -i CPU
Cpus_allowed:   ff
Cpus_allowed_list:  0-7

nate@haswell:~/turbostat$ sudo turbostat  -v
turbostat v3.4 April 17, 2013 - Len Brown <lenb@kernel.org>
CPUID(0): GenuineIntel 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
CPUID(6): APERF, DTS, PTM, EPB
RAPL: 3121 sec. Joule Counter Range
cpu0: MSR_NHM_PLATFORM_INFO: 0x80838f3012200
8 * 100 = 800 MHz max efficiency
34 * 100 = 3400 MHz TSC frequency
cpu0: MSR_IA32_POWER_CTL: 0x0004005d (C1E: DISabled)
cpu0: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008405 (UNdemote-C3, UNdemote-C1, demote-C3, demote-C1, locked: pkg-cstate-limit=5: pc7s)
cpu0: MSR_NHM_TURBO_RATIO_LIMIT: 0x25262727
37 * 100 = 3700 MHz max turbo 4 active cores
38 * 100 = 3800 MHz max turbo 3 active cores
39 * 100 = 3900 MHz max turbo 2 active cores
39 * 100 = 3900 MHz max turbo 1 active cores
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x000002a0 (84 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x80428348001a82a0 (locked)
cpu0: PKG Limit #1: ENabled (84.000000 Watts, 8.000000 sec, clamp DISabled)
cpu0: PKG Limit #2: ENabled (105.000000 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88480800 (28 C)
cpu0: MSR_IA32_THERM_STATUS: 0x88480800 (28 C +/- 1)
cpu1: MSR_IA32_THERM_STATUS: 0x88490800 (27 C +/- 1)
cpu2: MSR_IA32_THERM_STATUS: 0x88490800 (27 C +/- 1)
cpu3: MSR_IA32_THERM_STATUS: 0x88480800 (28 C +/- 1)
cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
          0.08 3.39 3.39   0   0.08   0.05   0.02  99.77   29   29  99.24   0.00   0.00   0.00   3.48   0.02  0.00
  0   0   0.02 3.38 3.39   0   0.02   0.07   0.00  99.89   29   29  99.24   0.00   0.00   0.00   3.48   0.02  0.00
  1   1   0.04 3.39 3.39   0   0.05   0.02   0.00  99.89   28
  2   2   0.04 3.39 3.39   0   0.08   0.08   0.02  99.79   27
  3   3   0.21 3.39 3.39   0   0.19   0.02   0.05  99.53   26
...

nate@haswell:~$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping    : 3
microcode   : 0x16
cpu MHz     : 3401.000
cache size  : 8192 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips    : 6784.91
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
...

Original comment by n...@verse.com on 7 Sep 2014 at 4:31

GoogleCodeExporter commented 9 years ago
Hi Nate,

The problem can be seen in the second line of the patch output:
numHWThreads 8
The sysconf function does not return the right number of active HW threads.
Based on your diagnostics the problem seems to go deeper, since the process 
status output also returns a list with 8 threads. Turbostat seems to analyze 
the architecture correctly. How many processors are listed in cpuinfo?
cat /proc/cpuinfo  | grep 'processor' | sort -u | wc -l
If cpuinfo prints 4, you can use the attached patch, which I already posted 
somewhere on the mailing list. It executes the above command and uses this 
value if it is lower than the one returned by sysconf.
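
The fallback described above could look roughly like the sketch below. This is not the attached patch: count_processor_lines and effective_hw_threads are hypothetical names, and instead of running the shell pipeline the sketch reads /proc/cpuinfo directly and counts the lines that begin with "processor", which is an assumption about the file layout that holds for the x86 output shown in this thread.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Counts lines that start with "processor" in a cpuinfo-style stream.
 * Lines longer than the buffer arrive in chunks, so a flag tracks
 * whether the current chunk really begins a new line. */
int count_processor_lines(FILE *f)
{
    char buf[256];
    int count = 0;
    int at_line_start = 1;

    while (fgets(buf, sizeof buf, f) != NULL) {
        if (at_line_start && strncmp(buf, "processor", 9) == 0)
            count++;
        at_line_start = (strchr(buf, '\n') != NULL);
    }
    return count;
}

/* Returns the smaller of sysconf(_SC_NPROCESSORS_CONF) and the number
 * of processors listed in /proc/cpuinfo. Falls back to the sysconf
 * value when cpuinfo cannot be read or lists nothing. */
long effective_hw_threads(void)
{
    long configured = sysconf(_SC_NPROCESSORS_CONF);
    FILE *f = fopen("/proc/cpuinfo", "r");
    int listed;

    if (f == NULL)
        return configured;
    listed = count_processor_lines(f);
    fclose(f);
    return (listed > 0 && listed < configured) ? (long)listed : configured;
}
```

On a machine like the one in this report, where sysconf reports 8 and cpuinfo lists 4 processors, effective_hw_threads() would return 4.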

Greetings,
Thomas

Original comment by Thomas.R...@googlemail.com on 8 Sep 2014 at 12:21


GoogleCodeExporter commented 9 years ago
/proc/cpuinfo has the correct info:

nate@haswell:~$ cat /proc/cpuinfo  | grep 'processor'
processor   : 0
processor   : 1
processor   : 2
processor   : 3

The attached patch works for me, and likwid-perfctr no longer segfaults at 
startup.  

Source for turbostat is here in case they have a more elegant technique: 
https://github.com/torvalds/linux/blob/master/tools/power/x86/turbostat/turbostat.c

Thanks for all the quick fixing!

--nate

Original comment by n...@verse.com on 8 Sep 2014 at 3:03

GoogleCodeExporter commented 9 years ago
Hi Nate,

I will take a look at the turbostat application, thanks for the hint. LIKWID 
4 uses the hwloc library to get the topology information. I will check whether 
we still use sysconf there, but I don't think so. 

I am closing this issue.

Greetings,
Thomas

P.S. The 3.1 branch in the SVN repo already includes the patch.

Original comment by Thomas.R...@googlemail.com on 9 Sep 2014 at 3:29