Hi Nate,
I need to get deeper insight into your problem. Your assumption about disabled
Hyperthreading may be right. The 3.1 versions perform the topology lookup using
the CPUID instruction. I have to check whether there is any "hyperthread lookup
code". The affinity module does not retrieve any values from the system itself;
it uses the values gathered by the cpuid and numa modules.
Whether you are inside a cpuset can be checked in procfs (file
/proc/<pid>/status) via the Cpus_allowed_list line. But this error message may
be the result of faulty hyperthreading-lookup code that sees all HW threads
while you are limited (similar to a cpuset) to using half of them.
Original comment by Thomas.R...@googlemail.com
on 1 Sep 2014 at 12:08
Hi Nate,
I checked your issue and cannot reproduce it. My office desktop is also a
Haswell i7-4770, and I disabled HyperThreading in the BIOS.
I attached a patch that prints out the topology values of your machine. I
assume that sysconf(_SC_NPROCESSORS_CONF) returns the wrong number of
hardware threads, which would cause the error. Can you please send me the
output from your machine after applying the patch?
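
For a quick check without the patch, the value in question can be queried directly. This is a minimal sketch, not the attached patch: the cpu_counts type and both function names are invented for illustration. It also queries _SC_NPROCESSORS_ONLN for comparison. Both sysconf names are extensions provided by glibc rather than part of POSIX, and the thread does not show what _SC_NPROCESSORS_ONLN would report on the affected machine.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical container for the two processor counts. */
typedef struct {
    long configured;   /* sysconf(_SC_NPROCESSORS_CONF) */
    long online;       /* sysconf(_SC_NPROCESSORS_ONLN) */
} cpu_counts;

/* Query both counts; sysconf returns -1 if a name is not supported. */
cpu_counts query_cpu_counts(void)
{
    cpu_counts c;
    c.configured = sysconf(_SC_NPROCESSORS_CONF);
    c.online = sysconf(_SC_NPROCESSORS_ONLN);
    return c;
}

/* Returns 1 if both counts are valid but differ, which hints that the
 * configured count includes hardware threads that cannot be used. */
int counts_disagree(cpu_counts c)
{
    return c.configured > 0 && c.online > 0 && c.configured != c.online;
}

/* Print both values in the style of the diagnostic output. */
void print_cpu_counts(cpu_counts c)
{
    printf("configured %ld\nonline %ld\n", c.configured, c.online);
}
```

Calling print_cpu_counts(query_cpu_counts()) from a small main() prints both values. A configured count of 8 on a machine with 4 usable hardware threads would match the assumption above.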
Greetings,
Thomas
Original comment by Thomas.R...@googlemail.com
on 5 Sep 2014 at 9:59
Attachments:
Thanks for looking into this. Here's what I see with the patch applied to a
clean download:
nate@haswell:~/likwid/likwid-3.1.2$ ./likwid-perfctr -C1 ls
Found achritectural data:
numHWThreads 8
numSockets 1
numCoresPerSocket 4
numThreadsPerCore 1
numCacheLevels 4
Values determined to create affinity groups:
numberOfSocketDomains 1
numberOfNumaDomains 1
numberOfProcessorsPerSocket 4
numberOfCoresPerCache 4
numberOfProcessorsPerCache 4
numberOfCacheDomains 1
numberOfDomains 4
Segmentation fault (core dumped)
Here's some other diagnostics that might offer clues:
nate@haswell:~/likwid/likwid-3.1.2$ cat /proc/self/status | grep -i CPU
Cpus_allowed: ff
Cpus_allowed_list: 0-7
nate@haswell:~/turbostat$ sudo turbostat -v
turbostat v3.4 April 17, 2013 - Len Brown <lenb@kernel.org>
CPUID(0): GenuineIntel 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
CPUID(6): APERF, DTS, PTM, EPB
RAPL: 3121 sec. Joule Counter Range
cpu0: MSR_NHM_PLATFORM_INFO: 0x80838f3012200
8 * 100 = 800 MHz max efficiency
34 * 100 = 3400 MHz TSC frequency
cpu0: MSR_IA32_POWER_CTL: 0x0004005d (C1E: DISabled)
cpu0: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008405 (UNdemote-C3, UNdemote-C1,
demote-C3, demote-C1, locked: pkg-cstate-limit=5: pc7s)
cpu0: MSR_NHM_TURBO_RATIO_LIMIT: 0x25262727
37 * 100 = 3700 MHz max turbo 4 active cores
38 * 100 = 3800 MHz max turbo 3 active cores
39 * 100 = 3900 MHz max turbo 2 active cores
39 * 100 = 3900 MHz max turbo 1 active cores
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules,
0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x000002a0 (84 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x80428348001a82a0 (locked)
cpu0: PKG Limit #1: ENabled (84.000000 Watts, 8.000000 sec, clamp DISabled)
cpu0: PKG Limit #2: ENabled (105.000000 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88480800 (28 C)
cpu0: MSR_IA32_THERM_STATUS: 0x88480800 (28 C +/- 1)
cpu1: MSR_IA32_THERM_STATUS: 0x88490800 (27 C +/- 1)
cpu2: MSR_IA32_THERM_STATUS: 0x88490800 (27 C +/- 1)
cpu3: MSR_IA32_THERM_STATUS: 0x88480800 (28 C +/- 1)
cor CPU %c0 GHz TSC SMI %c1 %c3 %c6 %c7 CTMP PTMP %pc2
%pc3 %pc6 %pc7 Pkg_W Cor_W GFX_W
0.08 3.39 3.39 0 0.08 0.05 0.02 99.77 29 29 99.24 0.00 0.00 0.00 3.48 0.02 0.00
0 0 0.02 3.38 3.39 0 0.02 0.07 0.00 99.89 29 29 99.24 0.00 0.00 0.00 3.48 0.02 0.00
1 1 0.04 3.39 3.39 0 0.05 0.02 0.00 99.89 28
2 2 0.04 3.39 3.39 0 0.08 0.08 0.02 99.79 27
3 3 0.21 3.39 3.39 0 0.19 0.02 0.05 99.53 26
...
nate@haswell:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x16
cpu MHz : 3401.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep
bmi2 erms invpcid rtm
bogomips : 6784.91
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
...
Original comment by n...@verse.com
on 7 Sep 2014 at 4:31
Hi Nate,
the problem can be seen in the second line of the patch output:
numHWThreads 8
The sysconf function does not return the right number of active HW threads.
Based on your diagnostics, the problem seems to go deeper, since the process
status output also lists 8 threads. Turbostat seems to analyze the
architecture correctly. How many processors are listed in cpuinfo?
cat /proc/cpuinfo | grep 'processor' | sort -u | wc -l
If this command prints 4, you can use the attached patch, which I already
posted on the mailing list. It executes the above command and uses the
resulting value if it is lower than the one returned by sysconf.
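
The idea behind the patch can be sketched as follows. This is not the patch itself: instead of executing the shell command, the sketch reads the file directly, and the function names count_cpuinfo_processors and effective_hw_threads are made up for illustration. It counts only lines that begin with "processor", which is slightly stricter than the grep above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines of a cpuinfo-style file that begin
 * with the field name "processor". Returns -1 if the file cannot be read. */
int count_cpuinfo_processors(const char *path)
{
    static const char key[] = "processor";
    const size_t keylen = sizeof key - 1;
    char chunk[1024];
    int at_line_start = 1;   /* the flags line can be longer than one chunk */
    int count = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        if (at_line_start && strncmp(chunk, key, keylen) == 0 &&
            (chunk[keylen] == ' ' || chunk[keylen] == '\t' ||
             chunk[keylen] == ':'))
            count++;
        /* The next chunk starts a new line only if this one ended a line. */
        at_line_start = (strchr(chunk, '\n') != NULL);
    }
    fclose(fp);
    return count;
}

/* Use the cpuinfo count only if it is valid and lower than the value
 * returned by sysconf; otherwise keep the sysconf value. */
long effective_hw_threads(long sysconf_count, int cpuinfo_count)
{
    if (cpuinfo_count > 0 && cpuinfo_count < sysconf_count)
        return cpuinfo_count;
    return sysconf_count;
}
```

With the values from this thread (8 from sysconf, 4 processor entries in cpuinfo), effective_hw_threads(8, 4) yields 4. In a real program the first argument would come from sysconf(_SC_NPROCESSORS_CONF) and the second from count_cpuinfo_processors("/proc/cpuinfo").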
Greetings,
Thomas
Original comment by Thomas.R...@googlemail.com
on 8 Sep 2014 at 12:21
Attachments:
/proc/cpuinfo has the correct info:
nate@haswell:~$ cat /proc/cpuinfo | grep 'processor'
processor : 0
processor : 1
processor : 2
processor : 3
The attached patch works for me, and likwid-perfctr no longer segfaults at
startup.
Source for turbostat is here in case they have a more elegant technique:
https://github.com/torvalds/linux/blob/master/tools/power/x86/turbostat/turbostat.c
Thanks for all the quick fixing!
--nate
Original comment by n...@verse.com
on 8 Sep 2014 at 3:03
Hi Nate,
I will take a look at the turbostat application, thanks for the hint. LIKWID
4 uses the hwloc library to get the topology information. I will check whether
we still use sysconf there, but I don't think so.
I am closing this issue.
Greetings,
Thomas
P.S. The 3.1 branch in the SVN repo already includes the patch.
Original comment by Thomas.R...@googlemail.com
on 9 Sep 2014 at 3:29
Original issue reported on code.google.com by
n...@verse.com
on 31 Aug 2014 at 7:24