Gwinel / likwid

Automatically exported from code.google.com/p/likwid
GNU General Public License v3.0

Unsupported Processor #166

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,
I am trying to use likwid on a Haswell computer:

cat /proc/cpuinfo | pg
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
stepping        : 2
cpu MHz         : 2600.000
cache size      : 30720 KB
physical id     : 0
siblings        : 24
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt
pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 hle avx2 smep
bmi2 erms invpcid rtm
bogomips        : 5187.83
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
etc......

I've got this output:

sudo /opt/benchmarks/bin/likwid-perfctr -a
ERROR - [./src/perfmon.c:1117] Unsupported Processor

I am using this version:
sudo /opt/benchmarks/bin/likwid-perfctr 
likwid-perfCtr --  Version  2.2 

It was downloaded from the latest stable release 3.1.2 

-rw-r--r-- 1 user group 485311 Sep 18 16:44 likwid-stable.tar.gz

Is your product not yet available for Haswell? When do you expect to release a 
Haswell version?

Thanks.
Best regards,
-D.

Daniel Charpin

Original issue reported on code.google.com by d.char...@free.fr on 19 Sep 2014 at 8:14

GoogleCodeExporter commented 9 years ago
Hi Daniel,

LIKWID is available for Intel Haswell but you have to use a newer version of 
LIKWID. Your likwid-perfctr tool reports version 2.2, a pretty old version. 
When you use the 3.1.2 version, the reported version should be 3.1.
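
A quick way to confirm which binary is actually being run is to compare the banner it prints against the expected release. The following is a minimal Python sketch, assuming the banner has the shape quoted in this thread ("likwid-perfCtr --  Version  2.2"); `parse_banner_version` and `is_recent_enough` are hypothetical helper names, not part of LIKWID.

```python
import re

# Matches banners such as "likwid-perfCtr --  Version  2.2". The format is
# taken from the output quoted in this thread; other releases may differ.
_BANNER_RE = re.compile(r"Version\s+(\d+)\.(\d+)")


def parse_banner_version(banner):
    """Return (major, minor) from a likwid-perfctr banner, or None."""
    match = _BANNER_RE.search(banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_recent_enough(banner, minimum=(3, 1)):
    """True if the banner reports at least `minimum` (default 3.1)."""
    version = parse_banner_version(banner)
    return version is not None and version >= minimum
```

If the banner still reports 2.2 after installing 3.1.2, an older installation is probably the one being executed.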

Please download likwid-3.1.2.tar.gz to be sure that you have the newest 
version. I checked the FTP folder, and likwid-stable should also be fine:
likwid-3.1.2.tar.gz     02-Jun-2014 15:58  474K  
likwid-stable.tar.gz    02-Jun-2014 15:58  474K

I hope this fixes your problem.

Greetings,
Thomas

Original comment by Thomas.R...@googlemail.com on 30 Sep 2014 at 2:00

GoogleCodeExporter commented 9 years ago
Actually, Daniel asks for support for the new Xeon Haswell EP, not a regular 
Haswell.  We have a similar system (dual Xeon E5-2697v3), and likwid 3.1.2 does 
not work with this processor (some tools complain about an unsupported CPU, 
others simply crash).  I hope that likwid will support the Haswell EP sometime 
soon!

Thanks,  John

Original comment by j.w.rom...@gmail.com on 30 Sep 2014 at 2:20

GoogleCodeExporter commented 9 years ago
I'm currently working on Haswell EP support. We did not have such a system 
before, but we now have access to one.

You can try the v3.1 branch in the SVN repository; most of the Haswell EP 
support is already implemented there. I would be glad to get some feedback, 
because all systems differ and could cause problems that we have not yet seen 
on our Haswell EP.

Original comment by Thomas.R...@googlemail.com on 1 Oct 2014 at 8:55

GoogleCodeExporter commented 9 years ago
Dear Thomas,

I tried to read the power performance counters using the v3.1 branch version 
(see attachment), but it reports:

Failed to read data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 0 reg 0xce

The program does work on an Ivy Bridge EP.

Thanks,  John

Original comment by j.w.rom...@gmail.com on 1 Oct 2014 at 8:23

Attachments:

GoogleCodeExporter commented 9 years ago
Hi John,

First, it is nice to see that you try to use LIKWID as a library. That is also 
the idea of the current trunk in the SVN. 

Your error message is very strange because the register 0xCE is not usable and 
not configured for Haswell (EP). In fact, the register is completely unknown to 
LIKWID. The old Uncore of Westmere/Nehalem EX used the registers 0xCE0 - 0xCEB 
for accessing the memory controllers.

I tested your program on my Haswell (i7-4770) and Haswell EP (E5-2697 v3) 
systems and it works like a charm.

Can you please send me the /proc/cpuinfo output for one of your cores? Maybe 
your system uses a CPU ID that is currently not known to LIKWID.
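
The CPU identification that matters here is the family/model pair in /proc/cpuinfo. As a rough Python sketch (not LIKWID's own detection code), the pair can be extracted and compared against a small table. The table below is an illustrative assumption that lists only a few Intel family-6 models; it is not taken from LIKWID, and the function names are hypothetical.

```python
def parse_family_model(cpuinfo_text):
    """Return (family, model) of the first processor entry in cpuinfo text."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        # Keep only the first occurrence, i.e. the first logical CPU.
        # "model name" is a different key and is skipped on purpose.
        if key in ("cpu family", "model") and key not in fields:
            fields[key] = int(value.strip())
    return fields.get("cpu family"), fields.get("model")


# Illustrative subset of Intel family-6 model numbers (assumption: this is
# not LIKWID's table, just enough to tell apart the systems in this thread).
INTEL_FAMILY6_MODELS = {
    0x2D: "Sandy Bridge EP",
    0x3E: "Ivy Bridge EP",
    0x3C: "Haswell",
    0x3F: "Haswell EP",
}


def describe_cpu(cpuinfo_text):
    """Map cpuinfo text to a name from the table above, or 'unknown'."""
    family, model = parse_family_model(cpuinfo_text)
    if family != 6:
        return "unknown"
    return INTEL_FAMILY6_MODELS.get(model, "unknown")
```

Both machines in this thread report family 6, model 63 (0x3F).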

Greetings,
Thomas

Original comment by Thomas.R...@googlemail.com on 2 Oct 2014 at 8:23

GoogleCodeExporter commented 9 years ago
processor       : 55
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
stepping        : 2
microcode       : 0x1d
cpu MHz         : 1200.000
cache size      : 17920 KB
physical id     : 1
siblings        : 20
core id         : 14
cpu cores       : 14
apicid          : 61
initial apicid  : 61
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt 
pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 
hle avx2 smep bmi2 erms invpcid rtm
bogomips        : 5189.70
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Original comment by j.w.rom...@gmail.com on 2 Oct 2014 at 8:28

GoogleCodeExporter commented 9 years ago
Thanks for providing the information. The output is nearly identical; the only 
difference is the microcode:
microcode   : 35

Have you tried other tools to read the RAPL counters, such as PAPI or the 
following program?
http://web.eece.maine.edu/~vweaver/projects/rapl/rapl-read.c

Some mainboard manufacturers disable RAPL support and some other features. 
Nevertheless, LIKWID should recognize your CPU correctly.

I just added some Haswell-related fixes to the v3.1 branch in the SVN. Attached 
is your code with some print statements added to see where LIKWID tries to 
access the register 0xCE. Moreover, the accessClient no longer exits if 
you try to write to an unusable register.

Greetings,
Thomas

Original comment by Thomas.R...@googlemail.com on 2 Oct 2014 at 9:11

Attachments:

GoogleCodeExporter commented 9 years ago
Now it returns:

Family: 6
Model: 63
NUMA nodes: 2
Socket to daemon: 4
Current CPU clock: 2593991590
Failed to read data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 0 reg 0xce
Failed to read data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 0 reg 0x1ad
Failed to read data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 0 reg 0x606
Failed to read data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 0 reg 0x614
RAPL energy unit: 1
Failed to read data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 0 reg 0x611
0

Thanks, John

Original comment by j.w.rom...@gmail.com on 2 Oct 2014 at 9:29

GoogleCodeExporter commented 9 years ago
First, sorry, I gave you wrong information: the register 0xCE is known. 
It is called MSR_PLATFORM_INFO and contains the minimum and base 
frequency of the CPUs.
Register 0x1AD contains Turbo mode information. The 0x6XX registers are RAPL 
registers, and it seems that your system has no RAPL support at all. The 
register 0x606 is the base RAPL register that contains the RAPL energy unit.
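
To make the register descriptions concrete, here is a small Python sketch that decodes raw 64-bit values of MSR_PLATFORM_INFO (0xCE) and the RAPL unit register (0x606). The bit positions follow Intel's Software Developer's Manual as I understand it; the 100 MHz bus clock and the function names are assumptions chosen for illustration, and nothing here is read from John's system.

```python
BUS_CLOCK_MHZ = 100  # assumption: 100 MHz reference clock


def decode_platform_info(raw):
    """Decode MSR_PLATFORM_INFO (0xCE) into base and minimum frequency in MHz."""
    base_ratio = (raw >> 8) & 0xFF   # bits 15:8, maximum non-turbo ratio
    min_ratio = (raw >> 40) & 0xFF   # bits 47:40, maximum efficiency ratio
    return {
        "base_mhz": base_ratio * BUS_CLOCK_MHZ,
        "min_mhz": min_ratio * BUS_CLOCK_MHZ,
    }


def decode_rapl_units(raw):
    """Decode the RAPL unit register (0x606) into watts, joules and seconds."""
    power_bits = raw & 0xF           # bits 3:0
    energy_bits = (raw >> 8) & 0x1F  # bits 12:8
    time_bits = (raw >> 16) & 0xF    # bits 19:16
    # Each unit is 1 / 2^bits of the corresponding SI unit.
    return {
        "power_w": 1.0 / (1 << power_bits),
        "energy_j": 1.0 / (1 << energy_bits),
        "time_s": 1.0 / (1 << time_bits),
    }
```

For example, a ratio of 26 in bits 15:8 decodes to a 2600 MHz base frequency under the 100 MHz assumption.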

Have you checked your BIOS? Have you tried the other tools?
If nothing helps, you will have to ask your mainboard vendor whether they 
disabled it and, if so, how to activate it.

Greetings,
Thomas

Original comment by Thomas.R...@googlemail.com on 2 Oct 2014 at 10:17

GoogleCodeExporter commented 9 years ago
Another thing you can try is to use direct access to the MSR devices. You have 
to be root and replace the call to 
accessClient_init(socket);
with
accessClient_setaccessmode(0);
The socket that is given to msr_init is ignored in this case.

If this works, the problem is the access daemon, but I don't think it is: in 
that case the error message would be "access to this register is not allowed".
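
Independently of LIKWID, direct MSR access can be checked by reading the Linux msr device files. This is a minimal Python sketch, assuming the msr kernel module is loaded and the process has sufficient privileges (typically root); `read_msr` and its `path` parameter are hypothetical names introduced here, not LIKWID functions.

```python
import os
import struct


def read_msr(reg, cpu=0, path=None):
    """Read one 64-bit MSR by using its address as the file offset.

    `path` overrides the default /dev/cpu/<cpu>/msr and exists only so the
    function can be exercised against an ordinary file.
    """
    if path is None:
        path = "/dev/cpu/%d/msr" % cpu
    fd = os.open(path, os.O_RDONLY)
    try:
        # The file offset selects the register; each read returns 8 bytes.
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError("short read for register 0x%x" % reg)
    return struct.unpack("=Q", data)[0]
```

On a system where RAPL is disabled, a call such as `read_msr(0x606)` would be expected to raise `OSError`, much like the daemon's "failed to read/write register" message.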

Original comment by Thomas.R...@googlemail.com on 2 Oct 2014 at 10:45

GoogleCodeExporter commented 9 years ago
After changing some BIOS settings, I can now read the power monitoring 
counters, even as a normal user.  Maybe it was BIOS related; I encountered some 
more problems.  I expect to receive a new BIOS soon anyway.

Thank you very much!  likwid is a useful tool.

John

Original comment by j.w.rom...@gmail.com on 2 Oct 2014 at 2:59

GoogleCodeExporter commented 9 years ago
I'm glad that you can use the RAPL counters now. Have fun.

I don't know whether Daniel still encounters problems, so I will not 
close this issue for now.

@Daniel: Did the new version fix your problem? Is your CPU now recognized by 
LIKWID?

Original comment by Thomas.R...@googlemail.com on 2 Oct 2014 at 3:11

GoogleCodeExporter commented 9 years ago
In Comment #12 on issue 166, you wrote:

 @Daniel: Did the new version fix your problem? Is your CPU now  
 recognized by LIKWID?

My colleague Johann rebuilt the package and he got this error message:

[root@manny401 seq]# ~peyrardj/local/likwid-3.1.2/bin/likwid-perfctr -C S1:0  -g FLOPS_DP ./m.exe 
Failed to write data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 12 reg 0x700
The current system does not support Uncore MSRs, deactivating Uncore support
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Core Haswell processor 
CPU clock:      2.59 GHz 
ERROR - [./src/perfmon.c:1094] Unsupported group or event for this architecture!

Some more information about the installation:

$ svn info
Path: .
URL: http://likwid.googlecode.com/svn
Repository Root: http://likwid.googlecode.com/svn
Repository UUID: 38d4caea-f284-64ea-201f-c04d7a3ea24d
Revision: 251
Node Kind: directory
Schedule: normal
Last Changed Author: Thomas.Roehl@googlemail.com
Last Changed Rev: 251
Last Changed Date: 2014-10-06 13:40:18 +0200 (Mon, 06 Oct 2014)

$ cd likwid-read-only/branches/v3.1/
$ make
===>  GENERATE GROUP HEADERS
===>  GENERATE HEADER GCC/perfmon_atom_events.h
===>  GENERATE HEADER GCC/perfmon_core2_events.h
===>  GENERATE HEADER GCC/perfmon_haswell_events.h
...
...

===>  ASSEMBLE  GCC/triad_mem.o
===>  ASSEMBLE  GCC/update.o
===>  CREATE STATIC LIB  liblikwid.a
===>  LINKING  likwid-perfctr
===>  LINKING  likwid-features
===>  LINKING  likwid-powermeter
===>  LINKING  likwid-memsweeper
===>  LINKING  likwid-topology
===>  LINKING  likwid-genCfg
===>  LINKING  likwid-pin
===>  LINKING  likwid-bench

# likwid-perfctr
likwid-perfctr
likwid-perfctr --  Version  3.1

Example Usage: likwid-perfctr -C 2  ./a.out
Supported Options:
-h       Help message
-v       Version information
-V       verbose output
-g       performance group or event set string
-H       Get group help (together with -g switch)
-t       timeline mode with frequency in s or ms, e.g. 300ms
-S       stethoscope mode with duration in s
-m       use markers inside code
-s       bitmask with threads to skip
-o       Store output to file, with output conversation according to file suffix
         Conversation scripts can be supplied in /home_nfs/peyrardj/build/likwid-3.1.2/share/likwid
-O       Output easily parseable CSV instead of fancy tables
-M       set how MSR registers are accessed: 0=direct, 1=msrd
-a       list available performance groups
-e       list available counters and events
-i       print cpu info
-c       processor ids to measure (required), e.g. 1,2-4,8
-C       processor ids to measure (this variant also cares for pinning of 
process/threads), e.g. 1,2-4,8

# id
uid=0(root) gid=0(root) groupes=0(root)

# ~peyrardj/local/likwid-3.1.2/bin/likwid-perfctr -C S1:0  -g FLOPS_DP ./m.exe 
Failed to write data through daemon: daemon returned error 4 'failed to 
read/write register' for cpu 12 reg 0x700
The current system does not support Uncore MSRs, deactivating Uncore support
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Core Haswell processor 
CPU clock:      2.59 GHz 
ERROR - [./src/perfmon.c:1094] Unsupported group or event for this architecture!

So, do you have any idea what is going wrong?
Thanks for your help,
Best regards,
-D.

Original comment by d.char...@free.fr on 6 Oct 2014 at 3:00

GoogleCodeExporter commented 9 years ago
Hi,

That seems to work correctly. Your system is recognized as Intel Haswell. 
Version 3.1 does not differentiate between Haswell and Haswell EP.
The first "Failed to write data" message is not a real failure. It tests 
whether the register 0x700 is writable, the first Uncore register for the 
default Haswell. I have not seen a default Haswell that supports the Uncore, 
but it is documented, so I implemented it. 

Your problem is simply that there is no FLOPS_DP performance group for 
Haswell. We would like to offer such a group, but Intel removed the 
floating-point operation events, probably because they are faulty on the Intel 
Sandy Bridge and Ivy Bridge architectures. To list all configured performance 
groups for the current system, call:
likwid-perfctr -a
One group that exists on all Intel systems is the BRANCH group. So does 
likwid-perfctr -C S0:0 -g BRANCH ./a.out work?

Greetings,
Thomas

Original comment by Thomas.R...@googlemail.com on 10 Oct 2014 at 10:31

GoogleCodeExporter commented 9 years ago
Can I close this issue?

Original comment by Thomas.R...@googlemail.com on 21 Oct 2014 at 8:45