bearjb / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Add PAPI support in CPU Profiler (Patch attached) #297

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.  Look for PAPI support in CPU Profiler.

What is the expected output? What do you see instead?
  Expected PAPI support in CPU Profiler
  No PAPI support in CPU Profiler

Please use labels and text to provide additional information.
  About PAPI: http://icl.cs.utk.edu/papi/ http://en.wikipedia.org/wiki/Performance_Application_Programming_Interface

  5 files patched. In .h and .cc files, are modifications are surrounded by #ifdef ENABLE_PAPI_IN_CPU_PROFILER, which is defined by ./configure.

  * Add a ./configure option:  --enable-papi-in-cpu-profiler
    It's disabled by default. If you don't enable it, there won't be any significant change, except that cpu profiler file format has changed a little.

  * CPU Profiler file format change
    Now the header has one more entry, indicating how many PAPI events are recorded. It's inserted before the padding entry. And in each sample, counts of PAPI events(consecutive long long int's) are placed after the stack entries.
    This change can be explained by pprof.patch. pprof now just ignores PAPI information. I tried to patch pprof to show the counts of events, but it seems that data structure of profile data is shared by CPU Profiler and Heap Profiler, so I didn't change anything.
    Notice that in pprof.patch, $papi_item_size should be equal to sizeof(long long int) / sizeof(int), which may be different on different machines. I've never used Perl before, so I set it to 2 (my machine is 32-bit) here. There should be a graceful way to solve this, but I failed to google it out...

  * Usage:
    As before, just do -lprofiler when compiling the program.
    When running it, set another env: PAPI_EVENT_SET, which points to a file that shows what PAPI evnets will be counted. The file may look like this:

PAPI_TOT_INS PAPI_L1_DCM

    Events can be separated by white space(0x20), \t or \n.
    Total number of events shouldn't exceed ProfileData::kPapiMaxEventsCount. I think 5 is enough in most cases.
    You can get counted number of events from output CPU Profile file, using its file format described above. The way they're presented only depends on your imagination :-)

Original issue reported on code.google.com by endlessr...@gmail.com on 26 Dec 2010 at 6:55

Attachments:

GoogleCodeExporter commented 9 years ago
Sorry for disturbing you again.
Previously attached profiledata.cc.patch has some flaws.
Updated patch attached.

Original comment by endlessr...@gmail.com on 26 Dec 2010 at 12:05

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for the patch!  I don't know anything about PAPI, so I'm wondering: what 
information does this give us that we don't get already?  Is it a replacement 
for existing cpu-usage information, or an addition?  Are you collecting all the 
useful information that PAPI provides for cpu profiling, or is there more it 
might make sense to add later?

I like that the patch is limited to only a few files, but it seems pretty 
invasive within those files.  In general, I like to minimize the number of 
#ifdefs in code.  I wonder if it's possible to keep the papi-specific stuff in 
a new file, base/papi.{h,cc}, and expose the functionality there.  The idea 
would be to define a struct to put into the cpu-profiler struct, and a function 
to fill it and a function to update it.  If PAPI exists and is configured on, 
then these structs and functions do something useful, otherwise, the struct is 
empty and the functions are noops.  Then all the PAPI-related #ifdefs just have 
to be in the papi.* files.

In addition to minimizing ifdefs, it's nice to minimize configure options.  Is 
there any reason not to turn on PAPI if it exists on a system?  Is it very 
expensive to compute?  I guess one issue is that PAPI needs a driver, at least 
on linux (as I understand it), so we can't do that testing at build time, we 
have to do it at program-run time, since the program may run on a different 
machine than it's built on.  So some of the papi tests would have to be done at 
runtime.  But maybe that's ok; we already do something similar to see if the 
current machine supports thread-local storage.

Original comment by csilv...@gmail.com on 27 Dec 2010 at 9:48

GoogleCodeExporter commented 9 years ago
1.  PAPI is an API for counting low-level CPU events, like total instruction 
executed, L1/L2 cache misses, incorrectly predicted conditional branches. It 
uses hardware counters on modern CPU, so it performs differently on different 
CPUs. It's helpful when we need to point out why some functions are too slow: 
it is too long(has too many instructions)? It doesn't use CPU cache 
effectively(too many cache misses)? Or it has too many conditional 
branches(which will slow CPU speed as well)?
    So, it's only an ADDITION, not replacement.

2.  PAPI can monitor dozens of CPU events, but due to hardware resource 
limits(number of hardware counters that can be used by PAPI), only a few 
events(on my core 2 duo P8600, at most 4) can be monitored at the same time.
    So I leave the choice of events to users. Users can add all events they want to monitor to a file and set path to the file as environment variable PAPI_EVENT_SET. When the program is running, it will scan the file, and then try to add all events. But how many events will be added successfully depends, and successfully added events will be printed to stderr.
    In this aspect, I'm not collecting all useful information that PAPI provides, but I am able to. And you can get different information you need by only editing the PAPI_EVENT_SET file, without rebuilding your program or google-perftools.

3.  Minimizing #ifdefs and moving all PAPI related stuff to independent {.h 
.cc} files is graceful. I totally agree. I can do this if you want this PAPI 
feature.

4.  PAPI function calls are very fast. I can't think of a reason not to turn on 
PAPI if it exists.
    PAPI does need kernel support. Default Ubuntu kernel doesn't support it, but default Fedora kernel does. If you want to use PAPI on Ubuntu, you have to build a custom kernel, and then build PAPI library. But on Fedora, you only need to "yum install papi-devel".
    Even though, I think we can still decide it during ./configure by looking for headers and libs of PAPI. The papi tests can be done at runtime, when initialing CPU Profiler.

Original comment by endlessr...@gmail.com on 28 Dec 2010 at 5:02

GoogleCodeExporter commented 9 years ago
Thanks, this is all very useful.  I know at google we do some cpu-level 
profiling of this sort; I wonder if we use PAPI for that, and if so, how it 
fits in with the cpuprofiler (which, no surprise, we use a lot internally).  So 
let me do a bit more research when I get back from vacation next week.  If we 
do end up using this patch, it would be great to refactor it, but it may be 
best to figure out exactly what the plan is, before putting more work into it.

Speaking of which, can you sign our CLA?
   http://code.google.com/legal/individual-cla-v1.0.html
We'll need that to accept the patch.

Original comment by csilv...@gmail.com on 28 Dec 2010 at 6:09

GoogleCodeExporter commented 9 years ago
Sure, enjoy your vacation :-)
I also thought google do this kind of CPU profiling well enough, I worked out 
this patch mainly because a course project needs it.
I'll wait for your plan.

Original comment by endlessr...@gmail.com on 29 Dec 2010 at 3:30

GoogleCodeExporter commented 9 years ago
Craig: this might get the cpuprofiler working on the perfctr patched kernels. 
(http://code.google.com/p/google-perftools/issues/detail?id=227). I'll try out 
the patches and update this thread. 

Thanks endlessroad1991. 

Original comment by ashwi...@gmail.com on 29 Dec 2010 at 4:27

GoogleCodeExporter commented 9 years ago
ashwinks:
I'm afraid I have to disappoint you, because this patch only add some 
information to profile data, it won't get CPU Profiler working if it didn't 
work properly.
But it has something to do with perfctr. I mentioned above that PAPI does need 
kernel support, that means it needs perfctr patch applied to kernel. Fedora 
default kernel has the patch, Ubuntu doesn't. I'm using Fedora 14, it has 
perfctr patch, and CPU Profiler works normally no matter if PAPI feature is 
turned on.
Anyway, good luck :-)

Original comment by endlessr...@gmail.com on 29 Dec 2010 at 4:50

GoogleCodeExporter commented 9 years ago
Which is the kernel version in Fedora 14? I am using a RHEL 4 system (Linux 
2.6.9-78.ELsmp with perfctr patch. RHEL 4.0.7 64-bit) with a  patched kernel. 
The problem I am facing is that the timer may not be getting attached to the 
process. 

Original comment by ashwi...@gmail.com on 30 Dec 2010 at 3:59

GoogleCodeExporter commented 9 years ago
kernel.i686                        2.6.35.10-74.fc14         @updates
papi.i686                          4.1.0-1.fc14              @fedora            
papi-devel.i686                    4.1.0-1.fc14              @fedora
gcc.i686                           4.5.1-4.fc14              @fedora            

Original comment by endlessr...@gmail.com on 30 Dec 2010 at 5:11

GoogleCodeExporter commented 9 years ago
It turns out that within google, we access the PMU data directly, rather than 
going through PAPI.  So we don't have any code we can share with this.  A good 
next step would be to extract stuff out into a separate file, like I described 
before, and submit the patch again.  Sound good?

Original comment by csilv...@gmail.com on 5 Jan 2011 at 9:26

GoogleCodeExporter commented 9 years ago
Thanks. But now I'm busy preparing for final exams :-(
I will refactor the patch as you said within one week.

Original comment by endlessr...@gmail.com on 6 Jan 2011 at 8:32

GoogleCodeExporter commented 9 years ago
See README for everything.

By the way, I'm using Ubuntu 10.04 for your convenience.
Goobuntu is based on Ubuntu LTS, isn't it?

In previous comments I said default Ubuntu kernel doesn't support PAPI.
I was wrong. Kernel 2.6.32 is enough for PAPI, no need for perfctr patches.

So maybe you can test PAPI feature on your machine as in README.

Original comment by endlessr...@gmail.com on 9 Jan 2011 at 6:34

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by endlessr...@gmail.com on 9 Jan 2011 at 12:17

Attachments:

GoogleCodeExporter commented 9 years ago
This is great!  I'm a bit overwhelmed at the moment catching up from the 
holidays (and starting all the stuff that I said, "It's too close to the 
holidays to start this up now, I'll wait until 2011"...), but I'll take a close 
look at this when I get a chance.

I took a cursory look now.  Overall, I like how it's coming together.  A few 
comments:

1) I'm not sure what to make of the TODOs in pprof, with the size of the data 
possibly changing.  Also, I'm not sure if using Math::BigInt is consistent with 
the rest of pprof (which I only kinda barely understand, not being a perl 
expert myself).  Shouldn't we make sure that all numeric values in the profile 
are the same size -- so on a 64-bit system the papi data will be 64 bits, and 
on a 32-bit system the papi data will be 32 bits?  I already have complicated 
code to check bit-size and endianness of the profile file.  Is there some way 
to leverage that?

We do see people create profiles on a 32-bit system and then try to analyze 
them on a 64-bit system (and vice versa), so it would be nice if papi worked in 
that case.

2) I noticed that even in the 'no papi support' case, papi.cc creates an empty 
file every time.  I imagine almost all our users will not be papi-enabled, and 
will have to pay the cost of this extra zero-length file every time they run 
the cpu profile.  It would be nice to avoid that cost if possible.  Is there 
any way to get pprof to behave nicely without the need to create that empty 
file?  Or can this file be combined with the existing profile file in some way? 
 I haven't looked into this closely enough to know if that's way off base or 
not.

Thanks for doing this!  I think it will be very nice to get the improved 
profiler info.

Original comment by csilv...@gmail.com on 10 Jan 2011 at 1:39

GoogleCodeExporter commented 9 years ago
Updated patch.

1) Remove TODOs in pprof
    It turns out that PAPI data is long long,
        so they're both 64bits on both 32-bit and 64-bit machines.
    Thus, I remove these TODOS in the updated patch.
    Also, 32/64-bit and little/big-endian are all judged from the profile file,
        not the machine running pprof, so we can't use $Config->{intsize} or etc.
    You may want to check if my patch to pprof is correct.
        I've only checked it on my laptop (32-bit and little endian).

2) papi.cc doesn't create an empty file any more
    It's useful only when PAPI is not supported,
        but user still specify PAPI_EVENT_SET for his program and pprof.
        In this situation, pprof results will be messed up.
        However, if you don't specify PAPI_EVENT_SET, there will be no problem.
    So creating the empty file is not necessary.
    But never specify PAPI_EVENT_SET for pprof, when PAPI is not supported.

Original comment by endlessr...@gmail.com on 10 Jan 2011 at 7:32

Attachments:

GoogleCodeExporter commented 9 years ago
Trying to port gperftools to Windows.
Any suggestions?

Original comment by endlessr...@gmail.com on 16 Jan 2011 at 2:44

GoogleCodeExporter commented 9 years ago
Do you mean porting perftools in general, or just the papi part?  If in 
general, then a lot of perftools already runs on windows.  What part are you 
looking to port?   A better place to talk about it than here, is probably 
google-perftools@googlegroups.com.

Original comment by csilv...@gmail.com on 20 Jan 2011 at 1:46

GoogleCodeExporter commented 9 years ago
I finally had a few moments to look at this today.  I'm sorry it's taken me a 
couple of weeks to get back to you.

I'm concentrating on pprof for the moment.  I don't like the fact "If PAPI 
feature is enabled, you should set PAPI_EVNET_SET to the correct file used when 
profiling your program, or all pprof results would be wrong and messy."  I 
think the event set should be baked into the profile when it's created.

This should be easy to do, given the CPU profiler header:
---
  # Read header.  The current header version is a 5-element structure
  # containing:
  #   0: header count (always 0)
  #   1: header "words" (after this one: 3)
  #   2: format version (0)
  #   3: sampling period (usec)
  #   4: unused padding (always 0)
---
With PAPI, just add a new field, between 3 and 4, which is the size of 
PAPI_EVENT_SET, followed by the names of each event (NUL-terminated, or 
\n-terminated, or whatever).  Since 'size' here is in blocks (4 or 8 bytes, 
depending), you'll need to do a bit of work to calculate the size (what is put 
in (1)).  Keep the unused padding at the end of the header.

(Note I'm suggesting this though I'm not the primary pprof author, so this 
change will still have to go through him.  I think it's more likely to get 
through than the current version, though.)

I also am not fond of the fact that the internal representation of the profile 
depends on the profile type: there are too many 'if cpu' tests in there.  I'd 
either make them all the same -- that is, all be a hash with the key of 'n' for 
the value -- or put the papi info in a separate data structure.  I don't know 
how feasible the latter is.

Finally, I don't know perl very well, so I don't know much about this BigInt 
package.  Is it standard -- do all perls have it?  What version of perl was it 
added in?  It seems like bigint might be overkill: is there another method we 
can use to represent 64-bit numbers in 32-bit perl?  (PAPI values are always 64 
bits, even on 32-bit processors, is that true?)

Whatever solution is used here, should be used for the existing code that tries 
to read 64-bit values, and just dies if it does so with a 32-bit perl.  That 
could be a separate patch, though.  You could add a TODO about that in the 
current patch (see elsewhere in the file for TODO format).

A more minor note: in configure.ac, you don't need to have 
ENABLE_PAPI_IN_CPU_PROFILER, I'm pretty sure.  Just do the AC_SUBST(PAPI_LIBS) 
always (it will be "" when papi isn't found), and just add $(PAPI_LIBS) 
unconditionally to the link line when appropriate.

Original comment by csilv...@gmail.com on 25 Jan 2011 at 4:45

GoogleCodeExporter commented 9 years ago
1. Bake PAPI event names into profile file
    I did this way because I wanted to keep pprof backward-compatible. Current pprof can still handle previous profile files.
    In my opinion, CpuProfileStream uses a cache mechanism and from outside the class we can only call get() to get a slot. So operations like reading a line or reading a string become inconvenient(still can be done, anyway).
    An alternate solution is to save papi event id to profile header(getting intergers from profile file is easy). Then we have to translate event code to event name. This translation work is really simple(PAPI function: PAPI_event_code_to_name() does it), but it needs a separate C program so we can do the translation work in pprof.
    Which do you prefer?

2. Too many 'if cpu' tests
    I think making them all the same(key 'n' for the value) is better. PAPI info is just integers, and has no need for further extending, so why bother to put it in a separate data structure?

3. BigInt
    Math::BigInt is one of perl's core modules and is old enough.
    I didn't find detailed history of bigint in perl. According to old documents of Perl 5.8(http://perldoc.perl.org/5.8.8/Math/BigInt.html), Perl 5.8 has Math::BigInt. And http://perl.active-venture.com/lib/Math/BigInt.html implies Perl 5.6 already had Math::BigInt.
    It is overkill, but I googled a lot and didn't find a third way except BigInt and pack.
    It's supported on both 32-bit and 64-bit Perl, and can solve the TODO you left in pprof.
    pprof does only a little big int related calculations, therefore efficiency isn't a problem.
    So what do you think?

4. Remove ENABLE_PAPI_IN_CPU_PROFILER from configura.ac and makefile.am
    Thanks for the note, a lot cleaner :-)

Original comment by endlessr...@gmail.com on 26 Jan 2011 at 10:10

GoogleCodeExporter commented 9 years ago
} I did this way because I wanted to keep pprof backward-compatible. Current 
pprof can still handle previous profile files.

My suggestion is both forward- and backward-compatible (note the part where you 
modify the header-size field in the cpu profile).  pprof was designed to allow 
for extensible header information, thankfully.

You are right that storing strings in the cpu profile isn't really a natural 
fit for the block-focused reader (and the fact the header size is indicated in 
terms of blocks as well, so you may have to do some padding).  But it will 
certainly end up being much cleaner than having to specify the same values 
twice, through an environment variable.

} PAPI info is just integers, and has no need for further extending, so why
} bother to put it in a separate data structure?

Well, the data structures are relatively happy now, being a map from a key to 
an integer value.  You're proposing changing it to a map from a key to another 
map, one of whose elements is an integer value (with a rather arbitrary name).  
That adds comprehension complexity, and makes me somewhat sad.  If you can 
support a separate PAPI map that doesn't add as much complexity (in order to 
pass it around), then that will be better.

} Math::BigInt is one of perl's core modules and is old enough.

OK, sounds good.  I'm happy with that part.

Original comment by csilv...@gmail.com on 26 Jan 2011 at 10:36

GoogleCodeExporter commented 9 years ago
Sample data was designed very simple, only an integer. Many places in pprof 
uses this convenience directly, like plus/minus/get/set. If papi data is 
inserted, I think it's better to extract sample data to a separate class, and 
replace all direct accesses with method of the new sample data class.

Also, inserting event names to profile header makes profile header difficult to 
extend in the future. Event names can be inserted before/after map, which is 
also pure text.

Original comment by endlessr...@gmail.com on 29 Jan 2011 at 12:53

GoogleCodeExporter commented 9 years ago
Event names are inserted before /proc/map. The only addition to profile header 
is one slot: PAPI event count. 

Now ProfileData is an independent class. Origin count is accessed by 
$ProfileData->{n}. PAPI data is accessed by $ProfileData->{papi}.

$ProfileData->badd() does addition for both {n} and {papi}, ->bsub() does 
subtraction.

GetEntry() is renamed to GetEntryCount(), because it does return a count, but 
now an Entry is a $ProfileData.

Also, when read 64-bit profile on 32-bit perl(no pack available), use 
Math::BigInt. In fact, Math::BigInt is easy to use. Only difference is when you 
create the variable, use
    my $a = Math::BigInt->new('123');
Other operations like add or print is exactly like before:
    $a += 2;
    print STDERR $a;

I've tested the code on 32-bit and 64-bit Ubuntu: read 32-bit profile on 64-bit 
Ubuntu, and read 64-bit profile on 32-bit Ubuntu. All work well.

ProfileData has changed, so I tested CPU profile and heap profile. Also works 
well.

Original comment by endlessr...@gmail.com on 31 Jan 2011 at 2:15

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for the new patch!  The design you describe sounds good to me.  It can 
be tough for patches to make it past the pprof gatekeepers, but I'm pretty 
hopeful about this one.

We're trying to get a kernel set up locally that supports papi, so we can try 
the patch out for a little while before committing it to the SVN tree.  This 
means this patch may not make it into the next release (which I'm hoping to do 
in the next week or so), but we're making steady progress on it.

Original comment by csilv...@gmail.com on 1 Feb 2011 at 7:14

GoogleCodeExporter commented 9 years ago
Refer to Linux section of file INSTALL in PAPI source directory.  In general, 
kernel 2.6.32 or newer doesn't need any patch to support PAPI.  But for kernel 
prior to 2.6.32, perfctr patches are needed.  My laptop is Core 2 Duo P8600, 
and Ubuntu 10.04 default kernel supports PAPI "out of the box".

Thanks for being so patient.

Original comment by endlessr...@gmail.com on 1 Feb 2011 at 7:56

GoogleCodeExporter commented 9 years ago
I should be thanking *you* for being patient!  We'll get to this patch 
eventually...

Original comment by csilv...@gmail.com on 3 Feb 2011 at 7:25

GoogleCodeExporter commented 9 years ago
Hi.  Craig asked me to take a look at this patch (months ago, so the
blame's on me, not him :)

First of all, I don't understand the use-case here.  Adding event
counts to aggregated time-based stack samples doesn't make much sense
in my mind.  Please elaborate.

Suppose we are counting cache misses and we have two functions:
MissesCache and HitsCache.  And each of those functions consume ~50% of
the CPU profile.  If these two functions are called alternatingly,
then HitsCache would also be blamed for cache misses.  Attached is a
microbenchmark (compiled with g++ -O1 -fno-omit-frame-pointer)
illustrating this (and verified using perf_events).  Also attached is
the pprof gv output from collecting PAPI_L1_DCM.

Documentation:
- The README file doesn't mention how to build or use the PAPI
  features.

configure:
- The enable-papi-in-cpu-profiler option doesn't seem to be optional.
  If PAPI is found, it's enabled by default.
- I'm autoconf/automake illiterate, but shouldn't this patch include
  the updated configure script as well?

make:
- I got this failure (but I could work around it to do my own testing):
  make: *** No rule to make target `-lpapi', needed by
  `profiler2_unittest'.  Stop.

pprof:
- I'm completely Perl-illiterate, so I'm going to punt on reviewing pprof.
- I did *use* pprof, and I get lots of "nan%" when using it on
  profiles collected with or without PAPI_EVENT_SET (or even
  with papi-in-cpu-profiler disabled)

Style:
- Don't worry about fixing style comments until the usefulness and
  functionality comments are addressed.
- Many of your comments are redundant.  Only comment when it explains
  something not obvious.  E.g.,
  // Dump PAPI event names
  DumpPAPIEventNames(out_);
- Run cpplint.
  http://google-styleguide.googlecode.com/svn/trunk/cpplint/cpplint.py
- Per the style guide, curly braces are optional.  But the code you're
  modifying seems to always use them.  Please do the same for
  consistency.

Original comment by tipp.mos...@gmail.com on 19 May 2011 at 12:40

Attachments:

GoogleCodeExporter commented 9 years ago
Tipp, thanks for taking the time to look at this!

Original comment by csilv...@gmail.com on 19 May 2011 at 12:43

GoogleCodeExporter commented 9 years ago
After talking things over with folks inside google, we've decided that there 
are a lot of different ways to profile, and we want to focus on just one rather 
than complexifying the profiler and pprof.  So I think it's great to have this 
patch around for anyone who is interested, and I encourage you to keep a fork 
of the profiler around with this stuff in it if you want, but I'm going to 
close this WillNotUse.

Original comment by csilv...@gmail.com on 12 Oct 2011 at 10:08