DavidMorre / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

on SLES11/ppc64 profile is not recognized by pprof #215

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. pprof ls.prof (see attachment)
2.
3.

What is the expected output? What do you see instead?
pprof gives the error message: 'not a profile file, or old format profile
file.'

What version of the product are you using? On what operating system?
google-perftools 1.5 on kernel 2.6.27.42-0.1.2-ppc64

Please provide any additional information below.
Hmm, looks likely to be an endianness issue.  pprof has this comment:

---
  # Normal 64-bit header:  All entries are doubled in size.  The first
  # word (little-endian) should contain the real value, the second should
  # be zero.
  } elsif ($slots->get(1) != 0 ||
       $slots->get(2) < 3 ||
       $slots->get(3) != 0 ||
       $slots->get(5) != 0 ||
       $slots->get(7) != 0) {
    error("$fname: not a profile file, or old format profile file\n");

---

But the profile looks like it's been written in big-endian order.  In
particular, for
your profile, get(2) is 0, and get(3) is 3, rather than the other way around.

I'm not sure what the best solution is: to have pprof read both big-endian and
little-endian, or to make sure perftools always writes profiles in
little-endian
order.  In either case, I'll look to fix this for the next release.

Original issue reported on code.google.com by j.donn...@gmail.com on 10 Feb 2010 at 10:18

Attachments:

GoogleCodeExporter commented 9 years ago
I think I'll be changing pprof to handle big-endian correctly.  Fun with perl!

Original comment by csilv...@gmail.com on 10 Feb 2010 at 11:06

GoogleCodeExporter commented 9 years ago
OK, I just whipped up a change to pprof that *should* handle endianness 
correctly.  
But I don't have a good machine to test it on, so I'm attaching a pprof patch 
here 
that's only lightly tested.  Can you try it on your profile and see how well it 
works 
for you?

Original comment by csilv...@gmail.com on 19 Feb 2010 at 3:32

GoogleCodeExporter commented 9 years ago
I applied the patch to version 1.5 and compiled everything. Unfortunately, I 
still
get the same error message. Is this maybe the wrong patch? It mentions somewhere
__sun__ and it doesn't modify the pprof perl script..

Original comment by j.donn...@gmail.com on 19 Feb 2010 at 10:14

GoogleCodeExporter commented 9 years ago
You're right, it *is* the wrong patch!  I have too many patches running around. 
 Let 
me try again.  (And I've deleted the previous patch, so it doesn't bite anyone 
else.)

Original comment by csilv...@gmail.com on 19 Feb 2010 at 5:16

Attachments:

GoogleCodeExporter commented 9 years ago
I patched pprof:

donners@p6012:~/compile/gperftools/google-perftools-1.5/src> patch -p5 < 
~/patch 
missing header for unified diff at line 4 of patch
patching file pprof
Hunk #1 succeeded at 2603 (offset 36 lines).
Hunk #2 succeeded at 2662 (offset 36 lines).
Hunk #3 succeeded at 2814 (offset 36 lines).
Hunk #4 succeeded at 2835 (offset 36 lines).

and then I tried pprof:

donners@p6012:~/compile/gperftools/google-perftools-1.5>
LD_PRELOAD=.libs/libprofiler.so CPUPROFILE=ls.prof /bin/ls > /dev/null
donners@p6012:~/compile/gperftools/google-perftools-1.5> src/pprof --text 
/bin/sleep
../sleep.prof 
donners@p6012:~/compile/gperftools/google-perftools-1.5> src/pprof /bin/sleep
../sleep.prof 
Welcome to pprof!  For help, type 'help'.
(pprof) top20
Total: 0 samples

so pprof seems to recognize the file, but thinks it's empty.

Original comment by j.donn...@gmail.com on 23 Feb 2010 at 1:42

GoogleCodeExporter commented 9 years ago
An empty profile is expected for a proflie of ls -- it doesn't use any CPU time.

Better would be to run the perftools test suite ('make check') and see if the 
cpu-
profiler test (and hopefully all the other tests!) pass, once the patch is put 
in.  
Alternately, you could try running profiling on a binary that uses more cpu.

Original comment by csilv...@gmail.com on 23 Feb 2010 at 5:34

GoogleCodeExporter commented 9 years ago

Original comment by csilv...@gmail.com on 23 Feb 2010 at 5:34

GoogleCodeExporter commented 9 years ago
I did try to run a somewhat longer 'ls', but that evidently didn't
do the trick. I ran an MPI application with 1 task, which does 
give some output:

donners@p6012:~/DEISA/turflame/turflame-openmp3> ~/compile/gperftools/google-
perftools-1.5/src/pprof --text les3d.hybrid les3d_single.prof 
Removing fp31 from all stack traces.
Total: 1637 samples
     354  21.6%  21.6%      655  40.0% fp31
     150   9.2%  30.8%      150   9.2% 
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.16
     142   8.7%  39.5%      142   8.7% 
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.14
     125   7.6%  47.1%      125   7.6% 
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.15
     117   7.1%  54.2%      117   7.1% 
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.17
      83   5.1%  59.3%       83   5.1% $.11$.rhotempmultifgm$.OL$.1$.OL$.2
      75   4.6%  63.9%      103   6.3% GLIBC_2.3

so most things seem to work fine, except that I don't understand what routines
fp31 and GLIBC_2.3 are. I don't have a routine called fp31 in my code. There
are a few more strange routines, see the attached profile.

Another point is the profile I took when running the same application with 8 
MPI 
tasks instead of 8. The pprof script ends up in an infinite loop and keeps 
eating all 
the memory. The total profile was 40k, but pprof ran out of memory at 77GB 
after 50 
minutes on the system.

Anyhow: almost there! 

Original comment by j.donn...@gmail.com on 23 Feb 2010 at 10:28

Attachments:

GoogleCodeExporter commented 9 years ago
Cool, fp31 is taking up almost all your time, too!  My guess is it's a libc 
routine 
of some sort.  I don't think this indicates a problem with pprof.  (GLIBC_2.3 
is a 0-
size label, which does indicate a problem with pprof, kinda.  There must be 
some 
other symbol at the same address, that is actually the one taking up time here. 
 It's 
likely main or something similar.)

} Another point is the profile I took when running the same application with 8 
MPI 
} tasks instead of 8. The pprof script ends up in an infinite loop and keeps 
eating } 
} all the memory.

Can you attach the profile?  Actually, even better would be to attach the 
output of 
'pprof -raw <profile> <file>'.  I'll see if I can reproduce this.

Original comment by csilv...@gmail.com on 23 Feb 2010 at 10:43

GoogleCodeExporter commented 9 years ago
it seems that fp31 points somehow to a floating-point register on ppc (it has
fp0-fp31). Here's an excerpt of nm on the executable:

                 U _xlfWriteLDChar@@XLF_1.0
                 U _xlfWriteLDInt@@XLF_1.0
                 U _xlfWriteLDReal@@XLF_1.0
                 U _xlgetelemaddr@@XLF_1.0
                 U _xlsmpBarrier_TPO@@VERS1.3
                 U _xlsmpMaster_TPO@@VERS1.3
                 U _xlsmpParRegionSetup_TPO@@VERS1.3
                 U _xlsmpParSelf@@VERS1.3
                 U _xlsmpWSDoSetup_TPO@@VERS1.3
                 U abort@@GLIBC_2.3
0000000010053598 d add_fdes
00000000100514d0 d adds.8367
00000000100535f8 d base_from_cb_data
0000000010053538 d base_from_object
0000000010051cf0 D caltra
0000000010051c60 D cflxu
0000000010051c90 D cflxv
0000000010051cc0 D cflxw
0000000010053580 d classify_object_over_fdes
0000000010054950 b completed.6373
000000001004e000 W data_start
                 U dl_iterate_phdr@@GLIBC_2.3
0000000010054958 b dtor_idx.6375
000000001599bc70 b dwarf_reg_size_table
0000000010053310 d execute_cfa_program
0000000010053340 d execute_stack_op
00000000100535c8 d fde_mixed_encoding_compare
00000000100535b0 d fde_single_encoding_compare
00000000100534d8 d fde_unencoded_compare
0000000010051c00 D flxs_muscl
0000000000000000 a fp0
0000000000000001 a fp1
000000000000000a a fp10
000000000000000b a fp11
000000000000000c a fp12
..

it shows that fp0-fp31 are symbols with an 'absolute value' all occupying only 
1 byte
(as far as I understand this, which I don't, really). Another point are the 
symbols
that end with ..@@GLIBC_2.3 and ..@@XLF_1.0, which seem to confuse pprof.

Unfortunately, pprof --raw also runs in an infinite loop when applied to the 
8-task 
MPI job, so I attached the profile. I did attach the pprof --raw output for the
1-task MPI job.

Original comment by j.donn...@gmail.com on 24 Feb 2010 at 9:46

Attachments:

GoogleCodeExporter commented 9 years ago
As for fp0-fp31, it looks like the most probable cause is that your libc 
doesn't have 
symbol information in it.  pprof does the best it can, but it's not very good.  
If 
you have a libc with debugging info (perhaps in /usr/lib/debug/?), running the 
binary 
with that may give you better profiles.

I'm judging this based on the following info in the raw profile:
---
0x000004000066171c fp31<000000000000001f>
0x0000040000697b28 fp31<000000000000001f>
etc
---
The fact such big numbers (697b28) are mapped to something so small (1f), means 
to me 
that you basically don't have any symbols in this libc at all.

As for the 8tasks, it looks to be a corrupted profile.  I'm not sure why that 
is.  
When you run the task again, do you get an infinite loop again?  I'll put in a 
fix to 
pprof to at least die with an error message in such situations, rather than 
getting 
into an infinite (actually not, but around 2^50 iterations) loop.

I've checked and the problem with the profile doesn't seem related to 
endian-ness, so 
if you can reproduce this, it's best to open it up as a separate bug report.

Original comment by csilv...@gmail.com on 24 Feb 2010 at 7:41

GoogleCodeExporter commented 9 years ago
This should be fixed in perftools 1.6, just released.  At least, if endianness 
was the problem.  I'm closing the bug, but feel free to reopen if you think 
there's more to handle.

Original comment by csilv...@gmail.com on 5 Aug 2010 at 8:49