Closed GoogleCodeExporter closed 9 years ago
I think I'll be changing pprof to handle big-endian correctly. Fun with perl!
Original comment by csilv...@gmail.com
on 10 Feb 2010 at 11:06
OK, I just whipped up a change to pprof that *should* handle endianness
correctly.
But I don't have a good machine to test it on, so I'm attaching a pprof patch
here
that's only lightly tested. Can you try it on your profile and see how well it
works
for you?
Original comment by csilv...@gmail.com
on 19 Feb 2010 at 3:32
I applied the patch to version 1.5 and compiled everything. Unfortunately, I
still
get the same error message. Is this maybe the wrong patch? It mentions somewhere
__sun__ and it doesn't modify the pprof perl script..
Original comment by j.donn...@gmail.com
on 19 Feb 2010 at 10:14
You're right, it *is* the wrong patch! I have too many patches running around.
Let
me try again. (And I've deleted the previous patch, so it doesn't bite anyone
else.)
Original comment by csilv...@gmail.com
on 19 Feb 2010 at 5:16
Attachments:
I patched pprof:
donners@p6012:~/compile/gperftools/google-perftools-1.5/src> patch -p5 <
~/patch
missing header for unified diff at line 4 of patch
patching file pprof
Hunk #1 succeeded at 2603 (offset 36 lines).
Hunk #2 succeeded at 2662 (offset 36 lines).
Hunk #3 succeeded at 2814 (offset 36 lines).
Hunk #4 succeeded at 2835 (offset 36 lines).
and then I tried pprof:
donners@p6012:~/compile/gperftools/google-perftools-1.5>
LD_PRELOAD=.libs/libprofiler.so CPUPROFILE=ls.prof /bin/ls > /dev/null
donners@p6012:~/compile/gperftools/google-perftools-1.5> src/pprof --text
/bin/sleep
../sleep.prof
donners@p6012:~/compile/gperftools/google-perftools-1.5> src/pprof /bin/sleep
../sleep.prof
Welcome to pprof! For help, type 'help'.
(pprof) top20
Total: 0 samples
so pprof seems to recognize the file, but thinks it's empty.
Original comment by j.donn...@gmail.com
on 23 Feb 2010 at 1:42
An empty profile is expected for a proflie of ls -- it doesn't use any CPU time.
Better would be to run the perftools test suite ('make check') and see if the
cpu-
profiler test (and hopefully all the other tests!) pass, once the patch is put
in.
Alternately, you could try running profiling on a binary that uses more cpu.
Original comment by csilv...@gmail.com
on 23 Feb 2010 at 5:34
Original comment by csilv...@gmail.com
on 23 Feb 2010 at 5:34
I did try to run a somewhat longer 'ls', but that evidently didn't
do the trick. I ran an MPI application with 1 task, which does
give some output:
donners@p6012:~/DEISA/turflame/turflame-openmp3> ~/compile/gperftools/google-
perftools-1.5/src/pprof --text les3d.hybrid les3d_single.prof
Removing fp31 from all stack traces.
Total: 1637 samples
354 21.6% 21.6% 655 40.0% fp31
150 9.2% 30.8% 150 9.2%
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.16
142 8.7% 39.5% 142 8.7%
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.14
125 7.6% 47.1% 125 7.6%
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.15
117 7.1% 54.2% 117 7.1%
$.7$.__multigridpoissonsolver_NMOD_sweep$.OL$.5$.OL$.17
83 5.1% 59.3% 83 5.1% $.11$.rhotempmultifgm$.OL$.1$.OL$.2
75 4.6% 63.9% 103 6.3% GLIBC_2.3
so most things seem to work fine, except that I don't understand what routines
fp31 and GLIBC_2.3 are. I don't have a routine called fp31 in my code. There
are a few more strange routines, see the attached profile.
Another point is the profile I took when running the same application with 8
MPI
tasks instead of 8. The pprof script ends up in an infinite loop and keeps
eating all
the memory. The total profile was 40k, but pprof ran out of memory at 77GB
after 50
minutes on the system.
Anyhow: almost there!
Original comment by j.donn...@gmail.com
on 23 Feb 2010 at 10:28
Attachments:
Cool, fp31 is taking up almost all your time, too! My guess is it's a libc
routine
of some sort. I don't think this indicates a problem with pprof. (GLIBC_2.3
is a 0-
size label, which does indicate a problem with pprof, kinda. There must be
some
other symbol at the same address, that is actually the one taking up time here.
It's
likely main or something similar.)
} Another point is the profile I took when running the same application with 8
MPI
} tasks instead of 8. The pprof script ends up in an infinite loop and keeps
eating }
} all the memory.
Can you attach the profile? Actually, even better would be to attach the
output of
'pprof -raw <profile> <file>'. I'll see if I can reproduce this.
Original comment by csilv...@gmail.com
on 23 Feb 2010 at 10:43
it seems that fp31 points somehow to a floating-point register on ppc (it has
fp0-fp31). Here's an excerpt of nm on the executable:
U _xlfWriteLDChar@@XLF_1.0
U _xlfWriteLDInt@@XLF_1.0
U _xlfWriteLDReal@@XLF_1.0
U _xlgetelemaddr@@XLF_1.0
U _xlsmpBarrier_TPO@@VERS1.3
U _xlsmpMaster_TPO@@VERS1.3
U _xlsmpParRegionSetup_TPO@@VERS1.3
U _xlsmpParSelf@@VERS1.3
U _xlsmpWSDoSetup_TPO@@VERS1.3
U abort@@GLIBC_2.3
0000000010053598 d add_fdes
00000000100514d0 d adds.8367
00000000100535f8 d base_from_cb_data
0000000010053538 d base_from_object
0000000010051cf0 D caltra
0000000010051c60 D cflxu
0000000010051c90 D cflxv
0000000010051cc0 D cflxw
0000000010053580 d classify_object_over_fdes
0000000010054950 b completed.6373
000000001004e000 W data_start
U dl_iterate_phdr@@GLIBC_2.3
0000000010054958 b dtor_idx.6375
000000001599bc70 b dwarf_reg_size_table
0000000010053310 d execute_cfa_program
0000000010053340 d execute_stack_op
00000000100535c8 d fde_mixed_encoding_compare
00000000100535b0 d fde_single_encoding_compare
00000000100534d8 d fde_unencoded_compare
0000000010051c00 D flxs_muscl
0000000000000000 a fp0
0000000000000001 a fp1
000000000000000a a fp10
000000000000000b a fp11
000000000000000c a fp12
..
it shows that fp0-fp31 are symbols with an 'absolute value' all occupying only
1 byte
(as far as I understand this, which I don't, really). Another point are the
symbols
that end with ..@@GLIBC_2.3 and ..@@XLF_1.0, which seem to confuse pprof.
Unfortunately, pprof --raw also runs in an infinite loop when applied to the
8-task
MPI job, so I attached the profile. I did attach the pprof --raw output for the
1-task MPI job.
Original comment by j.donn...@gmail.com
on 24 Feb 2010 at 9:46
Attachments:
As for fp0-fp31, it looks like the most probable cause is that your libc
doesn't have
symbol information in it. pprof does the best it can, but it's not very good.
If
you have a libc with debugging info (perhaps in /usr/lib/debug/?), running the
binary
with that may give you better profiles.
I'm judging this based on the following info in the raw profile:
---
0x000004000066171c fp31<000000000000001f>
0x0000040000697b28 fp31<000000000000001f>
etc
---
The fact such big numbers (697b28) are mapped to something so small (1f), means
to me
that you basically don't have any symbols in this libc at all.
As for the 8tasks, it looks to be a corrupted profile. I'm not sure why that
is.
When you run the task again, do you get an infinite loop again? I'll put in a
fix to
pprof to at least die with an error message in such situations, rather than
getting
into an infinite (actually not, but around 2^50 iterations) loop.
I've checked and the problem with the profile doesn't seem related to
endian-ness, so
if you can reproduce this, it's best to open it up as a separate bug report.
Original comment by csilv...@gmail.com
on 24 Feb 2010 at 7:41
This should be fixed in perftools 1.6, just released. At least, if endianness
was the problem. I'm closing the bug, but feel free to reopen if you think
there's more to handle.
Original comment by csilv...@gmail.com
on 5 Aug 2010 at 8:49
Original issue reported on code.google.com by
j.donn...@gmail.com
on 10 Feb 2010 at 10:18Attachments: