I'm afraid this will be impossible to debug without a testcase to look at. I
don't know offhand what might be causing this, though perhaps other people do.
I'm closing this CannotReproduce, but feel free to reopen if you can come up
with a test case that displays the problem.
Original comment by csilv...@gmail.com
on 9 Dec 2011 at 7:12
I understand, it's impossible to reproduce without a test case. I will try to create one; it looks like I have some clues, which might help me understand better how the profiler works internally and put together a test case.
One thing I realized is that it's connected with the recursive functions I have in my application. It looks like recursive calls artificially increase the number of samples, which eventually produces results like this. Can you please confirm that my speculations below are correct?
If we have recursive calls, then a single stack sample will contain more than one appearance of the same function. How will the profiler handle this case? For example, in call stack sample #3 below, will it count 3 for function c(), or just 1?
Is the table below correct for this scenario?
-----------------------------------------------------
                          c()
                c()       c()
      c()       c()       c()
      b()       b()       b()
      a()       a()       a()
      main()    main()    main()
      --------  --------  --------
      sample 1  sample 2  sample 3  ...

function | # of appearances in the stack
-----------------------------------------
main     | 3
a        | 3
b        | 3
c        | 6
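For concreteness, here is a minimal sketch of that counting scheme, assuming every occurrence of a frame in a sample is credited (this is only an illustration, not the profiler's actual code):

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Illustration only: count how often each function appears across a set
    // of stack samples, crediting every occurrence, so recursive frames are
    // counted several times within one sample.
    int main() {
        std::vector<std::vector<std::string>> samples = {
            {"main", "a", "b", "c"},            // sample 1
            {"main", "a", "b", "c", "c"},       // sample 2
            {"main", "a", "b", "c", "c", "c"},  // sample 3
        };

        std::map<std::string, int> appearances;
        for (const auto& sample : samples)
            for (const auto& frame : sample)
                ++appearances[frame];           // every occurrence counts

        for (const auto& [fn, n] : appearances)
            std::cout << fn << " | " << n << "\n";  // a|3 b|3 c|6 main|3
    }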
Thanks.
Original comment by pavl...@gmail.com
on 9 Dec 2011 at 9:51
Yes, it's likely recursive calls are involved, now that you mention it.
I'm not exactly sure how the profiler handles recursive calls actually, but if
I were to give my best guess, it would be the same as the scenario you describe.
Original comment by csilv...@gmail.com
on 9 Dec 2011 at 10:01
Yes, I reproduced the issue with a small recursive test case.
So is this a bug or a feature?
As a user, I'd prefer to get 100% for main().
I will post the test case over the weekend.
Original comment by pavl...@gmail.com
on 9 Dec 2011 at 11:08
I would say it's unfortunate but there may be no better solution. Suppose you
have mutually recursive functions, and a stacktrace that looks like
A -> B -> C -> D -> E -> C -> D -> E -> C -> D
How much time do you attribute to each of the functions here? How do you
capture the fact that the code is "in" C and D more than it's "in" E?
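One common way to keep recursion from inflating the numbers (a sketch of the general idea, not necessarily what this profiler or pprof does) is to credit each function at most once per sample for cumulative time, while self time still goes only to the leaf frame:

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // Sketch: cumulative counting with de-duplication, so a function is
    // credited at most once per sample no matter how often it recurses.
    int main() {
        std::vector<std::string> trace =
            {"A", "B", "C", "D", "E", "C", "D", "E", "C", "D"};

        std::map<std::string, int> cumulative;
        for (const auto& fn : std::set<std::string>(trace.begin(), trace.end()))
            ++cumulative[fn];                 // one cumulative tick per sample

        std::map<std::string, int> self;
        ++self[trace.back()];                 // self time goes to the leaf, D

        for (const auto& [fn, n] : cumulative)
            std::cout << fn << ": cumulative=" << n
                      << " self=" << self[fn] << "\n";
    }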
Original comment by csilv...@gmail.com
on 9 Dec 2011 at 11:31
This is really hard to represent, and there may be no easy solution, but I would approach it from a timeline point of view: the first function that starts the cycle is C, so whatever started from C is "in" C.
I tried my test case with gprof, which reported what I expected: 100% for main(). I know they use different profiling techniques, but the results can be represented in a similar way.
For mutually recursive functions they use an abstraction called <cycle # as a whole>:
http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
Another possible solution might be adjusting (+1) the number of samples for the rest of the functions in the stacktrace. I think this approach is more logical, because all those functions are on the stack when we make a recursive call, so in some sense they are spending CPU time during the recursive calls.
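A rough sketch of that <cycle # as a whole> idea (hypothetical; it assumes the cycle members {C, D, E} are already known rather than detected from the call graph): every member of a mutual-recursion cycle is mapped onto a single cycle node before counting, so time spent in the cycle is well defined even when attribution among its members is not:

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    int main() {
        // Assumed, for illustration: {C, D, E} were identified as one cycle.
        const std::set<std::string> cycle = {"C", "D", "E"};
        std::vector<std::string> trace =
            {"A", "B", "C", "D", "E", "C", "D", "E", "C", "D"};

        std::map<std::string, int> cumulative;
        std::set<std::string> credited;       // one credit per node per sample
        for (const auto& fn : trace) {
            std::string node = cycle.count(fn) ? "<cycle 1 as a whole>" : fn;
            if (credited.insert(node).second)
                ++cumulative[node];
        }

        for (const auto& [node, n] : cumulative)
            std::cout << node << ": " << n << "\n";
        // For this single sample: A: 1, B: 1, <cycle 1 as a whole>: 1.
    }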
Test case also included.
Original comment by pavl...@gmail.com
on 12 Dec 2011 at 11:25
Attachments:
Original comment by csilv...@gmail.com
on 13 Dec 2011 at 12:41
Can you clarify a bit what the test case is showing? The pprof profile you
show in the attached file looks very reasonable to me: triangle taking up 98+%
of the cumulative CPU, and main taking 100%. Is it supposed to be showing a
problem in some way?
I just want to make sure there's a way to reproduce the problem, to give
something concrete to talk about. Once that's done, I'll ask the pprof authors
what they think.
Original comment by csilv...@gmail.com
on 13 Dec 2011 at 12:44
Sure, I will try to clarify.
I attached two kcachegrind screenshots; both screenshots and the text output (in the test case) were generated from the same profile.
One of the screenshots shows relative time and the other absolute time. As you can see in the screenshots, the inclusive time for main() is 0.15%, and the self time is 0.00%. If I understand inclusive time correctly, it is supposed to be 100% for main().
Also, how do the kcachegrind results correlate with the text output? There is no 0.15% in the text output at all.
Original comment by pavl...@gmail.com
on 13 Dec 2011 at 5:44
Attachments:
Yes, I'm confused by this too. Can you run ./pprof --raw <your binary> <your
profile file> and attach the result here? I'll see if I can reproduce what
you're seeing.
Original comment by csilv...@gmail.com
on 14 Dec 2011 at 2:01
Raw output from pprof is attached.
Original comment by pavl...@gmail.com
on 14 Dec 2011 at 7:42
Attachments:
Thanks! That's very helpful. I've passed this along to the pprof authors to
get their feedback on what we might do about this.
Original comment by csilv...@gmail.com
on 14 Dec 2011 at 11:14
I asked, and they said that this is a known issue, but it doesn't bother them enough to do anything about it. "If there was a suitably non-invasive change to fix it, we'd be willing to take a look," was their general feeling. So, patches accepted! Though no promises on them...
Original comment by csilv...@gmail.com
on 21 Dec 2011 at 7:10
I think I found the source of the issue. It looks like it's the size of the stacktrace. Currently it's defined as 64 in the source code. In the test case the stack depth is much more than 64; for non-recursive functions that might be OK (we cut the stacktrace and lose some detail), but with recursive functions we get this weird result.
I changed it to 1024 and the results are as expected (100% for main()), except for the callgrind output, which shows 53% for main().
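A small simulation of that theory (purely illustrative, reusing the triangle() name from the test case; the depth-64 truncation behavior is my assumption about what happens, not the profiler's actual code): if only the innermost N frames of each sample are kept, deeply recursive samples lose their outer frames, so main() is no longer credited in those samples:

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical helper: keep only the innermost max_depth frames of a
    // sample, the way a fixed-size capture buffer would.
    std::vector<std::string> truncate(const std::vector<std::string>& stack,
                                      std::size_t max_depth) {
        if (stack.size() <= max_depth) return stack;
        // Drop the outermost frames (main() among them), keep the leaf end.
        return {stack.end() - max_depth, stack.end()};
    }

    int main() {
        // One sample: main() -> triangle() recursing 100 levels deep.
        std::vector<std::string> sample = {"main"};
        sample.insert(sample.end(), 100, std::string("triangle"));

        auto kept = truncate(sample, 64);
        bool has_main =
            std::find(kept.begin(), kept.end(), "main") != kept.end();
        std::cout << "main() present in truncated sample: "
                  << (has_main ? "yes" : "no") << "\n";   // prints "no"
    }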
1. Could you please confirm that my guesses are correct, and that we can fix the callgrind output to get valid results?
2. Also, the text output doesn't have a header. It would be useful to have one.
New results are attached (with stacktrace depth = 1024).
Original comment by pavl...@gmail.com
on 6 Jan 2012 at 10:58
Attachments:
I don't know anything about callgrind -- I think that entire bit of code was
provided by a user -- so I can't say why it's still at 53% rather than 100%.
I'm glad the text output is at 100% now! Your theory about the stacktrace
being cut off is very reasonable.
One reason we limit the stacktrace size is that we need to store that array on the stack, and stack size can be limited (especially in threaded apps), so we have to limit how large we make the array. 64 was the sweet spot we found between taking up space and catching full stack information in most cases. I'm not sure we can safely change it. Of course, you're welcome to change it for your own use, and I'm glad that helps you.
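To illustrate the trade-off (hypothetical names, with a library backtrace() call standing in for the real unwinder; this is not gperftools' actual code): the frame buffer lives on the stack of whichever thread is being sampled, so its depth has to be a modest compile-time constant, though it could in principle be exposed as a tunable build setting:

    #include <cstdio>
    #include <execinfo.h>  // backtrace(): stand-in for the real stack unwinder

    // Hypothetical tunable: could be overridden at build time, e.g. with
    //   -DPROFILER_MAX_STACK_DEPTH=1024
    #ifndef PROFILER_MAX_STACK_DEPTH
    #define PROFILER_MAX_STACK_DEPTH 64
    #endif

    // Sketch of taking one sample: the frame buffer is a fixed-size array on
    // the current thread's stack, so a large depth costs stack space on every
    // sampled thread.
    void take_sample() {
        void* frames[PROFILER_MAX_STACK_DEPTH];  // lives on the stack
        int depth = backtrace(frames, PROFILER_MAX_STACK_DEPTH);
        std::printf("captured %d of up to %d frames\n", depth,
                    PROFILER_MAX_STACK_DEPTH);
    }

    int main() { take_sample(); }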
Original comment by csilv...@gmail.com
on 7 Jan 2012 at 2:38
Is this something that you would like to eventually see addressed or can this
issue be closed off now? Seems to be a pretty pathological case.
Original comment by chapp...@gmail.com
on 7 Feb 2012 at 3:21
I think the main issue is solved by increasing the stacktrace size. But the outputs are not consistent; --callgrind is different from the others (54% vs 100% for main). Ideally I would expect consistent output for all pprof output modes, and a header for pprof --text output.
Original comment by pavl...@gmail.com
on 7 Feb 2012 at 6:46
1. I guess my main concern is that there will potentially be another user in the future for whom a stack size of 100 isn't sufficient, etc. Plus, there seem to be performance and memory implications to tuning the stacktrace size that would need careful analysis. Perhaps this would be best implemented as a tunable build setting, where the default would remain as it is now, but those encountering this issue could tweak it?
2. Is the --callgrind issue a general issue, or is it isolated to this particular case?
3. I would be happy to take in a patch for the header issue if you want to draft one up.
Original comment by chapp...@gmail.com
on 2 Mar 2012 at 4:43
Just pinging for further update/comment. I am actively working on patch work
over the next few days and would like to clear up as many issues as possible.
Original comment by chapp...@gmail.com
on 21 Apr 2012 at 6:17
1. Agreed, a larger stacktrace size will add extra overhead; 64 is probably the best default value. Making it a tunable build setting is a good idea.
2. The --callgrind issue is general for any recursive case where the stack depth exceeds 64, which is pretty common for recursive algorithms. In that case you will get weird, confusing results, like an inclusive time of 30% for main().
3. I will give you a patch for the header.
Original comment by pavl...@gmail.com
on 23 Apr 2012 at 4:03
Any luck with that patch?
Original comment by chapp...@gmail.com
on 11 Mar 2013 at 2:21
I've been doing some profiling using files rather than functions, where the recursion problem is felt even more (it's common for functions in one file to call functions in other files that in turn call functions in the first file). When this happens, the numbers of calls on the arcs become confusing and provide little useful information.
To work around this I experimented with changing the way pprof counts the values on the arcs, so that if the same element appears again further along the callstack, the arc from that element to the next is not incremented. With this, looking at the arcs exiting the boxes in the call graph, you can see how many samples finish somewhere else.
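My reading of that tweak, reconstructed as a small C++ sketch (the real change would live in the pprof script; the names and details here are assumptions): when walking a sample from root to leaf, an arc is not incremented if its source element occurs again closer to the leaf, so hops back into the same element don't inflate the counts:

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Count caller->callee arcs over one sample (root first), skipping an arc
    // whenever its source element appears again later (closer to the leaf).
    void count_arcs(const std::vector<std::string>& sample,
                    std::map<std::pair<std::string, std::string>, int>& arcs) {
        for (std::size_t i = 0; i + 1 < sample.size(); ++i) {
            bool repeats_later = false;
            for (std::size_t j = i + 1; j < sample.size(); ++j)
                if (sample[j] == sample[i]) { repeats_later = true; break; }
            if (!repeats_later)
                ++arcs[{sample[i], sample[i + 1]}];
        }
    }

    int main() {
        // Files from root to leaf; file1 and file2 call back into each other.
        std::vector<std::string> sample =
            {"main", "file1", "file2", "file1", "file2", "file1"};

        std::map<std::pair<std::string, std::string>, int> arcs;
        count_arcs(sample, arcs);

        for (const auto& [arc, n] : arcs)
            std::cout << arc.first << " -> " << arc.second << ": " << n << "\n";
        // Prints "file2 -> file1: 1" and "main -> file1: 1"; the earlier
        // recursive hops in the same sample are not counted again.
    }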
Original comment by vitor.s....@gmail.com
on 23 Dec 2014 at 5:50
"confusing graphs" is not same as inconsistent output. I think that recent
changes to cpu profiler might actually fixed this ticket's problem.
Regarding your improvements, consider posting them somehow (but not on this
ticket please).
Original comment by alkondratenko
on 23 Dec 2014 at 6:44
I posted here because some old messages on this issue touched on something I was dealing with, but indeed, confusing is not the same as inconsistent.
I do see inconsistent counting of samples in some cases of CPU profiling. After figuring out whether it's a real issue, and not some mistake of mine, I'll post it as a new issue.
Original comment by vitor.s....@gmail.com
on 23 Dec 2014 at 7:02
Original issue reported on code.google.com by
pavl...@gmail.com
on 9 Dec 2011 at 5:00