Open GitMensch opened 1 year ago
To answer the original question: Yes that should be possible, the perfparser gives us the exact address for the call, we simply don't use that anywhere else and thus "lose" that data (the caller/callee view aggregates by source file/line after all). We would have to keep that data around for this view here, then we could show the cost there as you request.
I also agree that it would be very useful to have!
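To make the idea concrete, here is a minimal sketch of what "keeping that data around" could look like. It is not Hotspot's or perfparser's actual code: the sample layout (a list of `(function, address)` frames, innermost first, plus a cost) and the name `cost_per_address` are assumptions made for illustration only. For recursive functions it simply counts the innermost frame, which is a simplification.

```python
from collections import defaultdict


def cost_per_address(samples, function):
    """Return {address: [self_cost, inclusive_cost]} for one function.

    `samples` is a hypothetical layout: an iterable of (stack, cost) pairs,
    where `stack` is a list of (function_name, address) frames, innermost
    frame first. For a non-leaf frame the address is assumed to identify
    the call instruction inside that function.
    """
    costs = defaultdict(lambda: [0, 0])
    for stack, cost in samples:
        for depth, (name, address) in enumerate(stack):
            if name != function:
                continue
            if depth == 0:
                # The sample hit this instruction directly: self cost.
                costs[address][0] += cost
            # Directly or via a callee, the time was spent "behind" this address.
            costs[address][1] += cost
            # Simplification: only the innermost frame of the function counts,
            # so recursion does not add the same sample twice.
            break
    return dict(costs)


# Made-up samples: most of the cost is inside a callee of main.
samples = [
    ([("cob_append_string", 0x2000), ("main", 0x1010)], 90),
    ([("cob_move", 0x3000), ("main", 0x1020)], 5),
    ([("main", 0x1004)], 5),
]
per_address = cost_per_address(samples, "main")
```

With these made-up samples, the call at `0x1010` gets no self cost but an inclusive cost of 90, which is the number the disassembly view would need next to the call instruction.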
@lievenhey if possible, could you also have a look at how kcachegrind fills its disassembly view? The screenshot above doesn't look too bad - certainly better than what we have when it comes to jumps and the like. I also like that it splits up more of the content into columns one could individually hide (e.g. the hex stuff wouldn't be too interesting for me).
I guess this warrants a new GitHub issue to track such improvements to our disassembly view
The very interesting option in kcachegrind (which is also the reason that the assembly view has a source reference) is that one can sort by clicking on the column headers. This lets you sort by cost (seeing "local" hotspots very quickly), click on an interesting line to mark it, then sort by line (source view) or address (disassembly), and then use arrow-up to move to the line before the one you marked.
... I'll move that out to a separate issue.
The "related" issues about a 'nice disassembly' are all solved (the request above for that is now a duplicate), but the original request is still open.
Sadly, I don't have enough insight into Hotspot to know where those changes would have to be made, so I can't contribute here other than by testing.
Hm, is this the same as the "left side issue" in #586?
Is your feature request related to a problem? Please describe. I'm looking at disassembly that is generated from COBOL (so COBOL on the left side), where one COBOL statement may result in multiple C functions being called. The Source View already has "cycles (self)" [= local code] and "cycles (incl.)", but that's only one inclusive cost for the complete line (which is totally right there). The Disassembly View shows where the local code actually spends its time, and it shows all called C functions, but their actual cost cannot be seen: the `callq` instructions commonly have no or nearly no cost attached, and "behind them" is where the "cycles (incl.)" from the left actually are (but again one cannot see in which of the called functions).
Describe the solution you'd like Preferably, have the cost of the called functions added to the cycles for `callq`; otherwise, rename the current "cycles" column to "cycles (self)" and add "cycles (incl.)".
Describe alternatives you've considered Check the called functions on their own in the caller/callee tab and see whether the function currently under test is listed. But then one can only guess whether that cost is relevant for this code path, while it would otherwise be directly visible in the disassembly view.
Additional context Here's a very simple recording showing this. Hotspot shows the sampling to have 11% on line 9, but the assembly has no cycles because, apart from simple single instructions, everything is done in a called function. But there's no way to see in which function. Compare with kcachegrind, where each call shows the number of instructions per call and one can see that the third call to `cob_append_string` is the expensive one.