Closed milianw closed 6 years ago
Yes, that's exactly the case. I agree it's not pretty, but the motivation was performance: profiling applications that make tens of millions of allocations tends to make populating the stack tree and related structures slow when symbol resolution is involved. I'll think about this one...
The merging can happen at analysis time, and the resolution of the instruction pointer address to symbol name can be cached. Furthermore, merging properly can actually reduce overhead: the tree will be smaller, which can bring considerable memory savings (I've seen this in heaptrack).
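To make the suggestion concrete, here is a minimal sketch of merging by symbol name with a cached lookup. It is not MTuner's or heaptrack's actual code: `CachingResolver`, `Node`, and `addStack` are hypothetical names, and the fixed table of symbol start addresses stands in for a real debug-info query.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical resolver: maps an instruction pointer to a symbol name.
// A real tool would query debug information; this sketch uses a fixed
// table of symbol start addresses instead.
class CachingResolver {
public:
    explicit CachingResolver(std::map<std::uint64_t, std::string> symbolStarts)
        : m_symbolStarts(std::move(symbolStarts)) {}

    // Each distinct address is resolved once; repeats are served from the cache.
    const std::string& resolve(std::uint64_t ip) {
        auto cached = m_cache.find(ip);
        if (cached != m_cache.end())
            return cached->second;
        ++m_lookups;
        return m_cache.emplace(ip, slowLookup(ip)).first->second;
    }

    // Number of uncached (slow) lookups performed so far.
    std::size_t lookups() const { return m_lookups; }

private:
    // Finds the symbol whose start address is the closest one at or below ip.
    std::string slowLookup(std::uint64_t ip) const {
        auto it = m_symbolStarts.upper_bound(ip);
        if (it == m_symbolStarts.begin())
            return "<unknown>";
        return std::prev(it)->second;
    }

    std::map<std::uint64_t, std::string> m_symbolStarts;
    std::unordered_map<std::uint64_t, std::string> m_cache;
    std::size_t m_lookups = 0;
};

// Stack tree node keyed by symbol name, so frames that differ only in
// their instruction pointer end up in the same child.
struct Node {
    std::size_t allocations = 0;
    std::map<std::string, std::unique_ptr<Node>> children;
};

// Adds one allocation's call stack (outermost frame first) to the tree.
void addStack(Node& root, const std::vector<std::uint64_t>& stack,
              CachingResolver& resolver) {
    assert(!stack.empty());
    Node* node = &root;
    for (std::uint64_t ip : stack) {
        std::unique_ptr<Node>& child = node->children[resolver.resolve(ip)];
        if (!child)
            child = std::make_unique<Node>();
        node = child.get();
        ++node->allocations;
    }
}
```

With this layout, two stacks that reach the same function from different call sites inside one caller collapse into a single branch, and the slow lookup runs once per distinct address rather than once per allocation.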
I was wrong here... the return address isn't used; a 'symbol ID' is, which is just a symbol offset. I just fixed the symbol ID enumeration, so this should work in the next release.
Take this example code:
Compile it with
cl.exe /Zi test.cpp
and trace it with MTuner. Then inspect the stack tree and notice how seemingly equivalent stack frames are not merged. Maybe it's because the instruction pointer address is used for the merging, instead of the user-visible symbol name?