bebbo / gcc

Bebbo's gcc-6-branch for m68k-amigaos
GNU General Public License v2.0
33 stars 11 forks source link

gprof: issues with C++ code? #51

Closed mheyer32 closed 5 years ago

mheyer32 commented 5 years ago

I'm trying -pg with scummvm, but get weird results which I think are already present in the output of gprof. What seems to happen is that the wrong functions get attributed in many places. For instance, i can see a function named 'undrawMouse' in the wrong place in the callgraph (being called by main() and using up high amounts of the time), but that is impossible. ScummVM makes use of function pointers, virtual functions and templates. Is there a chance any of this is confusing the address->symbol resolve?

I made sure to use -fno-omit-frame-pointers and passed -pg and -g -ggdb to both, compiler and linker. I used the same methodology on ADoom/DoomAttack and seem to get sensible results there. It does not seem to matter if I use -O0 or use higher optimization levels.

If you need more data/files or instructions on how to reproduce with ScummVM, let me know.

bebbo commented 5 years ago

please attach your exe plus gmon.out.

mheyer32 commented 5 years ago

ScummVM.zip

The attached zip file contains a build (stripped and unstripped) of ScummVM 2.0 with SCUMM (Lucasfilm/arts), tinsel (Discworld) and SCI (Sierra) game engines included. The way I do the profiling is to run the stripped version on the device and then use gprof on gmon.out and the unstripped version of the exe.

bebbo commented 5 years ago

m68k-amigaos-gprof.exe scummvm \| grep undrawMouse \| less yields

  0.02    190.60     0.04     6649     0.00     0.00  OSystem_AmigaOS3::undrawMouse()
                0.04    0.02    6649             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00      11/606503      OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       8/62548       OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   1             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                6645             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       3/5626        OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
[331]    0.0    0.04    0.02    6649         OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.02       1/2           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       8/59198       OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   9             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   4             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/47          OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       3/16          OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       3/16          OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   1             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/4           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/1           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   1             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       3/14          OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   1             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/2           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/2           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   1             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/1           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/1           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                0.00    0.00       1/1           OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
                                   1             OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
 [331] OSystem_AmigaOS3::undrawMouse() [1661] Sci::GfxTransitions::GfxTransitions(Sci::GfxScreen*, Sci::GfxPalette*) [784] Common::updateGameGUIOptions(Common::String const&, Common::String const&)

total consumption of undrawMouse is 0.04 seconds...

mheyer32 commented 5 years ago

Hi Bebbo, I think my reporting on the time spent in undrawMouse was wrong. Please don’t focus on that. It’s the call graphs I’m worried about. UndrawMouse shows up in places where it shouldn’t. And it’s not the only function, just a prominent example. I’ll post a graph tomorrow to illustrate the problem. Thanks!

mheyer32 commented 5 years ago

scummvm.pdf

produced via: m68k-amigaos-gprof --brief scummvm gmon.out | gprof2dot.py | dot -Tpdf -oscummvm.pdf

In this example you'll see that some functions seem totally out of place, like scumm_sound_thread() being on top of everything and the audio driver calling into main(); OSystem_AmigaOS3::undrawMouse() calling into GUI:Dialog::runModal() is also not correct. But there are many other examples that don't make sense in this callgraph.

bebbo commented 5 years ago

There are static functions and if a static function is hit the best match is the name of the function in front of the static function.

That's it.

mheyer32 commented 5 years ago

Can you please elaborate why static functions pose a problem? I'd thought they'll get a profiling epilogue just like any other function... and they should still be resolvable via debug info. (If inlined functions do not get reported that is to be expected.) Or is this not how gprof operates?

When you say "the best match is the name of the function in front of the static function" is that

  1. in front in the source code
  2. in front in the binary
  3. in front in the symbol table(s) ?

Is there any way I can improve/help the resolve with compiler settings? I hoped to use gprof to improve ScummVM a bit more.

bebbo commented 5 years ago

gprof utilizes bfd to lookup symbols and debug infos aren't symbols^^

and the answer is 2.

mheyer32 commented 5 years ago

How about addr2line ? What mechanism does it use to resolve offset->function name? Could that be used instead to help gprof?

bebbo commented 5 years ago

Regarding the counters in gmon.out, you'll find a value at 0x27c0.

0000269c <__ZN16OSystem_AmigaOS311undrawMouseEv>:
    269c:   2f0d            move.l a5,-(sp)
    269e:   2a4f            movea.l sp,a5 
...
    2754:   4e5d            unlk a5
    2756:   4e75            rts 
...
...
    279c:   4e55 ff54       link.w a5,#-172
    27a0:   2f03            move.l d3,-(sp)
    27a2:   2f02            move.l d2,-(sp)
    27a4:   4eb9 002f f7bc  jsr 2ff7bc <_mcount>
    27aa:   2f39 002d 723c  move.l 2d723c <__ZN6Common13ConfigManager16kTransientDomainE>,-(sp)
    27b0:   240d            move.l a5,d2
    27b2:   0682 ffff ff56  addi.l #-170,d2
    27b8:   2f02            move.l d2,-(sp)
    27ba:   4eb9 002e 77d8  jsr 2e77d8 <__ZN6Common6StringC1EPKc>
    27c0:   4eba 1722       jsr 3ee4 <_scummvm_main+0x12bc>(pc) 

with addr2line - which uses the debug info - you'd be able to retrieve the line number:

$ m68k-amigaos-addr2line.exe -e scumm_with_symbols 0x27c0
source/base/main.cpp:97

but not the function name.

looking into the source would provide you the info that it's here:

static bool launcherDialog() {

    // Discard any command line options. Those that affect the graphics
    // mode and the others (like bootparam etc.) should not
    // blindly be passed to the first game launched from the launcher.
97:-->  ConfMan.getDomain(Common::ConfigManager::kTransientDomain)->clear();
bebbo commented 5 years ago

please try the new feature

mheyer32 commented 5 years ago

Thanks for your efforts! I'll try it tonight. Btw, addr2line can produce the function name:

matze@ubuntu-VirtualBox:~/scummvm-amigaos3$ m68k-amigaos-addr2line -e scummvm -f -C 0x2000
OSystem_AmigaOS3::copyRectToOverlay(void const*, int, int, int, int, int)
source/backends/platform/amigaos3/amigaos3-graphics-aga.cpp:766
bebbo commented 5 years ago

be warned: you'll have to use --allow-multiple-definition to link it

bebbo commented 5 years ago

should work now without --allow-multiple-definition.

gprof does not use bfd to resolve the names - too much work to change that. => let's hope these changes will be sufficient!

btw: what are youre configure options for scummvm? And which game do you use for tests? That colormap stuff seems incredible slow and is begging for speed...

mheyer32 commented 5 years ago

Initial feedback: yes, I needed -Wl,--allow-multiple-definition to avoid the linker complaining about multiple definitions. (That was before your latest patch, need to try again later tonight). The callgraphs make ALOT more sense now. Does the current fix impact performance beyond the general profiling instrumentation?

I am currently looking into optimizing the SCI engine (Sierra Games), mostly just booting up Space Quest 1 - The Sarien Encounter (Remake). There's tons of wasted perf by making runtime decisions over and over that are basically constant over the whole game - it would have been better, if they had made it so that there was a specialized engine (via subclassing) per Game instead of cramming it all into the same source and having a ton of if (getSciVersion() < a) {do X} else {do Y} everywhere.

I have quite a few of uncommited changes that speed up things a lot already. The biggest chunk next up would be to speed up drawing. Calling setPixel() for each individual pixel and then doing a _screen[y*width +x] = color is truly horrible on a 50Mhz miggie.

I configure ScummVM this way: ./configure --host=m68k-amigaos --disable-all-engines --enable-engine=scumm,scumm-7-8,sci,tinsel --disable-mt32emu --enable-release --disable-hq-scalers --with-amiga-prefix=/media/sf_Amiga/ScummVM

This will allow to play most Lucasfilm and Sierra games as well as Discworld.

You NEED to install BlazeWCP on the Amiga, otherwise the C2P conversion will be horribly slow - I'm using WriteChunkyPixels() instead of specialized C2P routines for now... changing that is on the agenda for a rainy day.

If you run in an emulator or have a MIDI equipped Amiga, I highly recommend the CAMD music output plugin! WinUAE supports to 'connect' the emulated machine straight to any Midi device known to Windows (I use MUNT-32 and the built-in general midi synthesizer)

bebbo commented 5 years ago

Does the current fix impact performance beyond the general profiling instrumentation?

No, just adding symbols for static functions.

bebbo commented 5 years ago

That was before your latest patch, need to try again later tonight)

make update && make gcc is enough and takes not that much.

mheyer32 commented 5 years ago

Not at home right now, cannot do; sorry ^-^

mheyer32 commented 5 years ago

Oh, and use make amigaos3dist -j8 to build the thing

This will produce a fully populated ScummVM under the directory that was specified via --with-amiga-prefix=/media/sf_Amiga/ScummVM

mheyer32 commented 5 years ago

WRT to crazy palette handling code. The VGA version seems to be affected mostly via this call:

GfxView::draw()
...
    if (_embeddedPal)
        // Merge view palette in...
        _palette->set(&_viewPalette, false);

...

This is now easily uncovered by gprof - big Thanks!

scummvm.pdf

bebbo commented 5 years ago

that's the old chart again!?

mheyer32 commented 5 years ago

Darrrn! Sorry... too many virtual machines and copying back and forth between shared drives involved... I'll upload a new one tonight, if you're interested.

What seems to happen is that _embeddedPal indicates that this view is providing its own palette which needs to be merged into the system palette via _palette->set(). This in turn tries to find either an empty palette slot or the best matching color via linear search(!) for each palette entry int the to-be-merged palette. And that takes a lot of time. If you just comment out above if statement, it works quite alright. EGA and Amiga versions also don't run into this issue.

I uploaded my latest optmization attempts for SCI games to: https://github.com/mheyer32/scummvm-amigaos3/tree/devel_2.0

One more question: would it be possible to use a timer based interrupt for the sampling? I understand that VBLANK is least intrusive, but the 20ms (or 16 on NTSC) is kinda big - 1ms would be great, I guess. I don't know if using a timer interrupt would take away the timer from the system (or the app that one tries to profile)... if that is the case, maybe a copper interrupt could be used. But in case of a copper interrupt, the app would need to call some gmon API to actually tie it to some screen after the program has started up, so it can't be done in a non-intrusive way just like now.

mheyer32 commented 5 years ago

(I updated the pdf link in the previous comment)

Closing.