Closed mheyer32 closed 5 years ago
please attach your exe plus gmon.out.
The attached zip file contains a build (stripped and unstripped) of ScummVM 2.0 with SCUMM (Lucasfilm/arts), tinsel (Discworld) and SCI (Sierra) game engines included. The way I do the profiling is to run the stripped version on the device and then use gprof on gmon.out and the unstripped version of the exe.
m68k-amigaos-gprof.exe scummvm \| grep undrawMouse \| less
yields
0.02 190.60 0.04 6649 0.00 0.00 OSystem_AmigaOS3::undrawMouse()
0.04 0.02 6649 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 11/606503 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 8/62548 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
6645 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 3/5626 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
[331] 0.0 0.04 0.02 6649 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.02 1/2 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 8/59198 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
9 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
4 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/47 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 3/16 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 3/16 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/4 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 3/14 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/2 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/2 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
0.00 0.00 1/1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
1 OSystem_AmigaOS3::undrawMouse() <cycle 1> [331]
[331] OSystem_AmigaOS3::undrawMouse() [1661] Sci::GfxTransitions::GfxTransitions(Sci::GfxScreen*, Sci::GfxPalette*) [784] Common::updateGameGUIOptions(Common::String const&, Common::String const&)
total consumption of undrawMouse
is 0.04 seconds...
Hi Bebbo, I think my reporting on the time spent in undrawMouse was wrong. Please don’t focus on that. It’s the call graphs I’m worried about. UndrawMouse shows up in places where it shouldn’t. And it’s not the only function, just a prominent example. I’ll post a graph tomorrow to illustrate the problem. Thanks!
produced via: m68k-amigaos-gprof --brief scummvm gmon.out | gprof2dot.py | dot -Tpdf -oscummvm.pdf
In this example you'll see that some functions seem totally out of place, like scumm_sound_thread() being on top of everything and the audio driver calling into main(); OSystem_AmigaOS3::undrawMouse() calling into GUI:Dialog::runModal() is also not correct. But there are many other examples that don't make sense in this callgraph.
There are static functions and if a static function is hit the best match is the name of the function in front of the static function.
That's it.
Can you please elaborate why static functions pose a problem? I'd thought they'll get a profiling epilogue just like any other function... and they should still be resolvable via debug info. (If inlined functions do not get reported that is to be expected.) Or is this not how gprof operates?
When you say "the best match is the name of the function in front of the static function" is that
Is there any way I can improve/help the resolve with compiler settings? I hoped to use gprof to improve ScummVM a bit more.
gprof utilizes bfd to lookup symbols and debug infos aren't symbols^^
and the answer is 2.
How about addr2line ? What mechanism does it use to resolve offset->function name? Could that be used instead to help gprof?
Regarding the counters in gmon.out, you'll find a value at 0x27c0
.
0000269c <__ZN16OSystem_AmigaOS311undrawMouseEv>:
269c: 2f0d move.l a5,-(sp)
269e: 2a4f movea.l sp,a5
...
2754: 4e5d unlk a5
2756: 4e75 rts
...
...
279c: 4e55 ff54 link.w a5,#-172
27a0: 2f03 move.l d3,-(sp)
27a2: 2f02 move.l d2,-(sp)
27a4: 4eb9 002f f7bc jsr 2ff7bc <_mcount>
27aa: 2f39 002d 723c move.l 2d723c <__ZN6Common13ConfigManager16kTransientDomainE>,-(sp)
27b0: 240d move.l a5,d2
27b2: 0682 ffff ff56 addi.l #-170,d2
27b8: 2f02 move.l d2,-(sp)
27ba: 4eb9 002e 77d8 jsr 2e77d8 <__ZN6Common6StringC1EPKc>
27c0: 4eba 1722 jsr 3ee4 <_scummvm_main+0x12bc>(pc)
with addr2line - which uses the debug info - you'd be able to retrieve the line number:
$ m68k-amigaos-addr2line.exe -e scumm_with_symbols 0x27c0
source/base/main.cpp:97
but not the function name.
looking into the source would provide you the info that it's here:
static bool launcherDialog() {
// Discard any command line options. Those that affect the graphics
// mode and the others (like bootparam etc.) should not
// blindly be passed to the first game launched from the launcher.
97:--> ConfMan.getDomain(Common::ConfigManager::kTransientDomain)->clear();
please try the new feature
Thanks for your efforts! I'll try it tonight. Btw, addr2line can produce the function name:
matze@ubuntu-VirtualBox:~/scummvm-amigaos3$ m68k-amigaos-addr2line -e scummvm -f -C 0x2000
OSystem_AmigaOS3::copyRectToOverlay(void const*, int, int, int, int, int)
source/backends/platform/amigaos3/amigaos3-graphics-aga.cpp:766
be warned: you'll have to use --allow-multiple-definition
to link it
should work now without --allow-multiple-definition
.
gprof does not use bfd to resolve the names - too much work to change that. => let's hope these changes will be sufficient!
btw: what are youre configure options for scummvm? And which game do you use for tests? That colormap stuff seems incredible slow and is begging for speed...
Initial feedback: yes, I needed -Wl,--allow-multiple-definition to avoid the linker complaining about multiple definitions. (That was before your latest patch, need to try again later tonight). The callgraphs make ALOT more sense now. Does the current fix impact performance beyond the general profiling instrumentation?
I am currently looking into optimizing the SCI engine (Sierra Games), mostly just booting up Space Quest 1 - The Sarien Encounter (Remake). There's tons of wasted perf by making runtime decisions over and over that are basically constant over the whole game - it would have been better, if they had made it so that there was a specialized engine (via subclassing) per Game instead of cramming it all into the same source and having a ton of if (getSciVersion() < a) {do X} else {do Y} everywhere.
I have quite a few of uncommited changes that speed up things a lot already. The biggest chunk next up would be to speed up drawing. Calling setPixel() for each individual pixel and then doing a _screen[y*width +x] = color is truly horrible on a 50Mhz miggie.
I configure ScummVM this way: ./configure --host=m68k-amigaos --disable-all-engines --enable-engine=scumm,scumm-7-8,sci,tinsel --disable-mt32emu --enable-release --disable-hq-scalers --with-amiga-prefix=/media/sf_Amiga/ScummVM
This will allow to play most Lucasfilm and Sierra games as well as Discworld.
You NEED to install BlazeWCP on the Amiga, otherwise the C2P conversion will be horribly slow - I'm using WriteChunkyPixels() instead of specialized C2P routines for now... changing that is on the agenda for a rainy day.
If you run in an emulator or have a MIDI equipped Amiga, I highly recommend the CAMD music output plugin! WinUAE supports to 'connect' the emulated machine straight to any Midi device known to Windows (I use MUNT-32 and the built-in general midi synthesizer)
Does the current fix impact performance beyond the general profiling instrumentation?
No, just adding symbols for static functions.
That was before your latest patch, need to try again later tonight)
make update && make gcc
is enough and takes not that much.
Not at home right now, cannot do; sorry ^-^
Oh, and use make amigaos3dist -j8 to build the thing
This will produce a fully populated ScummVM under the directory that was specified via --with-amiga-prefix=/media/sf_Amiga/ScummVM
WRT to crazy palette handling code. The VGA version seems to be affected mostly via this call:
GfxView::draw()
...
if (_embeddedPal)
// Merge view palette in...
_palette->set(&_viewPalette, false);
...
This is now easily uncovered by gprof - big Thanks!
that's the old chart again!?
Darrrn! Sorry... too many virtual machines and copying back and forth between shared drives involved... I'll upload a new one tonight, if you're interested.
What seems to happen is that _embeddedPal indicates that this view is providing its own palette which needs to be merged into the system palette via _palette->set(). This in turn tries to find either an empty palette slot or the best matching color via linear search(!) for each palette entry int the to-be-merged palette. And that takes a lot of time. If you just comment out above if statement, it works quite alright. EGA and Amiga versions also don't run into this issue.
I uploaded my latest optmization attempts for SCI games to: https://github.com/mheyer32/scummvm-amigaos3/tree/devel_2.0
One more question: would it be possible to use a timer based interrupt for the sampling? I understand that VBLANK is least intrusive, but the 20ms (or 16 on NTSC) is kinda big - 1ms would be great, I guess. I don't know if using a timer interrupt would take away the timer from the system (or the app that one tries to profile)... if that is the case, maybe a copper interrupt could be used. But in case of a copper interrupt, the app would need to call some gmon API to actually tie it to some screen after the program has started up, so it can't be done in a non-intrusive way just like now.
(I updated the pdf link in the previous comment)
Closing.
I'm trying -pg with scummvm, but get weird results which I think are already present in the output of gprof. What seems to happen is that the wrong functions get attributed in many places. For instance, i can see a function named 'undrawMouse' in the wrong place in the callgraph (being called by main() and using up high amounts of the time), but that is impossible. ScummVM makes use of function pointers, virtual functions and templates. Is there a chance any of this is confusing the address->symbol resolve?
I made sure to use -fno-omit-frame-pointers and passed -pg and -g -ggdb to both, compiler and linker. I used the same methodology on ADoom/DoomAttack and seem to get sensible results there. It does not seem to matter if I use -O0 or use higher optimization levels.
If you need more data/files or instructions on how to reproduce with ScummVM, let me know.