OpenSWE1R / openswe1r

An Open-Source port of the 1999 Game "Star Wars Episode 1: Racer"
https://openswe1r.github.io/
GNU General Public License v2.0

Add profiler #111

Closed JayFoxRox closed 6 years ago

JayFoxRox commented 6 years ago

This adds a per-instruction profiler to OpenSWE1R. This makes it possible to do some fancy things:

I will write a wiki article on how to use the profiler once this is merged.


The design for this profiler is rather simplistic and follows the idea of the SetTracing() function. Calling SetProfiling(true) starts profiling: all existing heat is discarded at that point and a code / instruction hook is injected. Each instruction execution starts a profiler sample which contributes to that instruction's "heat". The accumulated heat can be dumped to a file using DumpProfilingHeat(<path>). If the path is NULL, the heat is written to stdout instead. Each instruction's heat is prefixed with "PROF" so it can be grepped easily.

An additional block hook is used to detect the BLOCK_ENTER event. The last instruction (which is still being profiled on block enter) will be marked as BLOCK_EXIT. Note that the duration of BLOCK_EXIT will be high as the VM has to switch to the new code block. On BLOCK_EXIT, we can also check the instruction which caused the exit. This allows us to check whether it's a call instruction. If that's the case, we mark the next instruction as CALLED. This can be used to find functions.
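To illustrate the call check, here is a minimal sketch of how the bytes of the exiting instruction could be tested for an x86 call opcode. The function name `is_call_instruction` is hypothetical and not taken from this PR, and the decoding is deliberately simplified: instruction prefixes are not skipped, so the actual implementation may well differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not from the PR): returns true if the bytes at `code`
// begin with a 32-bit x86 call opcode.
// Simplification: instruction prefixes are not handled.
static bool is_call_instruction(const uint8_t* code, size_t size) {
  if (size < 1) {
    return false;
  }
  switch (code[0]) {
  case 0xE8: // CALL rel32 (near, relative)
  case 0x9A: // CALL ptr16:32 (far, direct)
    return true;
  case 0xFF: { // Opcode group 5: the ModRM reg field selects the operation
    if (size < 2) {
      return false;
    }
    uint8_t reg = (code[1] >> 3) & 7;
    // /2 is CALL r/m32 (near, indirect), /3 is CALL m16:32 (far, indirect)
    return reg == 2 || reg == 3;
  }
  default:
    return false;
  }
}
```

If such a check succeeds on BLOCK_EXIT, the first instruction of the next block would be the one to mark as CALLED.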

When the VM stops execution (return from uc_emu_start()) we are still profiling the last instruction. To avoid measuring the time until the next VM-entry, we have to close out the current instruction's profiling sample at that point. This is done in a separate commit.
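The sample lifecycle can be sketched roughly as follows. Everything here is an assumption made for illustration: the `ToyProfiler` type, the function names, the tiny fixed-size heat table and the caller-supplied timestamps are not from this PR. The point is only that each new instruction closes the previous sample, and that the open sample is also closed when the VM returns, so time spent outside the VM is not attributed to the last instruction.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CODE_SIZE 16 // hypothetical: tiny address space for this sketch

typedef struct {
  uint64_t heat[TOY_CODE_SIZE]; // accumulated time per instruction address
  bool sampling;                // true while a sample is open
  uint32_t current_address;     // instruction currently being sampled
  uint64_t sample_start;        // timestamp at which the sample was opened
} ToyProfiler;

// Closes the open sample (if any) and adds its duration to the heat.
// Would also be called right after the VM stops executing.
static void toy_profiler_flush(ToyProfiler* p, uint64_t now) {
  if (!p->sampling) {
    return;
  }
  p->heat[p->current_address % TOY_CODE_SIZE] += now - p->sample_start;
  p->sampling = false;
}

// Would be called from the instruction hook: ends the previous sample and
// opens a new one for the instruction at `address`.
static void toy_profiler_on_instruction(ToyProfiler* p, uint32_t address,
                                        uint64_t now) {
  toy_profiler_flush(p, now);
  p->current_address = address;
  p->sample_start = now;
  p->sampling = true;
}
```

Without the flush on VM exit, the gap between leaving and re-entering the VM would be added to the heat of whichever instruction ran last.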

I tried to keep the profiler lightweight and intend to move most analysis features into scripts. This ensures better portability as we don't add further dependencies (which we might need for other profiling features in the future). The previously mentioned call detection is only done as part of OpenSWE1R so we can figure out call targets, which we couldn't possibly collect with an external script / using static analysis.

The chosen data structure for the samples is a page directory with 0x10000 elements, each of which points at a page of 0x10000 elements. When the profiler is activated, it requires ~1MB - ~2MB of memory for that directory; additional memory for the per-instruction data is only required once an instruction in a new page is executed. While this could be optimized, this is an optional feature and the current design is quite simple.
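A two-level table of this kind could look roughly like the sketch below. The `Sample` layout, the type names and the function names are hypothetical and not taken from this PR, so the memory use of this sketch will not match the figures above. The upper 16 bits of a 32-bit address select the directory entry, the lower 16 bits select the element within the page, and a page is only allocated on first access.

```c
#include <stdint.h>
#include <stdlib.h>

#define DIRECTORY_ENTRIES 0x10000 // one entry per 64 KiB of address space
#define PAGE_ENTRIES 0x10000      // one element per byte address in a page

// Hypothetical per-instruction record; the real layout may differ.
typedef struct {
  uint64_t heat;  // accumulated time
  uint32_t flags; // e.g. BLOCK_ENTER / BLOCK_EXIT / CALLED markers
} Sample;

typedef struct {
  Sample* pages[DIRECTORY_ENTRIES]; // NULL until the page is first touched
} HeatDirectory;

// Allocates an empty directory (all pages unallocated); NULL on failure.
static HeatDirectory* heat_directory_create(void) {
  return calloc(1, sizeof(HeatDirectory));
}

// Returns the sample for `address`, allocating its page on first access.
// Returns NULL if the page could not be allocated.
static Sample* heat_directory_get(HeatDirectory* d, uint32_t address) {
  uint32_t page_index = address >> 16;
  uint32_t offset = address & 0xFFFF;
  if (d->pages[page_index] == NULL) {
    d->pages[page_index] = calloc(PAGE_ENTRIES, sizeof(Sample));
    if (d->pages[page_index] == NULL) {
      return NULL;
    }
  }
  return &d->pages[page_index][offset];
}

// Frees all allocated pages and the directory itself.
static void heat_directory_destroy(HeatDirectory* d) {
  if (d == NULL) {
    return;
  }
  for (uint32_t i = 0; i < DIRECTORY_ENTRIES; i++) {
    free(d->pages[i]);
  }
  free(d);
}
```

The lookup is two array indexing operations with no hashing, which keeps the cost inside a per-instruction hook low. The price is a fixed-size directory and fairly large pages even when only a few addresses in a page are ever executed.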

There are some remaining issues:

Some of these should be turned into GitHub issues soon.

This only works with the default Unicorn-Engine backend as the KVM backend does not support the necessary hooks (yet?).