Add profiler

This adds a per-instruction profiler to OpenSWE1R. This makes it possible to do some fancy things:

We can measure which functions are slow and should be re-coded in C. A script for this is provided.
We can create profiling information and look at the instructions which have been encountered. By later diffing this with another run (= finding instructions which were run now, but not before), we can figure out which functions are responsible for certain effects (if they are caused by a different code path instead of swapped values). A script for this is provided, although it has not produced any good results yet.
It could be used to detect slow instructions so we could possibly improve TCG / Unicorn-Engine. This has low priority. A script should be written to use something like Capstone or objdump to annotate the profiling information with disassembly.
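The run-diffing idea from the second point boils down to a set difference over executed instruction addresses. The following is a minimal sketch of that idea only, not the provided script; it assumes each run has already been reduced to a mapping from instruction address to hit count, and the function name and sample data are hypothetical.

```python
def newly_executed(before, after):
    """Return addresses executed in `after` but never in `before`, sorted."""
    return sorted(addr for addr, hits in after.items()
                  if hits > 0 and before.get(addr, 0) == 0)


# Hypothetical hit counts (address -> executions) from two runs.
baseline = {0x401000: 12, 0x401005: 12, 0x402000: 0}
with_effect = {0x401000: 15, 0x401005: 15, 0x402000: 3, 0x403000: 1}

new_addrs = newly_executed(baseline, with_effect)
print([hex(a) for a in new_addrs])
```

Addresses that merely ran more often are ignored on purpose; this only finds effects that take a different code path.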

I will write a wiki article on how to use the profiler once this is merged.

The design of this profiler is rather simplistic and follows the idea of the SetTracing() function. Calling SetProfiling(true) starts profiling: all existing heat is discarded at that point and a code / instruction hook is injected. Each instruction execution starts a profiler sample which contributes to the instruction's "heat". The accumulated heat can be dumped to a file using DumpProfilingHeat(<path>). If the path is NULL, the heat is written to stdout instead. Each instruction's heat is prefixed with "PROF" so it can be grep'ed easily.

An additional block hook is used to detect the BLOCK_ENTER event. The last instruction (which is still being profiled on block enter) will be marked as BLOCK_EXIT. Note that the duration of BLOCK_EXIT will be high as the VM has to switch to the new code block. On BLOCK_EXIT, we can also check the instruction which caused the exit. This allows us to check whether it's a call instruction. If that's the case, we mark the next instruction as CALLED. This can be used to find functions.
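How the call check is implemented is not described here. One plausible way, sketched below, is to look at the opcode bytes of the exiting instruction. This assumes 32-bit x86 code with no prefix bytes in front of the opcode; the function name is hypothetical.

```python
def is_call(code: bytes) -> bool:
    """Heuristically decide whether x86 machine code starts with a CALL.

    Assumption: no prefix bytes (operand-size, segment override, ...)
    precede the opcode.
    """
    if not code:
        return False
    opcode = code[0]
    if opcode in (0xE8, 0x9A):  # near relative CALL, far absolute CALL
        return True
    if opcode == 0xFF and len(code) >= 2:
        reg = (code[1] >> 3) & 0b111  # ModRM.reg selects the operation
        return reg in (2, 3)          # /2 = near indirect, /3 = far indirect
    return False


print(is_call(bytes([0xFF, 0xD0])))  # call eax
print(is_call(bytes([0xFF, 0xE0])))  # jmp eax
```

A block that ends in a jump or return then leaves the next instruction unmarked, so only real call targets are flagged as CALLED.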

When the VM stops execution (on return from uc_emu_start()) we are still profiling the last instruction. To avoid measuring the time until the next VM-entry, we have to dump out the current instruction profiling sample. This is done in a separate commit.
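The sampling scheme described above can be modelled in a few lines. This is a sketch of the idea, not the actual C implementation: the class, its method names, and the per-instruction record (hit count plus total duration) are assumptions, and the clock is injected so the behaviour can be checked deterministically.

```python
import time


class HeatProfiler:
    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock    # callable returning a timestamp in ns
        self.heat = {}        # address -> [hit count, total duration in ns]
        self.current = None   # (address, start time) of the open sample

    def _close_sample(self):
        if self.current is None:
            return
        addr, start = self.current
        entry = self.heat.setdefault(addr, [0, 0])
        entry[0] += 1
        entry[1] += self.clock() - start
        self.current = None

    def on_instruction(self, addr):
        # Each new instruction ends the previous sample and opens its own.
        self._close_sample()
        self.current = (addr, self.clock())

    def on_vm_exit(self):
        # Without this, host time until the next VM-entry would be
        # charged to the last instruction.
        self._close_sample()


now = [0]  # fake clock so the example is reproducible
prof = HeatProfiler(clock=lambda: now[0])
prof.on_instruction(0x1000)
now[0] = 30
prof.on_instruction(0x1004)
now[0] = 50
prof.on_vm_exit()
now[0] = 5000  # time spent on the host, not attributed to any instruction
prof.on_instruction(0x1000)
now[0] = 5010
prof.on_vm_exit()
print(prof.heat)
```

The 4950 ns spent outside the VM do not show up in any instruction's heat because the open sample is closed on exit.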

I tried to keep the profiler lightweight and intend to move most analysis features into scripts. This ensures better portability as we don't add further dependencies (which we might need for other profiling features in the future). The previously mentioned call detection is only done as part of OpenSWE1R so we can figure out call targets, which we couldn't possibly collect with an external script / using static analysis.

The chosen data structure for the samples is a page directory with 0x10000 elements which point at 0x10000 elements each. So when the profiler is activated, it will require ~1MB - ~2MB of memory for that directory; additional memory for the per-instruction data is only required when an instruction in a new page is executed. While this could be optimized, this is an optional feature and the current design is quite simple.
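A two-level table like this can be sketched as follows. The split of a 32-bit address into an upper 16-bit directory index and a lower 16-bit page offset is an assumption inferred from the 0x10000 x 0x10000 layout; the class name and the per-instruction record are hypothetical.

```python
PAGE_BITS = 16
PAGE_SIZE = 1 << PAGE_BITS  # 0x10000 entries per page
DIR_SIZE = 1 << 16          # 0x10000 directory slots


class HeatDirectory:
    def __init__(self):
        # The directory is allocated up front; pages are created on demand.
        self.pages = [None] * DIR_SIZE
        self.allocated_pages = 0

    def slot(self, addr):
        """Return the mutable record [hits, duration] for an address."""
        page_index = (addr >> PAGE_BITS) & (DIR_SIZE - 1)
        offset = addr & (PAGE_SIZE - 1)
        page = self.pages[page_index]
        if page is None:
            # First instruction seen in this page: allocate it now.
            page = [None] * PAGE_SIZE
            self.pages[page_index] = page
            self.allocated_pages += 1
        if page[offset] is None:
            page[offset] = [0, 0]
        return page[offset]


d = HeatDirectory()
d.slot(0x00401000)[0] += 1
d.slot(0x00401005)[0] += 1
d.slot(0x00401000)[0] += 1
print(d.allocated_pages)
```

Lookups stay constant-time and need no hashing, at the cost of a fixed directory and one full page per touched 64 KiB region.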

There are some remaining issues:

Tracing and profiling at the same time was not tested, but they will probably interfere.
The function-profiler script only knows where a function starts (because it was CALLED). It does not know when the function ends, so it assumes that one function ends when the next function starts.
The function-profiler assumes the first instruction of a function was only called as often as the function was called. This assumption might be broken when a function starts with a loop.
All durations given by the profiler and script are in nanoseconds, but this is not shown anywhere.
Exits from instructions to the host are not being marked. Additional flags could be used to mark those exits (such as thread timeslicing).
As the Unicorn-Engine hook will impact the emulation, the profiler results should be treated skeptically.
Unhooking the profiler (SetProfiling(false)) will often crash Unicorn on re-entry. A workaround should be created which defers unhooking to a safer time.
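The two function-profiler assumptions listed above (a function ends where the next CALLED address begins, and the hit count of its first instruction equals its call count) can be illustrated like this. It is a sketch, not the provided script; the input layout (address mapped to hits, total duration in nanoseconds, and a CALLED flag) and all names are hypothetical.

```python
def profile_functions(samples):
    """Group per-instruction heat into functions.

    samples: dict mapping address -> (hits, total_duration_ns, called)
    """
    functions = {}
    current = None
    for addr in sorted(samples):
        hits, duration, called = samples[addr]
        if called:
            # A CALLED instruction starts a new function and thereby
            # ends the previous one.
            current = addr
            functions[current] = {"calls": hits, "total_ns": 0}
        if current is not None:
            functions[current]["total_ns"] += duration
    return functions


# Hypothetical heat for two functions.
samples = {
    0x1000: (2, 100, True),
    0x1002: (2, 50, False),
    0x1010: (5, 500, True),
    0x1013: (50, 1000, False),
}
result = profile_functions(samples)
slowest = max(result, key=lambda a: result[a]["total_ns"])
print(hex(slowest), result[slowest])
```

If the function at 0x1010 began with a loop, its first instruction would have more hits than the function has calls, which is exactly the case where the second assumption breaks.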

Some of these should be turned into GitHub issues soon.

This only works with the default Unicorn-Engine backend as the KVM backend does not support the necessary hooks (yet?).

OpenSWE1R / openswe1r

Add profiler #111