We-the-People-civ4col-mod / Mod

This is the repository where the mod resides.

Implement an internal / in-game benchmark tool #231

Open devolution79 opened 5 years ago

devolution79 commented 5 years ago

To support our ongoing optimization efforts, we need a way to easily execute standardized benchmarks. Profiling works very well (see #210), but it lacks a number of features for this purpose, and its objective differs somewhat from what we want to achieve with benchmarking. Essentially, we want an automated way for users to run their own benchmarks with the release build.

Requirements:

Optional features to consider:

Implementation status:

Nightinggale commented 5 years ago

Currently we have the profiling DLL, which can be used with Very Sleepy for profiling. Very Sleepy periodically samples which line the code is executing and maps it back to the source code; if a line shows up in 2% of the samples, it is assumed to have used 2% of the CPU time. This works well for single-threaded applications, but the results get skewed when a thread is blocked, for instance while waiting on a lock.
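To make the statistical idea concrete, here is a minimal sketch of how sample counts turn into time shares. This is an illustration only, not how Very Sleepy is actually implemented; the function name `attributeSamples` and the location strings are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: given the source location
// recorded at each sample, return the fraction of samples per location.
// A sampling profiler reports that fraction as the share of CPU time.
std::map<std::string, double> attributeSamples(const std::vector<std::string>& samples)
{
    std::map<std::string, double> shares;
    if (samples.empty())
        return shares;

    // Count how many samples landed on each location.
    std::map<std::string, int> counts;
    for (const std::string& location : samples)
        ++counts[location];

    // Turn the counts into fractions of the total number of samples.
    for (const auto& entry : counts)
        shares[entry.first] = static_cast<double>(entry.second) / samples.size();
    return shares;
}
```

In this simple model the weakness is easy to see: a sample only records where a thread is, not whether it is doing work, so a line a blocked thread is waiting on still collects samples even though it uses no CPU time there.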

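To measure time spent by the CPU, we should most likely use clock() from time.h. The C standard defines it as the processor time used by the program, so time spent waiting on a lock shouldn't be counted (note that it covers the whole process rather than just the current thread, and that Microsoft's CRT deviates from the standard by returning wall-clock time since process start, so this needs verifying). The catch is that nothing is measured automatically: we have to instrument the code ourselves and be a bit creative about how.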

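We should make a class which does something like this:

class Profiler
{
public:
    // Start measuring when the object is created
    Profiler(profileResult* result) : m_result(result), m_iTimeStart(clock()) {}

    // Stop measuring when the object goes out of scope and report the elapsed ticks
    ~Profiler()
    {
        clock_t iTimeEnd = clock();
        m_result->add(iTimeEnd - m_iTimeStart);
    }

private:
    profileResult* m_result;
    clock_t m_iTimeStart;
};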
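A self-contained version of the class above, written in modern C++ (C++11) for brevity, so it would need adjusting for the 2003 toolchain. `ProfileResult` is a hypothetical stand-in for the `profileResult` container: it accumulates the total `clock()` ticks and the number of calls.

```cpp
#include <ctime>

// Hypothetical stand-in for profileResult: total ticks plus a call counter.
struct ProfileResult
{
    std::clock_t totalTicks = 0;
    long calls = 0;

    void add(std::clock_t ticks)
    {
        totalTicks += ticks;
        ++calls;
    }
};

// RAII timer: reads clock() on construction and reports the elapsed
// ticks to its result container when it goes out of scope.
class Profiler
{
public:
    explicit Profiler(ProfileResult* result) : m_result(result), m_iTimeStart(std::clock()) {}
    ~Profiler() { m_result->add(std::clock() - m_iTimeStart); }

    // A copy would report the same interval twice, so forbid copying.
    Profiler(const Profiler&) = delete;
    Profiler& operator=(const Profiler&) = delete;

private:
    ProfileResult* m_result;
    std::clock_t m_iTimeStart;
};
```

Because the measurement ends in the destructor, early returns and exceptions are timed as well, without extra code at each exit point of the function.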

We can then add a macro like this:

#ifdef timerProfiler
#define PROFILE(name) \
    static profileResult* s_pProfileResult = GC.getProfileController().getNewResultContainer(name); \
    Profiler profilerInstance(s_pProfileResult)
#else
#define PROFILE(name)
#endif

Now all we have to do is add PROFILE("function name"), and if profiling is enabled at compile time, it will start measuring time whenever this line is reached. When the Profiler object goes out of scope (read: no extra code needed at the exit points), it tells a singleton how much time was spent. It can also increase an execution counter. The reason for getting the result container from the singleton is that the singleton also stores each container in a vector. When profiling is done, all it has to do is loop over that vector and write the results to a txt file.
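The singleton side could look roughly like this. It is a sketch in C++14 (for `std::make_unique`), and all names (`ProfileResult`, `ProfileController`, `instance`, `report`) are hypothetical. The containers are held through `std::unique_ptr` so the pointers handed out to the `static` variables in each `PROFILE` call site stay valid when the vector grows.

```cpp
#include <ctime>
#include <memory>
#include <string>
#include <vector>

// Hypothetical result container: one per PROFILE(name) call site.
struct ProfileResult
{
    std::string name;
    std::clock_t totalTicks = 0;
    long calls = 0;
    void add(std::clock_t ticks) { totalTicks += ticks; ++calls; }
};

// Hypothetical singleton that hands out result containers and keeps every
// one of them in a vector so it can report all of them at the end.
class ProfileController
{
public:
    static ProfileController& instance()
    {
        static ProfileController controller;
        return controller;
    }

    ProfileResult* getNewResultContainer(const std::string& name)
    {
        m_results.push_back(std::make_unique<ProfileResult>());
        m_results.back()->name = name;
        return m_results.back().get();
    }

    // One tab-separated line per call site: name, calls, total ticks.
    std::string report() const
    {
        std::string text;
        for (const auto& result : m_results)
            text += result->name + '\t' + std::to_string(result->calls) + '\t'
                  + std::to_string(result->totalTicks) + '\n';
        return text;
    }

    ProfileController(const ProfileController&) = delete;
    ProfileController& operator=(const ProfileController&) = delete;

private:
    ProfileController() = default;
    std::vector<std::unique_ptr<ProfileResult>> m_results;
};
```

Writing the txt file is then just `std::ofstream file("profile_results.txt"); file << ProfileController::instance().report();` (the file name is only an example).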

The results will be system (CPU) specific, so they can't be used to compare results from two different people. They can, however, be used to compare two different builds on the same computer.

We can make two macros: one for single-threaded functions and one that uses thread locking when writing the results. Keep in mind that the lock is taken after iTimeEnd is measured, so waiting for it isn't included in the measurement unless the caller is also being profiled.
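A sketch of the thread-locked variant, with hypothetical names `LockedProfileResult` and `LockedProfiler`. It uses C++11 `std::mutex`, which MSVC 2003 doesn't have; with that toolchain a Win32 `CRITICAL_SECTION` (`EnterCriticalSection` / `LeaveCriticalSection`) would play the same role. The point it shows is the ordering: the end time is read before the lock is requested.

```cpp
#include <ctime>
#include <mutex>

// Hypothetical thread-safe result container: the mutex only guards the
// shared totals, not the code being measured.
class LockedProfileResult
{
public:
    void add(std::clock_t ticks)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_totalTicks += ticks;
        ++m_calls;
    }

    std::clock_t totalTicks() const
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_totalTicks;
    }

    long calls() const
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_calls;
    }

private:
    mutable std::mutex m_mutex;
    std::clock_t m_totalTicks = 0;
    long m_calls = 0;
};

// RAII timer for code that may run on several threads.
class LockedProfiler
{
public:
    explicit LockedProfiler(LockedProfileResult* result)
        : m_result(result), m_iTimeStart(std::clock()) {}

    ~LockedProfiler()
    {
        // Read the end time before add() takes the lock, so time spent
        // waiting for the mutex is not part of this measurement.
        const std::clock_t iTimeEnd = std::clock();
        m_result->add(iTimeEnd - m_iTimeStart);
    }

    LockedProfiler(const LockedProfiler&) = delete;
    LockedProfiler& operator=(const LockedProfiler&) = delete;

private:
    LockedProfileResult* m_result;
    std::clock_t m_iTimeStart;
};
```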

It looks like time.h changed between the 2003 version and the current one: difftime() is now called _difftime(). Not a big issue, but something to keep in mind, because MSVC 2017 suggests the new version, which the makefile won't accept. It also seems that sizeof(clock_t) is 4, even though the header file suggests it is supposed to be 8. It looks like this can be fixed by adding the following before including time.h (untested):

typedef __int64 clock_t;
#define _CLOCK_T_DEFINED
#include <time.h>

Another thing to keep in mind is that a clock() value of 1 isn't one CPU cycle. The unit is a clock tick, and the CLOCKS_PER_SEC macro from time.h gives the number of ticks per second (1000 with MSVC, so one tick is a millisecond); dividing a tick count by CLOCKS_PER_SEC converts it to seconds. For relative measurements the unit doesn't matter anyway: 10 ticks is twice as long as 5, and that is all we need when comparing two builds on the same machine.
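Two small helpers illustrate both uses. The names `ticksToMilliseconds` and `speedRatio` are hypothetical; the only library facts relied on are `std::clock_t` and `CLOCKS_PER_SEC` from `<ctime>`.

```cpp
#include <ctime>

// Convert a clock() tick count to milliseconds using CLOCKS_PER_SEC,
// the number of ticks per second on this platform.
double ticksToMilliseconds(std::clock_t ticks)
{
    return 1000.0 * static_cast<double>(ticks) / CLOCKS_PER_SEC;
}

// Relative comparison of two builds measured on the same machine:
// a ratio below 1.0 means the new build spent less time.
// oldTicks must be non-zero.
double speedRatio(std::clock_t oldTicks, std::clock_t newTicks)
{
    return static_cast<double>(newTicks) / static_cast<double>(oldTicks);
}
```

The ratio is unit-free, which is why it stays meaningful even without knowing what a tick is; the conversion to milliseconds is only needed for human-readable output.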