Candle-Fire / umbra


Add a profiler and instrumentation #30

Open TheCurle opened 5 months ago

TheCurle commented 5 months ago

This PR adds three bits of infrastructure surrounding the usage of a Profiler:

Precise Timer

Currently, the Timer is a set of static ints, updated every frame by the ShadowApplication.
For the purposes of profiling, this is not useful (as we may want to profile many things that happen within a frame, with high precision).

SH::Timer is now a struct with a bunch of static methods, so you can use it both through those static methods and as an RAII timer: create an instance at the start of a method to start the timer, and it stops when it goes out of scope. Call elapsed() to get the number of nanoseconds that have elapsed since the timer started.
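As a rough illustration of the RAII usage described above, here is a minimal sketch built on std::chrono. ScopedTimer, its optional result pointer and timeBusyLoop are hypothetical names made up for this example; the real SH::Timer members may look different.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for SH::Timer; names and layout are assumptions.
class ScopedTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Construction starts the timer. If `result` is non-null, the final
    // elapsed time is written there when the timer goes out of scope.
    explicit ScopedTimer(std::int64_t* result = nullptr)
        : start_(Clock::now()), result_(result) {}

    // Leaving the scope "stops" the timer by publishing the final reading.
    ~ScopedTimer() {
        if (result_ != nullptr) {
            *result_ = elapsed();
        }
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    // Nanoseconds elapsed since construction.
    std::int64_t elapsed() const {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
                   Clock::now() - start_).count();
    }

private:
    Clock::time_point start_;
    std::int64_t* result_;
};

// Example: time a small busy loop and return the measured nanoseconds.
inline std::int64_t timeBusyLoop() {
    std::int64_t ns = 0;
    {
        ScopedTimer timer(&ns);            // timing starts here
        volatile unsigned sink = 0;
        for (unsigned i = 0; i < 100000; ++i) {
            sink = sink + i;
        }
    }                                      // timer stops here and fills `ns`
    return ns;
}
```

steady_clock is used because it never jumps backwards, which matters more for interval timing than wall-clock accuracy.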

The timer struct itself uses std::chrono to perform timing, but the profiler may wish to get the exact current timestamp. Thus,

static size_t getTimestamp();

is provided. To get maximum precision on every system, there is a platform source that implements this function on both Windows and Linux using platform-specific instrumentation calls.
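The PR text does not say which platform calls are used, so the following is only a sketch of how such a function could be split per platform. It assumes QueryPerformanceCounter on Windows and clock_gettime(CLOCK_MONOTONIC) elsewhere; getTimestampSketch is a hypothetical name, and the units differ per platform in this version (raw counter ticks on Windows, nanoseconds on POSIX).

```cpp
#include <cstddef>

#if defined(_WIN32)
#  include <windows.h>
#else
#  include <time.h>
#endif

// Hypothetical free-function counterpart of the getTimestamp() declared above.
// Assumption: raw QPC ticks on Windows, nanoseconds on POSIX systems.
inline std::size_t getTimestampSketch() {
#if defined(_WIN32)
    LARGE_INTEGER ticks;
    QueryPerformanceCounter(&ticks);
    return static_cast<std::size_t>(ticks.QuadPart);
#else
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    // Assumes a 64-bit size_t; a 32-bit size_t would overflow here.
    return static_cast<std::size_t>(ts.tv_sec) * 1000000000u +
           static_cast<std::size_t>(ts.tv_nsec);
#endif
}
```

On Windows the tick count would still need dividing by the QueryPerformanceFrequency value to get a time in seconds.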

Threads

The profiler needs to be able to handle multiple parallel threads - we'll have a job system, multithreaded rendering, a UI thread, stuff like that.
However, if we simply use std::thread for these, it's not possible to attach a profiler to this thread (as the thread handle is buried deep inside the stdlib), which is no good.
Thus, we have our own way to manage threads - the Thread class.
Simply subclass Thread and provide a Run method; everything else is handled automatically.
A Join method is provided to block the main thread while it waits for the given thread, and a thread can also Wait() and Notify() to allow for complex parallel dependencies.

Threads are currently only implemented fully on Windows, but the infrastructure I've created should also work on Linux (as pthread is very simple).
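To give an idea of what a Linux version might look like, here is a hypothetical pthread-based sketch of the interface described above. It is not the code in this PR: Start(), the signalled flag and the Worker subclass are assumptions added for illustration, and only the Run/Join/Wait/Notify names come from the description.

```cpp
#include <pthread.h>

// Hypothetical sketch of a Thread base class on top of pthread.
class Thread {
public:
    Thread() {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&cond_, nullptr);
    }
    virtual ~Thread() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }
    Thread(const Thread&) = delete;
    Thread& operator=(const Thread&) = delete;

    // Spawns the OS thread; returns false if creation failed.
    bool Start() {
        started_ = pthread_create(&handle_, nullptr, &Thread::Entry, this) == 0;
        return started_;
    }

    // Blocks the caller until Run() has returned.
    void Join() {
        if (started_) {
            pthread_join(handle_, nullptr);
            started_ = false;
        }
    }

    // Blocks the caller until Notify() has been called.
    void Wait() {
        pthread_mutex_lock(&mutex_);
        while (!signalled_) {
            pthread_cond_wait(&cond_, &mutex_);
        }
        signalled_ = false;
        pthread_mutex_unlock(&mutex_);
    }

    // Wakes a waiter; the flag guards against a lost wake-up.
    void Notify() {
        pthread_mutex_lock(&mutex_);
        signalled_ = true;
        pthread_cond_signal(&cond_);
        pthread_mutex_unlock(&mutex_);
    }

protected:
    // Subclasses put their work here.
    virtual void Run() = 0;

    pthread_t handle_{};  // native handle a profiler could attach to

private:
    // Trampoline handed to pthread_create.
    static void* Entry(void* self) {
        static_cast<Thread*>(self)->Run();
        return nullptr;
    }

    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    bool signalled_ = false;
    bool started_ = false;
};

// Example subclass: waits for a signal, then publishes a result.
struct Worker : Thread {
    int result = 0;

protected:
    void Run() override {
        Wait();
        result = 42;
    }
};
```

Keeping the pthread_t as a member is the point of the exercise: the native handle stays reachable, so tracing code can identify the thread.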

The Profiler itself

The profiler has a simple interface; instantiate a Profiler with a name to start a block with that name, and when it goes out of scope, that block ends.

A block is to be represented as an actual rectangle on a graph, with width proportional to the time spent in the block.
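The scoped begin/end behaviour can be sketched roughly as follows. Everything here is hypothetical (ScopedBlock, BlockEvent, eventLog, nowNs); it only mirrors the idea that construction emits a Begin event and destruction emits the matching End event, with the gap between the two timestamps giving the width of the rectangle.

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Subset of the event kinds, for illustration only.
enum class BlockEvent { Begin, End };

struct Event {
    BlockEvent type;
    std::string name;
    std::int64_t timestampNs;
};

// Hypothetical event sink; the real profiler's storage is an assumption here.
inline std::vector<Event>& eventLog() {
    static std::vector<Event> log;
    return log;
}

// Monotonic timestamp in nanoseconds.
inline std::int64_t nowNs() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch()).count();
}

// Opening a block records Begin; leaving the scope records End.
class ScopedBlock {
public:
    explicit ScopedBlock(std::string name) : name_(std::move(name)) {
        eventLog().push_back({BlockEvent::Begin, name_, nowNs()});
    }
    ~ScopedBlock() {
        eventLog().push_back({BlockEvent::End, name_, nowNs()});
    }
    ScopedBlock(const ScopedBlock&) = delete;
    ScopedBlock& operator=(const ScopedBlock&) = delete;

private:
    std::string name_;
};
```

Nested scopes naturally produce nested Begin/End pairs, which is what lets a viewer stack child rectangles inside their parent.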

There are blocks for all kinds of events; see the EventType enum:


    enum class EventType {
      Begin,
      Color,
      End,
      Frame,
      String,
      Int,
      FiberWait,
      FiberWake,
      ContextSwitch,
      Job,
      GPUBegin,
      GPUEnd,
      Link,
      Pause,
      GPUStats,
      Continue,
      Signal,
      Counter
    };

The Profiler is capable of linking to and profiling GPU processes, via the GPUBegin and Link functions.
Other functions are provided for utility, such as color, string, int, frame.

Profilers are per-thread: a call to a Profiler function only ever updates the profiler belonging to the thread that made the call, no matter which thread the surrounding code is conceptually working on behalf of.

This allows multiple parallel block diagrams to represent what any number of threads are doing at any given time.
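One common way to get this "calling thread only" behaviour is thread_local storage. The sketch below assumes that approach; the PR text does not confirm how the per-thread state is actually stored, and ThreadContext, currentContext and beginBlock are hypothetical names. std::thread is used only to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-thread profiler state.
struct ThreadContext {
    std::vector<std::string> blocks;
};

// Each thread lazily gets its own independent ThreadContext.
inline ThreadContext& currentContext() {
    thread_local ThreadContext ctx;
    return ctx;
}

// Always records into the context of the thread that makes the call.
inline void beginBlock(const std::string& name) {
    currentContext().blocks.push_back(name);
}

// Demo: a worker records two blocks and reports how many it can see.
inline std::size_t blocksRecordedOnWorker() {
    std::size_t count = 0;
    std::thread worker([&count] {
        beginBlock("Job A");
        beginBlock("Job B");
        count = currentContext().blocks.size();
    });
    worker.join();
    return count;
}
```

Because no state is shared between threads, recording an event needs no locking on the hot path.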

The profiler is currently only implemented for Windows, because it makes extensive use of the Windows thread trace functions.

Closes #19.

TheCurle commented 5 months ago

An example usage of the Profiler counter, for tracking process memory usage:

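        // Divide as floats so the value is not truncated to whole megabytes
        // if GetProcessMemory() returns an integer byte count.
        const float memory = Platform::GetProcessMemory() / (1024.0f * 1024.0f);
        // The counter handle is created once and reused on every call.
        static size_t processMemoryCounter = SH::Profiler::MakeCounter("Process Memory (MB) ", 0);
        SH::Profiler::PushCounter(processMemoryCounter, memory);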

This must be done on the main thread; from that point on, the main thread will have a Process Memory usage counter attached. If this block of code is called every frame, the memory usage will be updated every frame, and a graph will be produced showing how the memory changes over time.

TheCurle commented 5 months ago

I've pushed the files from the other branch that this PR depends on.