KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0
4.76k stars 419 forks source link

Request: Add previous field to MVKPerformanceTracker struct #2177

Closed SRSaunders closed 6 months ago

SRSaunders commented 7 months ago

I would like to make a small feature request for the MoltenVK performance tracker: Add a .previous field to the MVKPerformanceTracker struct.

I have run into a situation where I need both the previous and latest stats when submitting multiple (2 in my case) command buffers per frame: the first for game rendering, and the second for the embedded Optick profiler agent within RBDoom3BFG. When the profiler agent is active, it is the last command buffer processed for the frame and unfortunately this wipes out the more valuable .latest information for actual game rendering. I have been using the .latest field for displaying MoltenVK's encoder time in an on-screen HUD when the profiler is inactive (i.e. only 1 command buffer per frame), but I would also like to show encode time in the profiler trace when the profiler agent is active. Unfortunately I cannot seem to capture the .latest information for the rendering command buffer alone since both buffers sync on the same end-of-frame event.

I have prototyped a simple potential solution as follows:

typedef struct {
    uint32_t count;       /**< The number of activities of this type. */
    double latest;        /**< The latest (most recent) value of the activity. */
    double average;       /**< The average value of the activity. */
    double minimum;       /**< The minimum value of the activity. */
    double maximum;       /**< The maximum value of the activity. */
    double previous;      /**< The previous (second most recent) value of the activity. */
} MVKPerformanceTracker;
void MVKDevice::updateActivityPerformance(MVKPerformanceTracker& activity, double currentValue) {
    lock_guard<mutex> lock(_perfLock);

    activity.previous = activity.latest;
    activity.latest = currentValue;
    activity.minimum = ((activity.minimum == 0.0)
                                ? currentValue :
                                min(currentValue, activity.minimum));
       ...

This gives a very nice result (see trace labeled _MetalEncode) in the Optick Profiler traces below. Note by doing this I have learned quite a bit about timing and duration of the encoding step, and how this influences frame rate. For instance, the two samples below are for an identical game scene, but the second is much more performant (~4x frame rate) due the application of Vulkan push constants vs. using constant buffers for shader parameters. Both encoding time and GPU rendering time are drastically reduced when using push constants vs. constant buffers - and having visibility in the Optick profiler makes that very clear.

I am not sure if you would consider this proposed performance tracker change to be general enough, but given it's simplicity I thought I would ask.

(FYI - the game with profiler agent is running on macOS, but the profiler display client is Windows-only, thus I am using Parallels for that)

Screenshot 2024-03-07 at 7 41 51 PM

Screenshot 2024-03-04 at 11 39 04 PM

billhollings commented 6 months ago

I am not sure if you would consider this proposed performance tracker change to be general enough, but given it's simplicity I thought I would ask.

Yup. This looks good, and we can definitely add it.

Since you've already got it working, do you want to submit a PR?

If you do, I suggest moving previous to between latest and average, to visually couple previous to latest, as a way to help others understand the structure.

SRSaunders commented 6 months ago

Thanks very much and I'd be happy to submit a PR. Ironically, when I first made the change in my private build, I did place the previous member between latest and average, but changed it later due to this comment:

 * MoltenVK performance. You can retrieve a copy of this structure using the vkGetPerformanceStatisticsMVK() function.
 *
 * This structure may be extended as new features are added to MoltenVK. If you are linking to
 * an implementation of MoltenVK that was compiled from a different MVK_PRIVATE_API_VERSION
 * than your app was, the size of this structure in your app may be larger or smaller than the
 * struct in MoltenVK. See the description of the vkGetPerformanceStatisticsMVK() function for
 * information about how to handle this.
 *
 * TO SUPPORT DYNAMIC LINKING TO THIS STRUCTURE AS DESCRIBED ABOVE, THIS STRUCTURE SHOULD NOT
 * BE CHANGED EXCEPT TO ADD ADDITIONAL MEMBERS ON THE END. EXISTING MEMBERS, AND THEIR ORDER,
 * SHOULD NOT BE CHANGED.

Is this something you want to respect for the MVKPerformanceTracker struct, which is then combined into other structs that eventually culminate in MVKPerformanceStatistics? Or would you prefer the more logical grouped organization you suggest above?

In general, I wonder whether you should define an expected size in mvk_private_api.h for current and future versions of MVKPerformanceStatistics to enable runtime checking as per above. And for each change in the structs, a new expected revision and size would be added to the header file. This is an approach that is sometimes used for system files that add additional fields over time (e.g. mach/task.h with TASK_VM_INFO_REVX_COUNT). Or would you prefer to take a different approach, and instead encourage users to use MVK_PRIVATE_API_VERSION to check for runtime compatibility regarding new MVKPerformanceStatistics and related struct entries? Can MVK_PRIVATE_API_VERSION be queried at runtime? If you pick the latter (and if runtime query support is available), then my proposed change should bump the MVK_PRIVATE_API_VERSION. What approach would you like submitted with my PR?

billhollings commented 6 months ago

due to this comment

Ah. Of course.

The uncapitalized section of that comment is valid, to avoid buffer-overflow types of crashes. But the capitalized content ("TO SUPPORT...") is actually misguided, because these are nested structures, and the act of adding previous is going to add content all throughout the middle of the overall MVKPerformanceStatistics structure, regardless of where it appears in MVKPerformanceTracker. That capitalized requirement has meaning for linear API structures (like MVKPhysicalDeviceMetalFeatures or MVKConfiguration).

So, please go ahead with adding previous between latest and average, and also please remove the capitalized ("TO SUPPORT...") comment for MVKPerformanceStatistics.

Sorry for the confusion with that comment.

SRSaunders commented 6 months ago

Thanks for accommodating this enhancement and merging my PR into main. I have tested it and all is working well.