KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

Possible performance / time measurement regression #1996

Closed: m154k1 closed this issue 1 year ago

m154k1 commented 1 year ago

Hi, I'm seeing a possible regression introduced by commit dd31587.

I'm using the mpv media player, which uses libplacebo as its Vulkan rendering backend. The player can measure its own performance via "Frame timings": each value shows how much time was spent rendering each step (upscaling, etc.), in microseconds.

Here are the frame timings before dd31587:

[Screenshot 1: frame timings before dd31587]

And after dd31587:

[Screenshot 2: frame timings after dd31587]

The values increased by roughly 40x. There are three possible explanations:

  1. Performance actually regressed by 40x (unlikely)
  2. Reported timings are wrong
  3. New values are correct and old ones are not (also unlikely)
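One way to separate explanation 1 from explanation 2 is to check whether every step grew by the same factor. A genuine slowdown rarely scales every render pass identically, whereas a changed unit conversion multiplies all of them by one constant. The helper below is a hypothetical heuristic, not part of mpv or libplacebo; `before` and `after` are assumed to be per-step timings read off the two screenshots in the same order.

```python
from statistics import median


def common_scale_factor(before, after, rel_tol=0.05):
    """Return the shared after/before ratio if every step scaled alike, else None.

    before, after: per-step timings in the same order and units (e.g. microseconds).
    rel_tol: how far each step's ratio may deviate from the median ratio.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty timing lists of equal length")
    # Skip steps with a zero "before" value to avoid dividing by zero.
    ratios = [a / b for b, a in zip(before, after) if b > 0]
    if not ratios:
        return None
    factor = median(ratios)
    # A pure unit/scaling change multiplies every step by the same constant.
    if all(abs(r - factor) <= rel_tol * factor for r in ratios):
        return factor
    return None
```

If this returns a factor, and that factor matches some suspicious constant in the timing pipeline, a unit conversion is a far likelier culprit than a real 40x slowdown. If it returns `None`, the steps changed unevenly and a real performance change deserves a closer look.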

Unfortunately, I don't know how the time measurement is implemented, but dd31587 somehow affects it. For now, I have reverted the commit locally, and everything is back to normal.

System info: MacBook Air M2, macOS 13.5

mvk-info

```
[mvk-info] MoltenVK version 1.2.5, supporting Vulkan version 1.2.261.
    The following 99 Vulkan extensions are supported:
        VK_KHR_16bit_storage v1
        VK_KHR_8bit_storage v1
        VK_KHR_bind_memory2 v1
        VK_KHR_buffer_device_address v1
        VK_KHR_copy_commands2 v1
        VK_KHR_create_renderpass2 v1
        VK_KHR_dedicated_allocation v3
        VK_KHR_deferred_host_operations v4
        VK_KHR_depth_stencil_resolve v1
        VK_KHR_descriptor_update_template v1
        VK_KHR_device_group v4
        VK_KHR_device_group_creation v1
        VK_KHR_driver_properties v1
        VK_KHR_dynamic_rendering v1
        VK_KHR_external_fence v1
        VK_KHR_external_fence_capabilities v1
        VK_KHR_external_memory v1
        VK_KHR_external_memory_capabilities v1
        VK_KHR_external_semaphore v1
        VK_KHR_external_semaphore_capabilities v1
        VK_KHR_fragment_shader_barycentric v1
        VK_KHR_get_memory_requirements2 v1
        VK_KHR_get_physical_device_properties2 v2
        VK_KHR_get_surface_capabilities2 v1
        VK_KHR_imageless_framebuffer v1
        VK_KHR_image_format_list v1
        VK_KHR_incremental_present v2
        VK_KHR_maintenance1 v2
        VK_KHR_maintenance2 v1
        VK_KHR_maintenance3 v1
        VK_KHR_map_memory2 v1
        VK_KHR_multiview v1
        VK_KHR_portability_subset v1
        VK_KHR_push_descriptor v2
        VK_KHR_relaxed_block_layout v1
        VK_KHR_sampler_mirror_clamp_to_edge v3
        VK_KHR_sampler_ycbcr_conversion v14
        VK_KHR_separate_depth_stencil_layouts v1
        VK_KHR_shader_draw_parameters v1
        VK_KHR_shader_float_controls v4
        VK_KHR_shader_float16_int8 v1
        VK_KHR_shader_non_semantic_info v1
        VK_KHR_shader_subgroup_extended_types v1
        VK_KHR_spirv_1_4 v1
        VK_KHR_storage_buffer_storage_class v1
        VK_KHR_surface v25
        VK_KHR_swapchain v70
        VK_KHR_swapchain_mutable_format v1
        VK_KHR_timeline_semaphore v2
        VK_KHR_uniform_buffer_standard_layout v1
        VK_KHR_variable_pointers v1
        VK_EXT_4444_formats v1
        VK_EXT_buffer_device_address v2
        VK_EXT_calibrated_timestamps v2
        VK_EXT_debug_marker v4
        VK_EXT_debug_report v10
        VK_EXT_debug_utils v2
        VK_EXT_descriptor_indexing v2
        VK_EXT_external_memory_host v1
        VK_EXT_fragment_shader_interlock v1
        VK_EXT_hdr_metadata v2
        VK_EXT_host_query_reset v1
        VK_EXT_image_robustness v1
        VK_EXT_inline_uniform_block v1
        VK_EXT_memory_budget v1
        VK_EXT_metal_objects v1
        VK_EXT_metal_surface v1
        VK_EXT_pipeline_creation_cache_control v3
        VK_EXT_pipeline_creation_feedback v1
        VK_EXT_post_depth_coverage v1
        VK_EXT_private_data v1
        VK_EXT_robustness2 v1
        VK_EXT_sample_locations v1
        VK_EXT_scalar_block_layout v1
        VK_EXT_separate_stencil_usage v1
        VK_EXT_shader_atomic_float v1
        VK_EXT_shader_demote_to_helper_invocation v1
        VK_EXT_shader_stencil_export v1
        VK_EXT_shader_subgroup_ballot v1
        VK_EXT_shader_subgroup_vote v1
        VK_EXT_shader_viewport_index_layer v1
        VK_EXT_subgroup_size_control v2
        VK_EXT_surface_maintenance1 v1
        VK_EXT_swapchain_colorspace v4
        VK_EXT_swapchain_maintenance1 v1
        VK_EXT_texel_buffer_alignment v1
        VK_EXT_texture_compression_astc_hdr v1
        VK_EXT_vertex_attribute_divisor v3
        VK_AMD_gpu_shader_half_float v2
        VK_AMD_negative_viewport_height v1
        VK_AMD_shader_image_load_store_lod v1
        VK_AMD_shader_trinary_minmax v1
        VK_IMG_format_pvrtc v1
        VK_INTEL_shader_integer_functions2 v1
        VK_GOOGLE_display_timing v1
        VK_MVK_macos_surface v3
        VK_MVK_moltenvk v37
        VK_NV_fragment_shader_barycentric v1
        VK_NV_glsl_shader v1
[mvk-info] GPU device:
    model: Apple M2
    type: Integrated
    vendorID: 0x106b
    deviceID: 0xd0503f0
    pipelineCacheUUID: B3C9F867-0D05-03F0-0000-000000000000
    supports the following Metal Versions, GPU's and Feature Sets:
        Metal Shading Language 3.0
        GPU Family Apple 8
        GPU Family Apple 7
        GPU Family Apple 6
        GPU Family Apple 5
        GPU Family Apple 4
        GPU Family Apple 3
        GPU Family Apple 2
        GPU Family Apple 1
        GPU Family Mac 2
        GPU Family Mac 1
        GPU Family Common 3
        GPU Family Common 2
        GPU Family Common 1
        macOS GPU Family 2 v1
        macOS GPU Family 1 v4
        macOS GPU Family 1 v3
        macOS GPU Family 1 v2
        macOS GPU Family 1 v1
[mvk-info] Created VkInstance for Vulkan version 1.2.261, as requested by app, with the following 7 Vulkan extensions enabled:
    VK_KHR_external_memory_capabilities v1
    VK_KHR_external_semaphore_capabilities v1
    VK_KHR_get_physical_device_properties2 v2
    VK_KHR_get_surface_capabilities2 v1
    VK_KHR_surface v25
    VK_EXT_metal_surface v1
    VK_EXT_swapchain_colorspace v4
[mvk-info] Vulkan semaphores using MTLEvent.
[mvk-info] Created VkDevice to run on GPU Apple M2 with the following 7 Vulkan extensions enabled:
    VK_KHR_portability_subset v1
    VK_KHR_push_descriptor v2
    VK_KHR_swapchain v70
    VK_EXT_external_memory_host v1
    VK_EXT_hdr_metadata v2
    VK_EXT_metal_objects v1
    VK_EXT_shader_atomic_float v1
```
billhollings commented 1 year ago

The change in dd31587 should not have affected performance on Apple Silicon at all, so your M2 should be performing exactly as it did before. On non-Apple Silicon devices, dd31587 actually improved performance.

How are you measuring the microsecond durations, and how are the values calculated?

On Apple Silicon, the value of VkPhysicalDeviceLimits::timestampPeriod changed with dd31587. It was previously 1.0 (by mistake) and is now 41.667, so if you are using timestampPeriod to calculate the microsecond values from measured GPU ticks, those values will have jumped by roughly 40x, as you are presumably finding.
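The Vulkan specification defines `timestampPeriod` as the number of nanoseconds it takes for a timestamp query value to increment by 1, so a duration in microseconds is `(end - start) * timestampPeriod / 1000`. The sketch below is a hypothetical helper, not MoltenVK's or libplacebo's code, and the 24,000-tick delta is purely illustrative. It shows that the same raw tick count reports about 41.667x more time once the period changes from 1.0 to 41.667. For reference, 41.667 ns per tick corresponds to a counter running at roughly 24 MHz.

```python
def ticks_to_us(start_ticks, end_ticks, timestamp_period_ns, valid_bits=64):
    """Convert a pair of GPU timestamp query values into a duration in microseconds.

    timestamp_period_ns mirrors VkPhysicalDeviceLimits::timestampPeriod
    (nanoseconds per tick). valid_bits mirrors the queue family's
    timestampValidBits; masking the difference handles a counter that wrapped.
    """
    mask = (1 << valid_bits) - 1
    delta_ticks = (end_ticks - start_ticks) & mask
    return delta_ticks * timestamp_period_ns / 1000.0


# Hypothetical tick delta for one render pass: 24,000 ticks.
old_us = ticks_to_us(0, 24_000, 1.0)     # period reported before dd31587
new_us = ticks_to_us(0, 24_000, 41.667)  # period reported after dd31587
print(old_us, new_us, new_us / old_us)   # 24.0 us vs about 1 ms, a ~41.7x jump
```

The GPU work, and therefore the raw tick delta, is identical in both calls. Only the conversion factor differs, which is why the reported timings jump while real performance stays the same.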

BTW, you can validate actual performance by enabling the Metal Performance HUD.

m154k1 commented 1 year ago

> How are you measuring the microsecond durations, and how are the values calculated?
>
> On Apple Silicon, the value of VkPhysicalDeviceLimits::timestampPeriod changed with dd31587. It was previously 1.0 (by mistake) and is now 41.667, so if you are using timestampPeriod to calculate the microsecond values from measured GPU ticks, those values will have jumped by roughly 40x, as you are presumably finding.

It looks like libplacebo does use timestampPeriod: https://code.videolan.org/videolan/libplacebo/-/blob/master/src/vulkan/gpu.c#L99

Well, so the current behavior is actually correct. Thank you for the clarification :)