flightlessmango / MangoHud

A Vulkan and OpenGL overlay for monitoring FPS, temperatures, CPU/GPU load and more. Discord: https://discordapp.com/invite/Gj5YmBb
MIT License

Latest mangohud introduces performance regressions #628

Closed stephanlachnit closed 2 years ago

stephanlachnit commented 2 years ago

With the latest version of MangoHud I get severe performance issues. It makes the application lag (see the frame graph below), and it gets worse over time. It affects both Vulkan and OpenGL.

Now: [screenshot: mangohud_latest] Before: [screenshot: mangohud_old]

Any idea how I can debug this? The trace log (see below) is not very conclusive.

Full debug log:

$ LD_PRELOAD="${LD_PRELOAD}:./libMangoHud.so:./libMangoHud_dlsym.so" glxgears -info
[2021-11-09 21:31:08.680] [MANGOHUD] [info] [config.cpp:114] skipping config: '/usr/bin/MangoHud.conf' [ not found ]
[2021-11-09 21:31:08.680] [MANGOHUD] [info] [config.cpp:114] skipping config: '/home/stephan/.config/MangoHud/glxgears.conf' [ not found ]
[2021-11-09 21:31:08.680] [MANGOHUD] [info] [config.cpp:119] parsing config: '/home/stephan/.config/MangoHud/MangoHud.conf'
[2021-11-09 21:31:08.680] [MANGOHUD] [debug] [logging.cpp:112] Logger constructed!
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [overlay.cpp:767] Ram:16069544
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [overlay.cpp:768] Cpu:Intel Core i7-3770K CPU @ 3.50GHz
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [overlay.cpp:769] Kernel:5.14.0-4-amd64
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [overlay.cpp:770] Os:Debian GNU/Linux bookworm/sid
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [overlay.cpp:771] Gpu:
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [overlay.cpp:772] Driver:4.6 Mesa 21.2.5
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [overlay.cpp:773] CPU Scheduler:schedutil
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [cpu.cpp:423] hwmon: sensor name: asus
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [cpu.cpp:423] hwmon: sensor name: acpitz
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [cpu.cpp:423] hwmon: sensor name: coretemp
[2021-11-09 21:31:08.773] [MANGOHUD] [debug] [cpu.cpp:445] hwmon: using input: /sys/class/hwmon/hwmon3/temp1_input
[2021-11-09 21:31:08.774] [MANGOHUD] [debug] [cpu.cpp:537] hwmon: sensor name: asus
[2021-11-09 21:31:08.774] [MANGOHUD] [debug] [cpu.cpp:537] hwmon: sensor name: acpitz
[2021-11-09 21:31:08.774] [MANGOHUD] [debug] [cpu.cpp:537] hwmon: sensor name: coretemp
[2021-11-09 21:31:08.774] [MANGOHUD] [debug] [cpu.cpp:537] hwmon: sensor name: amdgpu
[2021-11-09 21:31:08.774] [MANGOHUD] [debug] [cpu.cpp:554] powercap: name: core
[2021-11-09 21:31:08.774] [MANGOHUD] [debug] [cpu.cpp:554] powercap: name: 
[2021-11-09 21:31:08.774] [MANGOHUD] [debug] [cpu.cpp:554] powercap: name: package-0
[2021-11-09 21:31:08.775] [MANGOHUD] [debug] [overlay.cpp:606] amdgpu path check: /sys/class/drm/card0-HDMI-A-1/device/vendor
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [overlay.cpp:606] amdgpu path check: /sys/class/drm/card1-HDMI-A-3/device/vendor
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [overlay.cpp:606] amdgpu path check: /sys/class/drm/card0-VGA-1/device/vendor
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [overlay.cpp:606] amdgpu path check: /sys/class/drm/card1-DVI-D-2/device/vendor
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [overlay.cpp:606] amdgpu path check: /sys/class/drm/card1/device/vendor
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [overlay.cpp:624] using amdgpu path: /sys/class/drm/card1
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [gpu.cpp:194] ticks: 60, 8333333ns
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [gpu.cpp:194] ticks: 60, 8333333ns
[2021-11-09 21:31:08.776] [MANGOHUD] [debug] [overlay.cpp:638] Using libdrm
[2021-11-09 21:31:08.776] [MANGOHUD] [info] [overlay.cpp:678] Uploading is disabled (permit_upload = 0)
[2021-11-09 21:31:08.822] [MANGOHUD] [info] [imgui_impl_opengl3.cpp:418] GL version: 4.6 
[2021-11-09 21:31:08.880] [MANGOHUD] [debug] [inject_glx.cpp:105] GL ref count: 1
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
GL_RENDERER   = AMD Radeon R9 200 Series (HAWAII, DRM 3.42.0, 5.14.0-4-amd64, LLVM 12.0.1)
GL_VERSION    = 4.6 (Compatibility Profile) Mesa 21.2.5
GL_VENDOR     = AMD
GL_EXTENSIONS = GL_ARB_multisample GL_EXT_abgr GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_copy_texture GL_EXT_subtexture GL_EXT_texture_object GL_EXT_vertex_array GL_EXT_compiled_vertex_array GL_EXT_texture GL_EXT_texture3D GL_IBM_rasterpos_clip GL_ARB_point_parameters GL_EXT_draw_range_elements GL_EXT_packed_pixels GL_EXT_point_parameters GL_EXT_rescale_normal GL_EXT_separate_specular_color GL_EXT_texture_edge_clamp GL_SGIS_generate_mipmap GL_SGIS_texture_border_clamp GL_SGIS_texture_edge_clamp GL_SGIS_texture_lod GL_ARB_framebuffer_sRGB GL_ARB_multitexture GL_EXT_framebuffer_sRGB GL_IBM_multimode_draw_arrays GL_IBM_texture_mirrored_repeat GL_ARB_texture_cube_map GL_ARB_texture_env_add GL_ARB_transpose_matrix GL_EXT_blend_func_separate GL_EXT_fog_coord GL_EXT_multi_draw_arrays GL_EXT_secondary_color GL_EXT_texture_env_add GL_EXT_texture_filter_anisotropic GL_EXT_texture_lod_bias GL_INGR_blend_func_separate GL_NV_blend_square GL_NV_light_max_exponent GL_NV_texgen_reflection GL_NV_texture_env_combine4 GL_S3_s3tc GL_SUN_multi_draw_arrays GL_ARB_texture_border_clamp GL_ARB_texture_compression GL_EXT_framebuffer_object GL_EXT_texture_compression_s3tc GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_MESA_window_pos GL_NV_packed_depth_stencil GL_NV_texture_rectangle GL_ARB_depth_texture GL_ARB_occlusion_query GL_ARB_shadow GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_mirrored_repeat GL_ARB_window_pos GL_ATI_fragment_shader GL_EXT_stencil_two_side GL_EXT_texture_cube_map GL_NV_copy_depth_to_color GL_NV_depth_clamp GL_NV_fog_distance GL_NV_half_float GL_APPLE_packed_pixels GL_ARB_draw_buffers GL_ARB_fragment_program GL_ARB_fragment_shader GL_ARB_shader_objects GL_ARB_vertex_program GL_ARB_vertex_shader GL_ATI_draw_buffers GL_ATI_texture_env_combine3 GL_ATI_texture_float GL_EXT_depth_bounds_test GL_EXT_shadow_funcs GL_EXT_stencil_wrap GL_MESA_pack_invert GL_NV_primitive_restart 
GL_ARB_depth_clamp GL_ARB_fragment_program_shadow GL_ARB_half_float_pixel GL_ARB_occlusion_query2 GL_ARB_point_sprite GL_ARB_shading_language_100 GL_ARB_sync GL_ARB_texture_non_power_of_two GL_ARB_vertex_buffer_object GL_ATI_blend_equation_separate GL_EXT_blend_equation_separate GL_OES_read_format GL_ARB_color_buffer_float GL_ARB_pixel_buffer_object GL_ARB_texture_compression_rgtc GL_ARB_texture_float GL_ARB_texture_rectangle GL_ATI_texture_compression_3dc GL_EXT_packed_float GL_EXT_pixel_buffer_object GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_rgtc GL_EXT_texture_mirror_clamp GL_EXT_texture_rectangle GL_EXT_texture_sRGB GL_EXT_texture_shared_exponent GL_ARB_framebuffer_object GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXT_packed_depth_stencil GL_ARB_vertex_array_object GL_ATI_separate_stencil GL_ATI_texture_mirror_once GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_gpu_program_parameters GL_EXT_gpu_shader4 GL_EXT_texture_array GL_EXT_texture_compression_latc GL_EXT_texture_integer GL_EXT_texture_sRGB_decode GL_EXT_timer_query GL_OES_EGL_image GL_AMD_performance_monitor GL_EXT_texture_buffer_object GL_AMD_texture_texture4 GL_ARB_copy_buffer GL_ARB_depth_buffer_float GL_ARB_draw_instanced GL_ARB_half_float_vertex GL_ARB_instanced_arrays GL_ARB_map_buffer_range GL_ARB_texture_buffer_object GL_ARB_texture_rg GL_ARB_texture_swizzle GL_ARB_vertex_array_bgra GL_EXT_texture_swizzle GL_EXT_vertex_array_bgra GL_NV_conditional_render GL_AMD_conservative_depth GL_AMD_depth_clamp_separate GL_AMD_draw_buffers_blend GL_AMD_seamless_cubemap_per_texture GL_AMD_shader_stencil_export GL_ARB_ES2_compatibility GL_ARB_blend_func_extended GL_ARB_compatibility GL_ARB_debug_output GL_ARB_draw_buffers_blend GL_ARB_draw_elements_base_vertex GL_ARB_explicit_attrib_location GL_ARB_fragment_coord_conventions GL_ARB_provoking_vertex GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_seamless_cube_map GL_ARB_shader_stencil_export GL_ARB_shader_texture_lod 
GL_ARB_tessellation_shader GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_cube_map_array GL_ARB_texture_gather GL_ARB_texture_multisample GL_ARB_texture_query_lod GL_ARB_texture_rgb10_a2ui GL_ARB_uniform_buffer_object GL_ARB_vertex_type_2_10_10_10_rev GL_ATI_meminfo GL_EXT_provoking_vertex GL_EXT_texture_snorm GL_MESA_texture_signed_rgba GL_NV_copy_image GL_NV_texture_barrier GL_ARB_draw_indirect GL_ARB_get_program_binary GL_ARB_gpu_shader5 GL_ARB_gpu_shader_fp64 GL_ARB_robustness GL_ARB_separate_shader_objects GL_ARB_shader_bit_encoding GL_ARB_shader_precision GL_ARB_shader_subroutine GL_ARB_texture_compression_bptc GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_vertex_attrib_64bit GL_ARB_viewport_array GL_EXT_direct_state_access GL_EXT_shader_image_load_store GL_EXT_vertex_attrib_64bit GL_NV_vdpau_interop GL_AMD_multi_draw_indirect GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_ARB_base_instance GL_ARB_compressed_texture_pixel_storage GL_ARB_conservative_depth GL_ARB_internalformat_query GL_ARB_map_buffer_alignment GL_ARB_shader_atomic_counters GL_ARB_shader_image_load_store GL_ARB_shading_language_420pack GL_ARB_shading_language_packing GL_ARB_texture_storage GL_ARB_transform_feedback_instanced GL_EXT_framebuffer_multisample_blit_scaled GL_EXT_transform_feedback GL_AMD_query_buffer_object GL_AMD_shader_trinary_minmax GL_AMD_vertex_shader_layer GL_AMD_vertex_shader_viewport_index GL_ARB_ES3_compatibility GL_ARB_arrays_of_arrays GL_ARB_clear_buffer_object GL_ARB_compute_shader GL_ARB_copy_image GL_ARB_explicit_uniform_location GL_ARB_fragment_layer_viewport GL_ARB_framebuffer_no_attachments GL_ARB_invalidate_subdata GL_ARB_multi_draw_indirect GL_ARB_program_interface_query GL_ARB_robust_buffer_access_behavior GL_ARB_shader_image_size GL_ARB_shader_storage_buffer_object GL_ARB_stencil_texturing GL_ARB_texture_buffer_range GL_ARB_texture_query_levels GL_ARB_texture_storage_multisample GL_ARB_texture_view 
GL_ARB_vertex_attrib_binding GL_KHR_debug GL_KHR_robustness GL_KHR_texture_compression_astc_ldr GL_AMD_pinned_memory GL_ARB_bindless_texture GL_ARB_buffer_storage GL_ARB_clear_texture GL_ARB_compute_variable_group_size GL_ARB_enhanced_layouts GL_ARB_indirect_parameters GL_ARB_internalformat_query2 GL_ARB_multi_bind GL_ARB_query_buffer_object GL_ARB_seamless_cubemap_per_texture GL_ARB_shader_draw_parameters GL_ARB_shader_group_vote GL_ARB_shading_language_include GL_ARB_texture_mirror_clamp_to_edge GL_ARB_texture_stencil8 GL_ARB_vertex_type_10f_11f_11f_rev GL_EXT_shader_integer_mix GL_NVX_gpu_memory_info GL_ARB_ES3_1_compatibility GL_ARB_clip_control GL_ARB_conditional_render_inverted GL_ARB_cull_distance GL_ARB_derivative_control GL_ARB_direct_state_access GL_ARB_get_texture_sub_image GL_ARB_pipeline_statistics_query GL_ARB_shader_texture_image_samples GL_ARB_texture_barrier GL_ARB_transform_feedback_overflow_query GL_EXT_polygon_offset_clamp GL_EXT_shader_image_load_formatted GL_KHR_blend_equation_advanced GL_KHR_context_flush_control GL_KHR_robust_buffer_access_behavior GL_NV_shader_atomic_int64 GL_ARB_gpu_shader_int64 GL_ARB_parallel_shader_compile GL_ARB_shader_atomic_counter_ops GL_ARB_shader_ballot GL_ARB_shader_clock GL_ARB_shader_viewport_layer_array GL_EXT_shader_samples_identical GL_EXT_texture_sRGB_R8 GL_KHR_no_error GL_KHR_texture_compression_astc_sliced_3d GL_ARB_gl_spirv GL_ARB_spirv_extensions GL_EXT_window_rectangles GL_MESA_shader_integer_functions GL_ARB_polygon_offset_clamp GL_ARB_texture_filter_anisotropic GL_EXT_memory_object GL_EXT_memory_object_fd GL_EXT_semaphore GL_EXT_semaphore_fd GL_KHR_parallel_shader_compile GL_NV_alpha_to_coverage_dither_control GL_AMD_framebuffer_multisample_advanced GL_EXT_EGL_image_storage GL_EXT_texture_shadow_lod GL_INTEL_blackhole_render GL_MESA_framebuffer_flip_y GL_NV_compute_shader_derivatives GL_EXT_EGL_sync GL_EXT_demote_to_helper_invocation 
VisualID 1372, 0x55c
glsl_version: 410
328 frames in 5.0 seconds = 65.538 FPS
300 frames in 5.0 seconds = 59.998 FPS
300 frames in 5.0 seconds = 59.998 FPS
291 frames in 5.0 seconds = 58.198 FPS
292 frames in 5.1 seconds = 57.656 FPS
208 frames in 5.1 seconds = 40.625 FPS
10 frames in 5.1 seconds =  1.953 FPS
9 frames in 5.1 seconds =  1.758 FPS
10 frames in 5.1 seconds =  1.952 FPS
10 frames in 5.1 seconds =  1.955 FPS
10 frames in 5.1 seconds =  1.952 FPS
10 frames in 5.1 seconds =  1.955 FPS
10 frames in 5.1 seconds =  1.953 FPS
10 frames in 5.1 seconds =  1.952 FPS
10 frames in 5.1 seconds =  1.955 FPS
10 frames in 5.1 seconds =  1.953 FPS
10 frames in 5.1 seconds =  1.952 FPS
10 frames in 5.1 seconds =  1.955 FPS
10 frames in 5.1 seconds =  1.953 FPS
10 frames in 5.1 seconds =  1.953 FPS
10 frames in 5.1 seconds =  1.953 FPS

Meson settings:

$ meson builddir --prefix=/usr -Duse_system_vulkan=enabled -Duse_system_spdlog=enabled -Dwith_nvml=disabled -Dwith_wayland=enabled -Dmangoapp=true -Dwith_libdrm_amdgpu=enabled --buildtype=debug -Dglibcxx_asserts=true
The Meson build system
Version: 0.60.1
Source dir: /home/stephan/Projects/mangohud
Build dir: /home/stephan/Projects/mangohud/builddir
Build type: native build
Project name: MangoHud
Project version: v0.6.6
C compiler for the host machine: ccache cc (gcc 11.2.0 "cc (Debian 11.2.0-10) 11.2.0")
C linker for the host machine: cc ld.bfd 2.37
C++ compiler for the host machine: ccache c++ (gcc 11.2.0 "c++ (Debian 11.2.0-10) 11.2.0")
C++ linker for the host machine: c++ ld.bfd 2.37
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python3 (mako) found: YES (/usr/bin/python3) modules: mako
Program /usr/bin/meson found: YES (/usr/bin/meson)
Checking if "GCC atomic builtins" : compiles: YES 
Checking if "Supports timespec_get" : compiles: YES 
Checking for function "bswap32" : YES 
Checking for function "bswap64" : YES 
Checking for function "clz" : YES 
Checking for function "clzll" : YES 
Checking for function "ctz" : YES 
Checking for function "expect" : YES 
Checking for function "ffs" : YES 
Checking for function "ffsll" : YES 
Checking for function "popcount" : YES 
Checking for function "popcountll" : YES 
Checking for function "unreachable" : YES 
Found pkg-config: /usr/bin/pkg-config (0.29.2)
Run-time dependency x11 found: YES 1.7.2
Run-time dependency wayland-client found: YES 1.19.0
Run-time dependency dbus-1 found: YES 1.12.20
Run-time dependency libdrm found: YES 2.4.107
Run-time dependency vulkan found: YES 1.2.189
Run-time dependency threads found: YES
Compiler for C supports arguments -Werror=implicit-function-declaration: YES 
Compiler for C supports arguments -Werror=missing-prototypes: YES 
Compiler for C supports arguments -Werror=return-type: YES 
Compiler for C supports arguments -Werror=incompatible-pointer-types: YES 
Compiler for C supports arguments -fno-math-errno: YES 
Compiler for C supports arguments -fno-trapping-math: YES 
Compiler for C supports arguments -Qunused-arguments: NO 
Compiler for C supports arguments -Wmissing-field-initializers: YES 
Compiler for C supports arguments -Wformat-truncation: YES 
Compiler for C++ supports arguments -Werror=return-type: YES 
Compiler for C++ supports arguments -fno-math-errno: YES 
Compiler for C++ supports arguments -fno-trapping-math: YES 
Compiler for C++ supports arguments -Qunused-arguments: NO 
Compiler for C++ supports arguments -Wnon-virtual-dtor: YES 
Compiler for C++ supports arguments -Wmissing-field-initializers: YES 
Compiler for C++ supports arguments -Wformat-truncation: YES 
Compiler for C supports arguments -Woverride-init: YES 
Compiler for C supports arguments -Winitializer-overrides: NO 
Checking for function "dlopen" : NO 
Library dl found: YES
Checking for function "clock_gettime" : YES 
Checking for size of "void*" : 8

Executing subproject imgui 

imgui| Project name: imgui
imgui| Project version: 1.85
imgui| C++ compiler for the host machine: ccache c++ (gcc 11.2.0 "c++ (Debian 11.2.0-10) 11.2.0")
imgui| C++ linker for the host machine: c++ ld.bfd 2.37
imgui| Library d3d9 skipped: feature dx9 disabled
imgui| Library d3d10 skipped: feature dx10 disabled
imgui| Library d3d11 skipped: feature dx11 disabled
imgui| Library d3d12 skipped: feature dx12 disabled
imgui| Dependency appleframeworks (modules: metal) skipped: feature metal disabled
imgui| Library dl found: YES
imgui| Dependency sdl2 skipped: feature sdl_renderer disabled
imgui| Dependency vulkan skipped: feature vulkan disabled
imgui| Has header "webgpu/webgpu.h" skipped: feature webgpu disabled
imgui| Run-time dependency glfw3 found: YES 3.3.2
imgui| Dependency sdl2 skipped: feature sdl2 disabled
imgui| Dependency allegro-5 skipped: feature allegro5 disabled
imgui| Dependency allegro_primitives-5 skipped: feature allegro5 disabled
imgui| Library marmalade skipped: feature marmalade disabled
imgui| Build targets in project: 3
imgui| Subproject imgui finished.

Library spdlog found: YES
Run-time dependency spdlog found: YES 1.8.5
Dependency glfw3 found: YES 3.3.2 (cached)
Program glslangValidator found: YES (/usr/bin/glslangValidator)
Has header "NVCtrl/NVCtrl.h" : YES 
Compiler for C supports link arguments -Wl,-Bsymbolic-functions: YES 
Compiler for C supports link arguments -Wl,-z,relro: YES 
Compiler for C supports link arguments -Wl,--exclude-libs,ALL: YES 
Compiler for C supports link arguments -lGL: YES 
Configuring MangoHud.json using configuration
Configuring mangohud using configuration
Build targets in project: 8

MangoHud v0.6.6

  Subprojects
    imgui             : YES

  User defined options
    buildtype         : debug
    prefix            : /usr
    glibcxx_asserts   : true
    mangoapp          : true
    use_system_spdlog : enabled
    use_system_vulkan : enabled
    with_libdrm_amdgpu: enabled
    with_nvml         : disabled
    with_wayland      : enabled

Found ninja-1.10.1 at /usr/bin/ninja

stephanlachnit commented 2 years ago

Using git bisect, I was able to find the commit that introduced the regression: abf146f73c8baf79cfd3e2d25b02beaf750e1116

abf146f73c8baf79cfd3e2d25b02beaf750e1116 is the first bad commit
commit abf146f73c8baf79cfd3e2d25b02beaf750e1116
Author: jackun <jack.un@gmail.com>
Date:   Sat Oct 2 16:42:37 2021 +0300

    Set correct swapchain_stats etc references for hw updater

 src/overlay.cpp | 44 +++++++++++++++++++++++++-------------------
 src/overlay.h   |  1 +
 2 files changed, 26 insertions(+), 19 deletions(-)

Looks like a locking issue, I'll dig a little bit deeper.

stephanlachnit commented 2 years ago

It looks like every run/update/update_hw_info call blocks the entire process:

[2021-11-09 22:31:03.941] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:04.410] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:04.410] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:04.933] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:05.412] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:05.412] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:05.925] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:06.412] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:06.412] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:06.917] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:07.413] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:07.413] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:07.941] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:08.415] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:08.415] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
5 frames in 5.0 seconds =  0.999 FPS
[2021-11-09 22:31:08.933] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:09.417] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:09.417] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:09.925] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:10.419] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:10.419] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:10.953] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:11.420] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:11.420] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:11.941] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:12.421] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:12.421] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
[2021-11-09 22:31:12.933] [MANGOHUD] [debug] [overlay.cpp:140] run called
[2021-11-09 22:31:13.422] [MANGOHUD] [debug] [overlay.cpp:134] update called
[2021-11-09 22:31:13.422] [MANGOHUD] [debug] [overlay.cpp:49] update_hw_info called
5 frames in 5.0 seconds =  0.998 FPS

That's exactly 1 frame for each call.

stephanlachnit commented 2 years ago

Ok, I think I figured it out. The problem is that std::unique_lock<std::mutex> lk(m_hw_updating); is taken in both update() and run(). However, to my understanding this is not necessary: one only needs to hold the lock while the actual updating is done, not when calling update(). As update() only notifies the listeners, nothing happens if run() is still updating. And indeed, when commenting out the lock in update(), everything works fine.

Here is a log with the lock in update() removed:

[2021-11-09 22:57:03.581] [MANGOHUD] [debug] [overlay.cpp:127] update: called
[2021-11-09 22:57:03.581] [MANGOHUD] [debug] [overlay.cpp:139] run: calling cv_hwupdate.wait()
[2021-11-09 22:57:03.581] [MANGOHUD] [debug] [overlay.cpp:145] run: creating unique_lock with mutex m_hw_updating
[2021-11-09 22:57:03.581] [MANGOHUD] [debug] [overlay.cpp:147] run: calling update_hw_info()
[2021-11-09 22:57:04.087] [MANGOHUD] [debug] [overlay.cpp:127] update: called
[2021-11-09 22:57:04.105] [MANGOHUD] [debug] [overlay.cpp:149] run: update_hw_info() returned
[2021-11-09 22:57:04.105] [MANGOHUD] [debug] [overlay.cpp:139] run: calling cv_hwupdate.wait()
[2021-11-09 22:57:04.603] [MANGOHUD] [debug] [overlay.cpp:127] update: called
[2021-11-09 22:57:04.603] [MANGOHUD] [debug] [overlay.cpp:145] run: creating unique_lock with mutex m_hw_updating
[2021-11-09 22:57:04.603] [MANGOHUD] [debug] [overlay.cpp:147] run: calling update_hw_info()
[2021-11-09 22:57:05.104] [MANGOHUD] [debug] [overlay.cpp:127] update: called
[2021-11-09 22:57:05.125] [MANGOHUD] [debug] [overlay.cpp:149] run: update_hw_info() returned
[2021-11-09 22:57:05.125] [MANGOHUD] [debug] [overlay.cpp:139] run: calling cv_hwupdate.wait()
[2021-11-09 22:57:05.620] [MANGOHUD] [debug] [overlay.cpp:127] update: called
[2021-11-09 22:57:05.620] [MANGOHUD] [debug] [overlay.cpp:145] run: creating unique_lock with mutex m_hw_updating
[2021-11-09 22:57:05.620] [MANGOHUD] [debug] [overlay.cpp:147] run: calling update_hw_info()
[2021-11-09 22:57:06.120] [MANGOHUD] [debug] [overlay.cpp:127] update: called
[2021-11-09 22:57:06.153] [MANGOHUD] [debug] [overlay.cpp:149] run: update_hw_info() returned

As one can see, update is called more often than update_hw_info() returns. With locks in both places this creates a delay in update() when update_hw_info() is not finished yet.
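
To illustrate the delay described above, here is a minimal, self-contained sketch. It is not MangoHud's code: the class HwUpdater, the helper blocked_ms(), the second mutex m_cv_ and the pending_/quit_ flags are all assumptions made up for the example, and update_hw_info() is simulated with a sleep. Only the idea is taken from the issue: run() holds m_hw_updating for the whole update, and update() takes the same mutex before notifying.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the hw updater; not MangoHud's actual class.
class HwUpdater {
public:
    explicit HwUpdater(std::chrono::milliseconds work)
        : work_(work), thread_(&HwUpdater::run, this) {}

    ~HwUpdater() {
        {
            std::lock_guard<std::mutex> lk(m_cv_);
            quit_ = true;
        }
        cv_hwupdate_.notify_all();
        thread_.join();
    }

    // Called from the render thread. Taking m_hw_updating_ here means the
    // caller stalls for as long as run() is still inside the update.
    void update() {
        std::unique_lock<std::mutex> lk(m_hw_updating_);
        {
            std::lock_guard<std::mutex> cv_lk(m_cv_);
            pending_ = true;
        }
        cv_hwupdate_.notify_all();
    }

private:
    void run() {
        for (;;) {
            {
                std::unique_lock<std::mutex> cv_lk(m_cv_);
                cv_hwupdate_.wait(cv_lk, [this] { return pending_ || quit_; });
                if (quit_) return;
                pending_ = false;
            }
            std::unique_lock<std::mutex> lk(m_hw_updating_);
            std::this_thread::sleep_for(work_);  // simulated update_hw_info()
        }
    }

    std::chrono::milliseconds work_;
    std::mutex m_hw_updating_;
    std::mutex m_cv_;  // assumption: separate mutex guarding the flags
    std::condition_variable cv_hwupdate_;
    bool pending_ = false;
    bool quit_ = false;
    std::thread thread_;  // declared last so it starts after the other members exist
};

// Returns how long the second update() call was blocked, in milliseconds.
long long blocked_ms(std::chrono::milliseconds work) {
    HwUpdater updater(work);
    updater.update();                       // wakes the worker
    std::this_thread::sleep_for(work / 4);  // worker should now be mid-update
    const auto t0 = std::chrono::steady_clock::now();
    updater.update();                       // waits until the worker unlocks
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}
```

With a simulated update of 200 ms, the second update() call should stall for most of the remaining update time, which is the kind of stall the render thread would see once per sampling period.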

stephanlachnit commented 2 years ago

Ah right, I see why you want a lock there: sw_stats, params, vendorID and update_hw_info_thread shouldn't be changed while hw_info updates. The proper solution is to skip update() if the mutex is locked.
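
A rough sketch of what "skip update() if the mutex is locked" could look like, using std::try_to_lock. The struct HwNotifier, its counters and the helper update_while_busy() are hypothetical and only stand in for the real state. The notify call is replaced by a counter, so this shows the locking pattern only, not the actual fix.

```cpp
#include <mutex>
#include <thread>

// Hypothetical notifier; the real class carries more state.
struct HwNotifier {
    std::mutex m_hw_updating;
    int notified = 0;  // stands in for cv_hwupdate.notify_all()
    int skipped = 0;

    // Returns true if the request was delivered, false if it was skipped
    // because the updater still holds the mutex.
    bool update() {
        std::unique_lock<std::mutex> lk(m_hw_updating, std::try_to_lock);
        if (!lk.owns_lock()) {
            ++skipped;  // updater busy: do not wait, try again next period
            return false;
        }
        // Lock is held here, so shared state could be changed safely.
        ++notified;
        return true;
    }
};

// Demo: hold the mutex on this thread (as run() would during the update)
// and call update() from another thread. It returns right away with false.
bool update_while_busy() {
    HwNotifier n;
    std::lock_guard<std::mutex> busy(n.m_hw_updating);
    bool delivered = true;
    std::thread caller([&] { delivered = n.update(); });
    caller.join();
    return delivered;
}
```

The trade-off is that a skipped call drops that period's request, so the displayed values are refreshed one period later instead of stalling the render thread.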

jackun commented 2 years ago

Weird, don't see it even with 1220 fps (well, depends more on fps_sampling_period etc). But the whole thing is kind of going to blow up any moment now

stephanlachnit commented 2 years ago

> Weird, don't see it even with 1220 fps. But the whole thing is kind of going to blow up any moment now

I guess it depends on how long your update_hw_info() takes. If it is fast enough, then you won't see it. Maybe measure the duration to check whether this is the case.