flightlessmango / MangoHud

A Vulkan and OpenGL overlay for monitoring FPS, temperatures, CPU/GPU load and more. Discord: https://discordapp.com/invite/Gj5YmBb
MIT License

0.7.1 malloc(): unsorted double linked list corrupted #1230

Closed JupiterRider closed 3 months ago

JupiterRider commented 7 months ago

Describe the bug
malloc(): unsorted double linked list corrupted

List relevant hardware/software information

To Reproduce
Run an application with the --dlsym parameter. For example, supertux2, which links SDL2:
mangohud --dlsym supertux2

Additional context
When downgrading to 0.7.0, this error doesn't happen.

flightlessmango commented 7 months ago

Does it happen without a config file?

JupiterRider commented 7 months ago

@flightlessmango This happens with and without config file.

flightlessmango commented 5 months ago

Works fine for me (screenshot attached).

flightlessmango commented 5 months ago

Assuming this has been resolved since 0.7.1, unless more information is provided.

Torston420 commented 5 months ago

The bug specifically affects apps that require --dlsym to hook, and it seems Intel-specific. I bisected it to commit 9411963ad907738f24a8286c0fee6e7f8eccb284, and it is still present on the latest commit. I am on an Arc A770.

flightlessmango commented 5 months ago

@Torston420 It would be great if you could get a gdb backtrace since I can't repro it locally

Torston420 commented 5 months ago

#0  0x00007ffff68ab32c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff685a6c8 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff68424b8 in abort () from /usr/lib/libc.so.6
#3  0x00007ffff6843395 in ?? () from /usr/lib/libc.so.6
#4  0x00007ffff68b52a7 in ?? () from /usr/lib/libc.so.6
#5  0x00007ffff68b87ec in ?? () from /usr/lib/libc.so.6
#6  0x00007ffff68b97ed in malloc () from /usr/lib/libc.so.6
#7  0x00007ffff6893f1a in _IO_file_doallocate () from /usr/lib/libc.so.6
#8  0x00007ffff68a2c89 in _IO_doallocbuf () from /usr/lib/libc.so.6
#9  0x00007ffff68a0a65 in _IO_file_underflow () from /usr/lib/libc.so.6
#10 0x00007ffff68a2d2f in _IO_default_uflow () from /usr/lib/libc.so.6
#11 0x00007ffff6895abb in _IO_getline_info () from /usr/lib/libc.so.6
#12 0x00007ffff6894860 in fgets () from /usr/lib/libc.so.6
#13 0x00007ffff790c5df in Intel::intel_gpu_thread() () from /usr/local/lib/mangohud/libMangoHud_opengl.so
#14 0x00007ffff7c99703 in execute_native_thread_routine ()
   from /usr/local/lib/mangohud/libMangoHud_opengl.so
#15 0x00007ffff68a955a in ?? () from /usr/lib/libc.so.6
#16 0x00007ffff6926a3c in ?? () from /usr/lib/libc.so.6

flightlessmango commented 5 months ago

@Torston420 can you try this patch?

diff --git a/src/intel.cpp b/src/intel.cpp
index d61fa56..b010344 100644
--- a/src/intel.cpp
+++ b/src/intel.cpp
@@ -10,6 +10,9 @@ void Intel::intel_gpu_thread(){
     else
         intel_gpu_top = popen("intel_gpu_top -J -s 500", "r");

+    if (intel_gpu_top == NULL)
+        SPDLOG_INFO("popen failed on '{}'", "intel_gpu_top");
+
     int num_line = 0;
     std::string buf;
     int num_iterations = 0;
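
For reference, the patch above only logs when popen() fails; any later fgets() or pclose() on the null stream would still be undefined behavior. A minimal sketch of reading a child's output line by line with an early return on failure might look like the following. The helper name `read_command_lines` is hypothetical and is not part of MangoHud; this is an illustration under that assumption, not the project's actual code.

```cpp
#include <stdio.h>
#include <string>
#include <vector>

// Hypothetical helper (not MangoHud code): runs `cmd` via popen() and
// returns its stdout split into lines. Returns an empty vector if
// popen() fails, instead of reading from a null stream.
std::vector<std::string> read_command_lines(const char* cmd) {
    std::vector<std::string> lines;

    FILE* pipe = popen(cmd, "r");
    if (pipe == nullptr)
        return lines;  // bail out early: fgets() on a null FILE* is undefined

    char buf[512];
    std::string current;
    while (fgets(buf, sizeof(buf), pipe) != nullptr) {
        current += buf;
        // A line may span several fgets() calls if it exceeds the buffer.
        if (!current.empty() && current.back() == '\n') {
            current.pop_back();
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        lines.push_back(current);  // final line without a trailing newline

    pclose(pipe);
    return lines;
}
```

Calling `read_command_lines("printf 'a\\nb\\n'")` from a test program should yield the two lines `a` and `b`.
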
Torston420 commented 5 months ago

#13 0x00007ffff790c5cf in Intel::intel_gpu_thread() () from /usr/local/lib/mangohud/libMangoHud_opengl.so
#14 0x00007ffff7c99753 in execute_native_thread_routine ()
   from /usr/local/lib/mangohud/libMangoHud_opengl.so

otherwise no change

flightlessmango commented 5 months ago

This patch adds more verbose output for the exit code of intel_gpu_top. Can you run this and see what it returns, please?

index d61fa56..b08e27a 100644
--- a/src/intel.cpp
+++ b/src/intel.cpp
@@ -54,16 +54,21 @@ void Intel::intel_gpu_thread(){
             break;
     }

-    int exitcode = pclose(intel_gpu_top) / 256;
-    if (exitcode > 0){
-        if (exitcode == 127)
+    int status = pclose(intel_gpu_top);
+    if (WIFEXITED(status)) {
+        int exitcode = WEXITSTATUS(status);
+        if (exitcode != 0) {
+            SPDLOG_INFO("intel_gpu_top exited with status {}", exitcode);
+            if (exitcode == 127) {
         SPDLOG_INFO("Failed to open '{}'", "intel_gpu_top");
-
-        if (exitcode == 1)
+            } else if (exitcode == 1) {
         SPDLOG_INFO("Missing permissions for '{}'", "intel_gpu_top");
-
+            }
         SPDLOG_INFO("Disabling gpu_stats");
         HUDElements.params->enabled[OVERLAY_PARAM_ENABLED_gpu_stats] = false;
+        }
+    } else if (WIFSIGNALED(status)) {
+        SPDLOG_INFO("intel_gpu_top killed by signal {}", WTERMSIG(status));
     }
 }
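
As a standalone illustration of what this patch changes: the old `pclose(...) / 256` only recovers the exit code when the child exits normally. On Linux, a child killed by a signal produces a status whose division by 256 is 0, so it would look like success. The sketch below decodes the status with the POSIX wait macros instead. The names `CommandResult` and `run_and_wait` are hypothetical and exist only for this example; they are not MangoHud APIs.

```cpp
#include <stdio.h>
#include <sys/wait.h>

// Hypothetical result type for this sketch; not part of MangoHud.
struct CommandResult {
    bool exited = false;    // true if the child terminated normally
    int exit_code = -1;     // meaningful only when exited is true
    bool signaled = false;  // true if the child was killed by a signal
    int term_signal = 0;    // meaningful only when signaled is true
};

// Runs `cmd` through popen(), discards its output, and decodes the
// wait status returned by pclose().
CommandResult run_and_wait(const char* cmd) {
    CommandResult result;

    FILE* pipe = popen(cmd, "r");
    if (pipe == nullptr)
        return result;  // popen itself failed; nothing to wait for

    // Drain stdout so the child never blocks on a full pipe.
    char buf[256];
    while (fgets(buf, sizeof(buf), pipe) != nullptr) {
    }

    int status = pclose(pipe);
    if (status == -1)
        return result;  // pclose could not obtain the child's status

    if (WIFEXITED(status)) {
        result.exited = true;
        result.exit_code = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        result.signaled = true;
        result.term_signal = WTERMSIG(status);
    }
    return result;
}
```

With this shape, a shell that cannot find the command reports exit code 127 through `exit_code`, while a child that was killed shows up in `term_signal` rather than being mistaken for a clean exit.
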
Torston420 commented 5 months ago

malloc(): unsorted double linked list corrupted
fish: Job 1, 'mangohud --dlsym openrct2' terminated by signal SIGABRT (Abort)
#13 0x00007ffff7911baf in Intel::intel_gpu_thread() () from /usr/local/lib/mangohud/libMangoHud_opengl.so
#14 0x00007ffff7ca7863 in execute_native_thread_routine ()
   from /usr/local/lib/mangohud/libMangoHud_opengl.so

Torston420 commented 5 months ago

openrct2 is the app I'm using for these tests. If I use gzdoom with OpenGL, I get an app hang instead, and killing it causes SIGABRT with the same error. I'm leaving a backtrace here since it's the same offending commit. Hope this helps in some way.

#0  0x00007ffff5aa5ebe in ??? () at /usr/lib/libc.so.6
#1  0x00007ffff5aab0e3 in ??? () at /usr/lib/libc.so.6
#2  0x00007ffff7ca78d8 in std::thread::join() () at /usr/local/lib/mangohud/libMangoHud_opengl.so
#3  0x00007ffff788bccb in init_gpu_stats(unsigned int&, unsigned int, overlay_params&) ()
    at /usr/local/lib/mangohud/libMangoHud_opengl.so
#4  0x00007ffff785e684 in MangoHud::GL::imgui_create(void*, MangoHud::GL::gl_wsi) ()
    at /usr/local/lib/mangohud/libMangoHud_opengl.so
#5  0x00007ffff786669d in glXMakeCurrent () at /usr/local/lib/mangohud/libMangoHud_opengl.so
#6  0x00007ffff6bd23e1 in ??? () at /usr/lib/libSDL2-2.0.so.0
#7  0x00007ffff6bd5343 in ??? () at /usr/lib/libSDL2-2.0.so.0
#8  0x00007ffff6bac4a0 in ??? () at /usr/lib/libSDL2-2.0.so.0
#9  0x0000555555681e85 in ??? ()
#10 0x0000555555e0e217 in ??? ()
#11 0x0000555555682bae in ??? ()
#12 0x0000555555dea088 in ??? ()
#13 0x0000555555925e8c in ??? ()
#14 0x0000555555926d15 in ??? ()
#15 0x0000555555928481 in ??? ()
#16 0x0000555555651e25 in ??? ()
#17 0x00007ffff5a43cd0 in ??? () at /usr/lib/libc.so.6
#18 0x00007ffff5a43d8a in __libc_start_main () at /usr/lib/libc.so.6
#19 0x000055555567b315 in ??? ()

JupiterRider commented 3 months ago

Hey @flightlessmango, is this issue going to be fixed? I can just tell you that it only happens on an Intel GPU together with SDL2 games.

Thanks!

flightlessmango commented 3 months ago

This commit should resolve it: 4cbcec30b8f302fecfe3c5ef95f1d0861a649a5f. But this comes at the expense of all metrics except load. We'll just have to wait for Intel to expose these metrics in sysfs.

JupiterRider commented 3 months ago

Thanks! This commit works.