aristocratos / btop

A monitor of resources
Apache License 2.0

[BUG] btop 1.3.2 on arch gets sigsegv #764

Open timlag1305 opened 9 months ago

timlag1305 commented 9 months ago

Describe the bug

btop crashes with a SIGSEGV on the latest version, 1.3.2. I suspect this is because ROCm doesn't fully support my GPU (RX Vega 56).

To Reproduce

Open btop and wait for it to crash.

Expected behavior

btop should not crash.

Additional context

Contents of ~/.config/btop/btop.log

2024/02/12 (12:26:41) | ===> btop++ v.1.3.2
2024/02/12 (12:26:41) | DEBUG: Running in DEBUG mode!
2024/02/12 (12:26:41) | INFO: Logger set to DEBUG
2024/02/12 (12:26:41) | DEBUG: Using locale en_US.UTF-8
2024/02/12 (12:26:41) | INFO: Running on /dev/pts/0
2024/02/12 (12:26:41) | INFO: Failed to load libnvidia-ml.so, NVIDIA GPUs will not be detected: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
2024/02/12 (12:26:41) | WARNING: ROCm SMI: Failed to get maximum GPU temperature, defaulting to 110°C
2024/02/12 (12:26:41) | WARNING: ROCm SMI: Failed to get VRAM utilization
2024/02/12 (12:26:41) | WARNING: ROCm SMI: Failed to get GPU power usage
2024/02/12 (12:26:42) | WARNING: ROCm SMI: Failed to get maximum GPU power draw, defaulting to 225W
2024/02/12 (12:26:42) | WARNING: ROCm SMI: Failed to get maximum GPU temperature, defaulting to 110°C
2024/02/12 (12:26:42) | WARNING: ROCm SMI: Failed to get VRAM utilization
2024/02/12 (12:26:42) | WARNING: ROCm SMI: Failed to get GPU power usage
2024/02/12 (12:26:42) | WARNING: ROCm SMI: Failed to get PCIe throughput
2024/02/12 (12:26:42) | DEBUG: Shared::init() : Initialized.
2024/02/12 (12:26:48) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:26:54) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:00) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:06) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:12) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:18) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:24) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:30) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:36) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:42) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:48) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:27:54) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:00) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:06) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:12) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:18) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:24) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:30) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:36) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:42) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:48) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:28:54) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:00) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:06) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:12) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:18) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:24) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:30) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:36) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:42) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:48) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:29:54) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:00) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:06) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:12) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:18) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:24) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:30) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:36) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:42) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:47) | ERROR: Stall in Runner thread, restarting!
2024/02/12 (12:30:53) | ERROR: Stall in Runner thread, restarting!

GDB Backtrace

If btop++ is crashing at start the following steps could be helpful:

(Extra helpful if compiled with make OPTFLAGS="-O0 -g")

  1. Run gdb btop (Linux) or lldb btop (macOS).

  2. Enter r to run, wait for the crash, press Enter if prompted, and press CTRL+L to clear the screen if needed.

  3. Get a backtrace for all threads with thread apply all bt (gdb) or bt all (lldb).

  4. Copy and paste the backtrace here:

           PID: 5542 (btop)
           UID: 1000 (tim)
           GID: 998 (wheel)
        Signal: 11 (SEGV)
     Timestamp: Mon 2024-02-12 12:09:13 PST (13min ago)
  Command Line: btop
    Executable: /usr/bin/btop
 Control Group: /user.slice/user-1000.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-1000.slice
       Session: 1
     Owner UID: 1000 (tim)
       Boot ID: c61f4d9e2f014df59fc7d82337e8738c
    Machine ID: b7c0180d1a5f4a46a26ccbf40f40ec25
      Hostname: arch
       Storage: /var/lib/systemd/coredump/core.btop.1000.c61f4d9e2f014df59fc7d82337e8738c.5542.1707768553000000.zst (present)
  Size on Disk: 364.6K
       Message: Process 5542 (btop) of user 1000 dumped core.

                Stack trace of thread 5542:
                #0  0x00005c6960ea009a n/a (btop + 0x9a09a)
                #1  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #2  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #3  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #4  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #5  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #6  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #7  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #8  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #9  0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #10 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #11 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #12 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #13 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #14 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #15 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #16 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #17 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #18 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #19 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #20 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #21 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #22 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #23 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #24 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #25 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #26 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #27 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #28 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #29 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #30 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #31 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #32 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #33 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #34 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #35 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #36 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #37 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #38 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #39 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #40 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #41 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #42 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #43 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #44 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #45 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #46 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #47 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #48 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #49 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #50 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #51 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #52 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #53 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #54 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #55 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #56 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #57 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #58 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #59 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #60 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #61 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #62 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)
                #63 0x00005c6960ea04b0 n/a (btop + 0x9a4b0)

                Stack trace of thread 5545:
                #0  0x00007d20d71196bc read (libc.so.6 + 0xfb6bc)
                #1  0x00007d20d72d56e2 read (libstdc++.so.6 + 0xd56e2)
                #2  0x00007d20d731d759 _ZNSt13basic_filebufIcSt11char_traitsIcEE9underflowEv (libstdc++.so.6 + 0x11d759)
                #3  0x00007d20d72cc71a _ZNSt15basic_streambufIcSt11char_traitsIcEE5sgetcEv (libstdc++.so.6 + 0xcc71a)
                #4  0x00007d20d69b9b23 _ZN3amd3smi6Device15readDevInfoLineENS0_12DevInfoTypesEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (librocm_smi64.so + 0x1b9b23)
                #5  0x00007d20d6adf7f0 n/a (librocm_smi64.so + 0x2df7f0)
                #6  0x00007d20d6af02de rsmi_dev_pci_throughput_get (librocm_smi64.so + 0x2f02de)
                #7  0x00005c6960ee3cae n/a (btop + 0xddcae)
                #8  0x00005c6960e4784f n/a (btop + 0x4184f)
                #9  0x00007d20d70a955a n/a (libc.so.6 + 0x8b55a)
                #10 0x00007d20d7126a3c n/a (libc.so.6 + 0x108a3c)
                ELF object binary architecture: AMD x86-64
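
Thread 5545 above is sitting in read() underneath rsmi_dev_pci_throughput_get, and the log shows the Runner thread being restarted for stalling. One generic way a caller can avoid waiting indefinitely on a blocking query is to run it on a worker thread and wait with a deadline. The sketch below illustrates that idea only: query_with_timeout is a hypothetical helper, not part of btop or ROCm SMI, and it does not call any real ROCm SMI function.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical helper (not btop or ROCm SMI code): runs `query` on a detached
// worker thread and waits at most `limit` for its result.
// Returns std::nullopt if the deadline passes first.
std::optional<int> query_with_timeout(std::function<int()> query,
                                      std::chrono::milliseconds limit) {
    assert(query);  // an empty std::function would throw when called
    std::promise<int> result;
    std::future<int> ready = result.get_future();

    // The worker owns the promise, so it can still complete safely after
    // the caller has stopped waiting.
    std::thread([p = std::move(result), q = std::move(query)]() mutable {
        try {
            p.set_value(q());
        } catch (...) {
            p.set_exception(std::current_exception());
        }
    }).detach();

    if (ready.wait_for(limit) != std::future_status::ready) {
        return std::nullopt;  // treat the query as unavailable for this cycle
    }
    return ready.get();
}
```

On timeout the worker is left running detached, so this only bounds how long the caller waits. It does not cancel the blocked read.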

timlag1305 commented 9 months ago

Here is a snippet of the backtrace with some of the symbols resolved. The Input::process frame is repeated roughly 10,000 times.

#0  0x00005c6960ea009a in std::unordered_map<std::basic_string_view<char, std::char_traits<char> >, bool, std::hash<std::basic_string_view<char, std::char_traits<char> > >, std::equal_to<std::basic_string_view<char, std::char_traits<char> > >, std::allocator<std::pair<std::basic_string_view<char, std::char_traits<char> > const, bool> > >::at (__k=..., this=<optimized out>, this=<optimized out>, __k=...)
    at /usr/include/c++/13.2.1/bits/unordered_map.h:1004
No locals.
#1  Config::getB (name=..., name=...) at src/btop_config.hpp:80
No locals.
#2  Input::process (key="down") at src/btop_input.cpp:209
        filtering = <optimized out>
        vim_keys = <optimized out>
        help_key = <optimized out>
        kill_key = <optimized out>
        boxes = {_M_elems = {"gpu5", "cpu", "mem", "net", "proc", "gpu0", "gpu1", "gpu2", "gpu3", "gpu4"}}
        last_press = 0
#3  0x00005c6960ea04b0 in Input::process (key="down") at src/btop_input.cpp:282
        keep_going = false
        no_update = true
        redraw = true
        filtering = <optimized out>
        vim_keys = <optimized out>
        help_key = <optimized out>
        kill_key = 0x5c6960f381cb "k"
        boxes = {_M_elems = {"gpu5", "cpu", "mem", "net", "proc", "gpu0", "gpu1", "gpu2", "gpu3", "gpu4"}}
        last_press = 0
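
The repeated Input::process frames above point to unbounded recursion: the same key ("down") is dispatched again and again until the stack is exhausted, which would plausibly explain a SEGV landing in an unrelated-looking frame such as Config::getB. The following is a minimal sketch of that failure pattern with a depth guard added, not btop's actual code. Handler, max_depth, and calls are hypothetical names, and the premise that the key is never consumed is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a key handler that re-dispatches a key to itself.
struct Handler {
    int max_depth;  // hypothetical cap on nested re-dispatches
    int calls = 0;  // how many times process() has run

    // Returns true if the key was handled, false if the handler gave up.
    bool process(const std::string& key, int depth = 0) {
        assert(max_depth >= 0);
        ++calls;
        if (depth >= max_depth) {
            return false;  // stop here instead of recursing until the stack overflows
        }
        // Assumption: nothing consumes the key, so it is re-dispatched unchanged.
        // Without the guard above, this call would never terminate.
        return process(key, depth + 1);
    }
};
```

With a cap of 16, a call such as Handler h{16}; h.process("down"); returns false after 17 invocations rather than crashing.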

hauleth commented 2 months ago

Same thing on macOS (Darwin Kernel Version 23.6.0).

Backtrace:

  thread #1, queue = 'com.apple.main-thread'
    frame #0: 0x000000018437b0cc libsystem_kernel.dylib`__pselect + 8
    frame #1: 0x000000018438f464 libsystem_kernel.dylib`pselect + 112
    frame #2: 0x0000000100096144 btop`Input::poll(unsigned long long) + 212
    frame #3: 0x000000010000ead4 btop`main + 7300
    frame #4: 0x000000018402b154 dyld`start + 2476
  thread #2
    frame #0: 0x0000000184376aa4 libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #3
    frame #0: 0x0000000184376aa4 libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #4
    frame #0: 0x0000000000000000
* thread #5, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100111394 btop`Proc::collect(bool) + 4380
    frame #1: 0x0000000100009c04 btop`Runner::_runner(void*) + 2388
    frame #2: 0x00000001843b5f94 libsystem_pthread.dylib`_pthread_start + 136