aristocratos / btop

A monitor of resources
Apache License 2.0

[BUG] Ryzen 7000 Perplexing Error (FreeBSD 13/14 / Windows 11) #629

Open thesunexpress opened 11 months ago

thesunexpress commented 11 months ago

Describe the bug

Running btop++ on either FreeBSD 13 or 14, and also when testing on Windows 11, results in an immediate crash of btop++. A rather cryptic:

2023/09/22 (17:51:49) | ===> btop++ v.1.2.13
2023/09/22 (17:51:49) | ERROR: Exception in Shared::init() -> key not found

...is returned. Crucially, this occurs exactly the same on both FreeBSD & Windows, which would seem to suggest there is something up with how btop++ detects this hardware.

To Reproduce

Build from git repo source for FreeBSD, run ./btop. Installed from available btop4win releases for Windows 11. With or without ~/.config/btop present.

Expected behavior

Sexy top stats, instead, no love.

Screenshots

N/A no useful graphical output

Info (please complete the following information):

Running btop++ with --debug or with gdb offers no additional information, other than the previously mentioned ERROR: Exception in Shared::init() -> key not found (which isn't very helpful at all...)

Running btop++ on any of my other boxes, be that a retiring Intel X299 or Ryzen 5950X build-box, works as expected & have been running for ages now, without issue. Whatever this bug may be, it seems to be a Ryzen 7000 thing. For what it is worth, bpytop runs fine on all boxes, regardless of Ryzen generation.

imwints commented 11 months ago

Since it's CPU specific, you might have to step through the source code and tell us where the exception is thrown. The message key not found is thrown by an unordered map somewhere; we just need to figure out which one it is and why a key is not found with your CPU.
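To illustrate why the message alone is not enough: `std::unordered_map::at` throws `std::out_of_range` for a missing key, and the exception text is all that a top-level handler gets to see. A minimal sketch, where the map contents and the `lookup_or_report` name are invented for illustration and are not btop code:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical helper: looks up `key` with at() and, like a top-level
// handler, turns the exception into a short error string.
std::string lookup_or_report(const std::unordered_map<std::string, int>& map,
                             const std::string& key) {
    try {
        return std::to_string(map.at(key));  // throws std::out_of_range if absent
    } catch (const std::out_of_range& e) {
        // what() carries only a generic, library-specific text. It names
        // neither the map nor the key that was looked up.
        return std::string("ERROR: Exception -> ") + e.what();
    }
}
```

Calling `lookup_or_report` with a key that is not in the map returns an error string without the key name (the exact wording depends on the standard library), which is why a backtrace is needed to locate the failing lookup.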

thesunexpress commented 11 months ago

I'm now 100% certain it is a Ryzen 7000 specific thing. Pulled the SSD from the Ryzen 7000 platform, installed it in an Intel platform, booted that SSD & btop++ ran fine without issue. I followed this test by installing the very same SSD in a 5950X machine & btop++ worked perfect there too. Has nobody else run btop++ on Ryzen 7000 yet? I find it hard to believe I'm the first one... I'll start sniffing around in the source code.

thesunexpress commented 11 months ago

gdb is not very helpful when stepping through the code. Guess the crash happens too early?

2023/10/02 (23:10:33) | ===> btop++ v.1.2.13
2023/10/02 (23:10:33) | DEBUG: Starting in DEBUG mode!
2023/10/02 (23:10:33) | INFO: Logger set to DEBUG
2023/10/02 (23:10:33) | DEBUG: Using locale C.UTF-8
2023/10/02 (23:10:33) | INFO: Running on /dev/pts/0
2023/10/02 (23:10:33) | DEBUG: Writing new config file
2023/10/02 (23:10:33) | ERROR: Exception in Shared::init() -> key not found
2023/10/02 (23:10:33) | INFO: Quitting! Runtime: 00:00:00

This is about as much as I can get from it... Really not sure what's going on. The resulting binary, building from source, does work on other platforms -- Intel 9th & 10th Gen, along with Ryzen 3000 & 5000 series. Not sure what the heck is going on here.

imwints commented 11 months ago

gdb is not very helpful when stepping through the code. Guess the crash happens too early?

What do you mean? The error can't happen 'too early'

Can you backup your config file and try to launch btop with an empty config directory?

thesunexpress commented 9 months ago

Pulled latest git updates, same issue still. It gets as far as:


(gdb) r
Starting program: /usr/home/<username>/Downloads/btop/build/btop 
ERROR: Exception in Shared::init() -> key not found
[Inferior 1 (process 26037) exited with code 01]
(gdb) 
imwints commented 9 months ago

Can you backup your config file and try to launch btop with an empty config directory?

?

Norman-Normandy commented 9 months ago

Can confirm I get the same problem with FreeBSD 14. This was not an issue with 13.2. Only after the upgrade.

thesunexpress commented 9 months ago

Can you backup your config file and try to launch btop with an empty config directory?

?

With/without config directory present, it crashes all the same. I've tried everything I can think of to step through the code, but am getting nowhere. It even crashes in gdb -- which is a pretty fancy stunt for some otherwise functional code to achieve. bpytop works fine, so it is something unique to btop

It must be something related to new libs/headers from the FreeBSD code base, at least that is my hunch. The odd thing is that it only applies to btop & only since a relatively recent update.

Norman-Normandy commented 9 months ago

I've created a FreeBSD classic/thick jail with a fresh 14.0 Userland. btop works perfectly fine on it. But not the upgraded host system itself. Which tells me it's something with userland. Checking freebsd-version -kru or -ru on jail shows the same versions for all (14.0-RELEASE-p2).

The weird thing is doing the upgrade, on a different machine, in a Hyper-V VM, worked perfectly fine without issues. But not on the host machine on bare metal. Despite going through the same processes and seeing same versions.

For me personally, out of all the applications, only btop has broken from the upgrade 13.2 -> 14.0.

zenofile commented 8 months ago

On a Ryzen 7000 system, I get a similar but not identical error on master:

ERROR: Exception in runner thread -> Mem:: -> key not found

I suppose it's the same underlying problem.

The OS is Fedora Linux 39 though.

make info

```
PLATFORM     ?| Linux
ARCH         ?| x86_64
GPU_SUPPORT  :| true
CXX          ?| g++ (13.2.1)
THREADS      :| 32
REQFLAGS     !| -std=c++20
WARNFLAGS    :| -Wall -Wextra -pedantic
OPTFLAGS     :| -O2 -ftree-vectorize -flto=32
LDCXXFLAGS   :| -pthread -D_FORTIFY_SOURCE=2 -D_GLIBCXX_ASSERTIONS -D_FILE_OFFSET_BITS=64 -fexceptions -fstack-clash-protection -fcf-protection -fstack-protector -DGPU_SUPPORT
CXXFLAGS     +| $(REQFLAGS) $(LDCXXFLAGS) $(OPTFLAGS) $(WARNFLAGS) -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g
LDFLAGS      +| $(LDCXXFLAGS) $(OPTFLAGS) $(WARNFLAGS) -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes
```

imwints commented 8 months ago

@zenofile It would be super helpful if you could get a back trace on the error by stepping through the code with gdb. I don't have access to a 7000 series cpu.

zenofile commented 8 months ago

@zenofile It would be super helpful if you could get a back trace on the error by stepping through the code with gdb. I don't have access to a 7000 series cpu.

gdb.txt

btop immediately exits with the error message after the second exception.

zenofile commented 8 months ago

What I just realized when trying to get more useful information with the debugger:

It seems to only throw on my vertical/portrait mode display, when there are noticeably more rows than columns, and only if started with the terminal already being that size. When running in a smaller terminal and resizing, I cannot reproduce the error.

It crashes as soon as stty size reports ~111~ 109 rows.

108x80 works fine, 109x80 crashes every time.

aristocratos commented 8 months ago

@zenofile Looks like it's crashing because of a missing entry in Mem::disk_meters_free. So my rough guess would be that I messed up somewhere in the logic for calculating how many disks will be shown on screen.

So will need to add some more checks for contains in some of the unordered_maps.

@imwints One solution to fix this issue and potential similar issues could be to adapt the current PR with the robin_hood replacement, and instead subclass our own map based on std::unordered_map with class functions inspired by Qt's QMap. https://doc.qt.io/qt-6/qmap.html

For example a .value(key, [fallback]) class function that returns the value if the key exists otherwise the fallback. And if the fallback is empty returns a default constructed value.
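As a rough illustration of the idea described above, a `.value(key, fallback)` lookup on top of `std::unordered_map` could look like the sketch below. The `SafeMap` name and its exact shape are assumptions made for this example, not the code that ended up in btop:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical wrapper, loosely modelled on QMap::value(): a lookup that
// never throws and never inserts a new entry.
template <typename Key, typename T>
class SafeMap : public std::unordered_map<Key, T> {
    using Base = std::unordered_map<Key, T>;

public:
    using Base::Base;  // inherit the constructors of std::unordered_map

    // Returns a copy of the mapped value, or `fallback` if the key is absent.
    // When no fallback is given, a default-constructed T is returned.
    T value(const Key& key, const T& fallback = T{}) const {
        const auto it = this->find(key);
        return it != this->end() ? it->second : fallback;
    }
};
```

Unlike `operator[]`, `value()` leaves the map unchanged on a miss, and unlike `at()` it does not throw. One caveat of this design: `std::unordered_map` has no virtual destructor, so such a subclass should not be deleted through a pointer to the base class.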

The same could be done for vector for some extra safety there also.

The code will need to be adapted somewhat, however, to check in some places whether the value returned when querying a map is empty, to avoid other errors in the draw functions.

zenofile commented 8 months ago

@aristocratos I have about 10 NFSv4 network shares that require Kerberos authentication. When they are authenticated, btop runs fine, otherwise I get a bunch of statvfs errors listed in btop.log. It then exits with the aforementioned key not found exception when there are too many rows.

As they were reported as “Ignored” I didn't think anything of it.
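For context, a failing `statvfs` call reports its reason through `errno`; on Linux, code 13 is `EACCES` (permission denied), which fits shares that are mounted but not authenticated. A minimal POSIX sketch of querying one mount and signalling failure to the caller follows. The `DiskUsage` and `query_mount` names are hypothetical and this is not btop's collector:

```cpp
#include <sys/statvfs.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical result type for this sketch.
struct DiskUsage {
    std::uint64_t total_bytes;
    std::uint64_t free_bytes;
};

// Queries one mount point. On failure returns std::nullopt and stores errno
// in `error_code`, so the caller can log it and skip the mount.
std::optional<DiskUsage> query_mount(const std::string& path, int& error_code) {
    struct statvfs vfs {};
    if (statvfs(path.c_str(), &vfs) != 0) {
        error_code = errno;
        return std::nullopt;
    }
    error_code = 0;
    return DiskUsage{
        static_cast<std::uint64_t>(vfs.f_blocks) * vfs.f_frsize,
        static_cast<std::uint64_t>(vfs.f_bfree) * vfs.f_frsize,
    };
}
```

Whatever the caller does with a skipped mount, every other structure that refers to disks by name has to skip it as well.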

aristocratos commented 8 months ago

@zenofile Thanks for the info. The crash is likely then because the disks that are not responding are removed from the disks list but the draw functions are still expecting them to be there.

The fix should still be the same: just adding an extra check that the disk exists when drawing.
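A sketch of what such an existence check can look like, assuming a list of disk names to draw and a map of per-disk meters. The `Meter` struct and `drawable_disks` function are stand-ins for illustration, not btop's draw code:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a drawable meter.
struct Meter {
    int width = 0;
};

// Returns the names that can actually be drawn: every name in `order` that
// still has a meter, in the original order.
std::vector<std::string> drawable_disks(
    const std::vector<std::string>& order,
    const std::unordered_map<std::string, Meter>& meters) {
    std::vector<std::string> result;
    for (const auto& name : order) {
        // Skip disks that have no meter instead of calling at(), which throws.
        // (C++20 offers meters.contains(name) for the same test.)
        if (meters.find(name) == meters.end()) continue;
        result.push_back(name);
    }
    return result;
}
```

The point is only that the list driving the loop and the map being indexed can disagree, so the lookup has to tolerate a miss.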

imwints commented 8 months ago

But do you think this is the same thing that happens to the original poster? The first error happens either directly in Shared::init() or Mem/Cpu::collect().

This now is about Mem::draw(). Might be the same root cause with the maps.

But I'm curious why this happens only with Ryzen CPUs, as the author claims.

thesunexpress commented 8 months ago

OP here. Things are still weird....

 ~/Downloads/btop>  gmake  

 ██████╗ ████████╗ ██████╗ ██████╗
 ██╔══██╗╚══██╔══╝██╔═══██╗██╔══██╗   ██╗    ██╗
 ██████╔╝   ██║   ██║   ██║██████╔╝ ██████╗██████╗
 ██╔══██╗   ██║   ██║   ██║██╔═══╝  ╚═██╔═╝╚═██╔═╝
 ██████╔╝   ██║   ╚██████╔╝██║        ╚═╝    ╚═╝
 ╚═════╝    ╚═╝    ╚═════╝ ╚═╝      Makefile v1.6
PLATFORM     ?| FreeBSD
ARCH         ?| x86_64
GPU_SUPPORT  :| false
CXX          ?| c++ (16.0.6)
THREADS      :| 32
REQFLAGS     !| -std=c++20
WARNFLAGS    :| -Wall -Wextra -pedantic
OPTFLAGS     :| -O2 -ftree-vectorize -flto=thin
LDCXXFLAGS   :| -pthread -D_FORTIFY_SOURCE=2 -D_GLIBCXX_ASSERTIONS -D_FILE_OFFSET_BITS=64 -fexceptions -fstack-clash-protection -fcf-protection -fstack-protector -lm -lkvm -ldevstat -Wl,-rpath=/usr/local/lib/gcc16 -lstdc++
CXXFLAGS     +| $(REQFLAGS) $(LDCXXFLAGS) $(OPTFLAGS) $(WARNFLAGS) 
LDFLAGS      +| $(LDCXXFLAGS) $(OPTFLAGS) $(WARNFLAGS) 

Building btop++ (v1.3.0) FreeBSD x86_64
Compiling src/btop_tools.cpp
Compiling src/btop.cpp
Compiling src/freebsd/btop_collect.cpp
Compiling src/btop_input.cpp
Compiling src/btop_theme.cpp
Compiling src/btop_draw.cpp
Compiling src/btop_menu.cpp
Compiling src/btop_shared.cpp
Compiling src/btop_config.cpp
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -lkvm: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -ldevstat: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Wl,-rpath=/usr/local/lib/gcc16: 'linker' input unused [-Wunused-command-line-argument]
c++: warning: -Z-reserved-lib-stdc++: 'linker' input unused [-Wunused-command-line-argument]
src/btop_tools.cpp:197:30: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
                        std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
                                                  ^
/usr/include/c++/v1/codecvt:187:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here
class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8
                           ^
/usr/include/c++/v1/__config:808:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
#    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
                                        ^
/usr/include/c++/v1/__config:781:49: note: expanded from macro '_LIBCPP_DEPRECATED'
#      define _LIBCPP_DEPRECATED __attribute__((deprecated))
                                                ^
src/btop_tools.cpp:197:9: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations]
                        std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
                             ^
/usr/include/c++/v1/locale:3603:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here
class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert
                           ^
/usr/include/c++/v1/__config:808:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
#    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
                                        ^
/usr/include/c++/v1/__config:781:49: note: expanded from macro '_LIBCPP_DEPRECATED'
#      define _LIBCPP_DEPRECATED __attribute__((deprecated))
                                                ^
src/btop_tools.cpp:227:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
                                std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
                                                          ^
/usr/include/c++/v1/codecvt:187:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here
class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8
                           ^
/usr/include/c++/v1/__config:808:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
#    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
                                        ^
/usr/include/c++/v1/__config:781:49: note: expanded from macro '_LIBCPP_DEPRECATED'
#      define _LIBCPP_DEPRECATED __attribute__((deprecated))
                                                ^
src/btop_tools.cpp:227:10: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations]
                                std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
                                     ^
/usr/include/c++/v1/locale:3603:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here
class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert
                           ^
/usr/include/c++/v1/__config:808:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
#    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
                                        ^
/usr/include/c++/v1/__config:781:49: note: expanded from macro '_LIBCPP_DEPRECATED'
#      define _LIBCPP_DEPRECATED __attribute__((deprecated))
                                                ^
10%  -> obj/btop_input.o              (356KiB) (03s)
20%  -> obj/btop_theme.o              (380KiB) (03s)
30%  -> obj/btop_shared.o             (448KiB) (03s)
4 warnings generated.
40%  -> obj/btop_tools.o              (640KiB) (03s)
50%  -> obj/btop_config.o             (672KiB) (03s)
60%  -> obj/btop.o                    (736KiB) (04s)
70%  -> obj/freebsd/btop_collect.o    (736KiB) (04s)
80%  -> obj/btop_menu.o               (896KiB) (04s)
90%  -> obj/btop_draw.o               (1.1MiB) (04s)

Linking and optimizing binary...
100% -> bin/btop                      (1.7MiB) (12s)

Build complete in (16s)
 ~/Downloads/btop> cd bin  
 ~/Downloads/btop/bin>
 ~/Downloads/btop/bin> ./btop  
ERROR: Exception in Shared::init() -> key not found
 ~/Downloads/btop/bin>
aristocratos commented 8 months ago

I'm gonna be implementing some safer containers shortly which should prevent the crash from happening and produce some useful output instead of only key not found.

It might crash at a later point instead however since something obviously is going wrong when initializing.

aristocratos commented 8 months ago

@thesunexpress @zenofile You guys wanna try compiling from the map_safety branch and seeing how far it gets. Should provide some more information on what's failing.

Compile:

git pull
git checkout map_safety
gmake distclean
gmake DEBUG=true

And then run btop --debug and report back how it runs and how the logfile looks.

zenofile commented 8 months ago
===> btop++ v.1.3.0
DEBUG: Starting in DEBUG mode!
INFO: Logger set to DEBUG
DEBUG: Setting LC_ALL=en_US.UTF-8
INFO: Running on /dev/pts/5
INFO: Failed to load libnvidia-ml.so, NVIDIA GPUs will not be detected: libnvidia-ml.so: cannot open shared object file: No such file or directory
WARNING: ROCm SMI: Failed to get maximum GPU temperature, defaulting to 110°C
DEBUG: Shared::init() : Initialized.
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/scratch" with statvfs error code: 13. Ignoring...
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/backup" with statvfs error code: 13. Ignoring...
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/media" with statvfs error code: 13. Ignoring...
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/oci" with statvfs error code: 13. Ignoring...
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/homes" with statvfs error code: 13. Ignoring...
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/sync" with statvfs error code: 13. Ignoring...
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/tmp" with statvfs error code: 13. Ignoring...
WARNING: Failed to get disk/partition stats for mount "/mnt/nfs/storage" with statvfs error code: 13. Ignoring...
ERROR: safeVal() called with invalid key: [/] in file: src/btop_draw.cpp on line: 1334
ERROR: Exception in runner thread -> Mem:: -> unordered_map::at
INFO: Quitting! Runtime: 00:00:02
#0  0x00007ffff7cb51e1 in __cxxabiv1::__cxa_throw (obj=0x7fffb80330c0, tinfo=0x648ca8 <typeinfo for std::out_of_range@GLIBCXX_3.4>, 
    dest=0x7ffff7ccb700 <std::out_of_range::~out_of_range()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007ffff7ca7738 in std::__throw_out_of_range (__s=0x5b2c51 "unordered_map::at") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:86
#2  0x00000000004df4e4 in std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Draw::Meter>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Draw::Meter> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::at (
    this=0x650940 <Mem::disk_meters_free[abi:cxx11]>, __k="/") at /usr/include/c++/13/bits/hashtable_policy.h:789
#3  0x00000000004da961 in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Draw::Meter, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Draw::Meter> > >::at
    (this=0x650940 <Mem::disk_meters_free[abi:cxx11]>, __k="/") at /usr/include/c++/13/bits/unordered_map.h:1004
#4  0x00000000004bc029 in Mem::draw[abi:cxx11](Mem::mem_info const&, bool, bool) (mem=..., force_redraw=false, data_same=false) at src/btop_draw.cpp:1334
#5  0x0000000000410f0a in Runner::_runner () at src/btop.cpp:605
#6  0x00007ffff7aac897 in start_thread () from /lib64/libc.so.6
#7  0x00007ffff7b336fc in clone3 () from /lib64/libc.so.6

(gdb) sel 4
(gdb) p disk_meters_free
$11 = std::unordered_map with 0 elements
(gdb) p disks.size()
$12 = 10
(gdb) sel 5
(gdb) p mem.disks_order.size()
$13 = 18

If the full backtrace is needed, I can provide it somewhat anonymized.

I don't want to hijack OP's thread at this point if this is unrelated.
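The `safeVal()` line in the log above names the missing key together with the file and line of the lookup. One way such a diagnostic can be produced is sketched below. This is an assumption about the general technique, not the code from the map_safety branch; `safe_val_at` and `SAFE_VAL` are hypothetical names, and unlike the logged run, this version returns a fallback instead of throwing afterwards:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns the mapped value, or `fallback` after
// appending a diagnostic that names the key and the call site to `log`.
template <typename T>
T safe_val_at(const std::unordered_map<std::string, T>& map,
              const std::string& key, const T& fallback, std::string& log,
              const char* file, int line) {
    const auto it = map.find(key);
    if (it != map.end()) return it->second;
    log += "invalid key: [" + key + "] in file: " + file +
           " on line: " + std::to_string(line) + '\n';
    return fallback;
}

// The macro captures the caller's file and line. C++20 code could use
// std::source_location::current() as a defaulted argument instead.
#define SAFE_VAL(map, key, fallback, log) \
    safe_val_at((map), (key), (fallback), (log), __FILE__, __LINE__)
```

With the call site recorded, a single log line is enough to find the failing lookup without attaching a debugger.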

thesunexpress commented 8 months ago

Compile:

git pull
git checkout map_safety
gmake distclean
gmake DEBUG=true

And then run btop --debug and report back how it runs and how the logfile looks.

No love from my terminal...

90%  -> obj/btop_draw.o               (8.8MiB) (03s)

Linking and optimizing binary...
100% -> bin/btop                      ( 16MiB) (00s)

Build complete in (04s)
~/Downloads/btop>  cd bin                                                      
~/Downloads/btop/bin>  ./btop --debug                                                   
ERROR: Exception in Shared::init() -> unordered_map::at: key not found
aristocratos commented 8 months ago

@thesunexpress @zenofile Pushed some more commits to the map_safety branch if you wanna try again.

thesunexpress commented 8 months ago

Bingo! [screenshot: bingo]

aristocratos commented 8 months ago

@thesunexpress Need the output from your logfile also. But looking at the screenshot it's pretty clear that the issue is in temperature collection for the cpu cores.

zenofile commented 8 months ago

With the new commits it doesn't crash anymore. The network shares show up for a single refresh cycle in the disks output window (as 0/0 Bytes used) and then just disappear at the same time the statvfs error is logged. Nothing new in the btop.log.

thesunexpress commented 8 months ago

@thesunexpress Need the output from your logfile also. But looking at the screenshot it's pretty clear that the issue is in temperature collection for the cpu cores.

Indeed, listing temps for only 2 cpus at the moment. In case it is informative, this happens with/without a .config/btop/btop.conf file present. Ditto, while running with/without --debug as well. Running htop from ports/pkgs lists all CPU temps correctly; so it seems unique to btop.

Here's a quick log, starting with a clean state without a btop.conf file;

cat .config/btop/btop.log

2023/12/25 (14:16:07) | ===> btop++ v.1.3.0
2023/12/25 (14:16:07) | DEBUG: Starting in DEBUG mode!
2023/12/25 (14:16:07) | INFO: Logger set to DEBUG
2023/12/25 (14:16:07) | DEBUG: Using locale C.UTF-8
2023/12/25 (14:16:07) | INFO: Running on /dev/pts/1
2023/12/25 (14:16:07) | DEBUG: Init -> Cpu::collect()
2023/12/25 (14:16:07) | DEBUG: Init -> Cpu::get_cpuName()
2023/12/25 (14:16:07) | DEBUG: Init -> Cpu::get_sensors()
2023/12/25 (14:16:07) | DEBUG: Init -> Cpu::get_core_mapping()
2023/12/25 (14:16:07) | DEBUG: Init -> Mem::collect()
2023/12/25 (14:16:07) | DEBUG: Init -> Mem::get_zpools()
2023/12/25 (14:16:07) | DEBUG: Loading theme file: /usr/local/share/btop/themes/matcha-dark-sea.theme
2023/12/25 (14:16:38) | DEBUG: Writing new config file
2023/12/25 (14:16:38) | INFO: Quitting! Runtime: 00:00:31

EDIT: Never mind that up ^^^ there. The reason the cpu temps don't show up is a drawing issue. Reducing font size "fixes" things. [screenshot: bingo 2]

aristocratos commented 8 months ago

PR #696 merged which fixes the crashes. But leaving the issue open since the cause of the issue isn't fixed yet.

thesunexpress commented 8 months ago

PR #696 merged which fixes the crashes. But leaving the issue open since the cause of the issue isn't fixed yet.

This latest merge is working for me. I'll keep an eye on it for a little bit as I noticed occasional crashes before the latest commits, while under heavy load/long running. Whatever the reason for the temps not rendering unless the font is reduced in size, has me a bit funfused. I'll look into firing up the 8-core Ryzen 7000 box & see if it does the same.

zenofile commented 8 months ago

I also had an actual segmentation fault with a8fda16bf6ead94bc5ffafa3e622ee60b1d92d7b on a very lightly loaded Ryzen 7000 Linux system just now. No coredump was captured, so no backtrace but planning on obtaining one next time.

zenofile commented 8 months ago
Thread 3 (Thread 0x7fe9f0ff96c0 (LWP 77873)):
#0  0x00007fea07b2619a in read () from /lib64/libc.so.6
#1  0x00007fea07cd805f in std::__basic_file<char>::xsgetn (this=this@entry=0x7fea07e4fb08 <__gnu_internal::buf_cin+104>, __s=0x56219dde7090 "", __n=__n@entry=8191) at basic_file.cc:340
#2  0x00007fea07d1b308 in std::basic_filebuf<char, std::char_traits<char> >::underflow (this=0x7fea07e4faa0 <__gnu_internal::buf_cin>) at /usr/src/debug/gcc-13.2.1-6.fc39.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/fstream.tcc:354
#3  0x00007fea07d50d36 in std::basic_streambuf<char, std::char_traits<char> >::uflow (this=0x7fea07e4faa0 <__gnu_internal::buf_cin>) at /usr/src/debug/gcc-13.2.1-6.fc39.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/streambuf:710
#4  0x00007fea07d2854e in std::basic_streambuf<char, std::char_traits<char> >::sbumpc (this=<optimized out>) at /usr/src/debug/gcc-13.2.1-6.fc39.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/streambuf:323
#5  std::basic_streambuf<char, std::char_traits<char> >::sbumpc (this=<optimized out>) at /usr/src/debug/gcc-13.2.1-6.fc39.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/streambuf:323
#6  std::basic_istream<char, std::char_traits<char> >::get (this=0x56219d7f2d20 <std::cin>, __c=@0x7fe9f0ff8e6f: 0 '\000') at /usr/src/debug/gcc-13.2.1-6.fc39.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/istream.tcc:306
#7  0x000056219d71baef in Input::InputThr::runImpl (this=0x56219de434f0) at src/btop_input.cpp:112
#8  Input::InputThr::run (that=0x56219de434f0) at src/btop_input.cpp:105
#9  0x00007fea07ce31e3 in std::execute_native_thread_routine (__p=0x56219ddde980) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#10 0x00007fea07aac897 in start_thread () from /lib64/libc.so.6
#11 0x00007fea07b336fc in clone3 () from /lib64/libc.so.6

Thread 2 (Thread 0x7fea07ea3740 (LWP 77853)):
#0  0x00007fea07af71a3 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x00007fea07b09a37 in nanosleep () from /lib64/libc.so.6
#2  0x000056219d72287a in std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > (__rtime=...) at /usr/include/c++/13/bits/this_thread_sleep.h:80
#3  Tools::sleep_ms (ms=<optimized out>) at src/btop_tools.hpp:308
#4  Input::poll (timeout=1000) at src/btop_input.cpp:161
#5  0x000056219d6af0a4 in main (argc=<optimized out>, argv=<optimized out>) at /usr/include/c++/13/bits/stl_algobase.h:233

Thread 1 (Thread 0x7fe9f17fa6c0 (LWP 77863)):
#0  0x000056219d719143 in std::_Hashtable<unsigned long, std::pair<unsigned long const, int>, std::allocator<std::pair<unsigned long const, int> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::erase (__k=<optimized out>, this=<optimized out>) at /usr/include/c++/13/bits/hashtable.h:984
#1  std::unordered_map<unsigned long, int, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, int> > >::erase (__x=<optimized out>, this=<optimized out>) at /usr/include/c++/13/bits/unordered_map.h:770
#2  Proc::draw[abi:cxx11](std::vector<Proc::proc_info, std::allocator<Proc::proc_info> > const&, bool, bool) (plist=std::vector of length 688, capacity 688 = {...}, force_redraw=<optimized out>, data_same=false) at src/btop_draw.cpp:1933
#3  0x000056219d6ceab1 in Runner::_runner () at src/btop.cpp:645
#4  0x00007fea07aac897 in start_thread () from /lib64/libc.so.6
#5  0x00007fea07b336fc in clone3 () from /lib64/libc.so.6
thesunexpress commented 8 months ago

I also had an actual segmentation fault with a8fda16 on a very lightly loaded Ryzen 7000 Linux system just now. No coredump was captured, so no backtrace but planning on obtaining one next time.

I've had a crash/dump or two as well, without a coredump captured. Are you running btop in gdb to get the trace? A curious thing: I had btop running during the night & it was alive and well. Seems the crashes are a bit random still. My btop.log does not indicate any hints as to what/why it crashes... only a final line INFO: Quitting! Runtime: 07:02:35

imwints commented 8 months ago

@thesunexpress yes, you open the coredump file with gdb and type bt. If you are on a systemd machine and systemd-coredump is running, it will save all generated coredumps for you. You can get into gdb with coredumpctl debug btop

zenofile commented 8 months ago

I've had a crash/dump or two as well, without a coredump captured. Are you running btop in gdb to get the trace?

I was unable to get it to crash when gdb was attached. The trace from above is from an optimized build with extra debug symbols and the core file was saved by the registered handler (kernel.core_pattern = systemd-coredump…) after it got invoked by the kernel, like @imwints said. I believe FreeBSD has a similar kernel knob (core(5))

The actual event also seems to be pretty random and a single btop instance that ran overnight is still running fine.

zenofile commented 8 months ago

I can now reproduce the crash reliably with stress-ng --exec=32 --exec-fork-method=fork --exec-no-pthread --timeout 1s and waiting for half a minute after stress-ng exits. It always crashes in Proc::draw during erase(). I tested this with multiple btop instances and 200ms refresh time and all instances crash, not just a single one.

I also did test with ankerl::unordered_dense::{map,set} replacing the std::unordered_map and this particular btop ran fine throughout the entire time I tried to come up with a reproducer.

aristocratos commented 8 months ago

@zenofile Can you post the full debug output you get when crashing during stress testing with stress-ng?

zenofile commented 8 months ago

Sure, this is all I got: A backtrace followed by a full backtrace, btop.log, a binary with debug symbols and the core.

gdb.txt btop.log btop_core.zip

Edit: Also a backtrace for a non-optimized debug build gdb_debug_build.txt

aristocratos commented 8 months ago

@zenofile ~This is not compiled from latest git main, there are references to robin_hood.h in there which has been removed.~

Edit: Ignore that, got an old cached gdb.txt for some reason...

aristocratos commented 8 months ago

@zenofile Pushed some changes to improve safety for map erase in Proc::draw. Test if it still crashes.
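The thread does not show what the change was, so the following is only a generic sketch of one common way to make erasure from an `std::unordered_map` safer: remove stale entries in a single pass and always continue from the iterator that `erase()` returns. The map type mirrors the one in the backtrace (`unsigned long` keys, `int` values); `prune_dead` and the `alive` set are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical pruning step: drops every entry whose key (here a PID) is not
// in `alive`. Returns the number of entries removed.
std::size_t prune_dead(std::unordered_map<unsigned long, int>& per_pid,
                       const std::unordered_set<unsigned long>& alive) {
    std::size_t removed = 0;
    for (auto it = per_pid.begin(); it != per_pid.end();) {
        if (alive.count(it->first) == 0) {
            it = per_pid.erase(it);  // erase() returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

In C++20, `std::erase_if` expresses the same loop in one call. Neither form helps if another thread modifies the map at the same time; that case needs synchronization around the container.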

zenofile commented 8 months ago

@zenofile Pushed some changes to improve safety for map erase in Proc::draw. Test if it still crashes.

That seems to have fixed the issue, at least it does not crash with stress-ng.

thesunexpress commented 8 months ago

@thesunexpress yes, you open the coredump file with gdb and type bt. If you are on a systemd machine and systemd-coredump is running, it will save all generated coredumps for you. You can get into gdb with coredumpctl debug btop

On FreeBSD, systemd is teh supreme evil to some of us ;-) I'll recompile the OS over the weekend with some extra debug features built into the kernel. Keeping an eye on things because I'm still getting the occasional crash for unknown reason(s).

Norman-Normandy commented 8 months ago

Can confirm the new main repo fixed the issue for me by just running git clone and gmake.