Open Arniiiii opened 12 months ago
Are you running btop from the gentoo overlay or from git?
from gentoo https://packages.gentoo.org/packages/sys-process/btop
Are you running btop from the gentoo overlay or from git?
OK, please test the git version since not all clang compatibility patches were back ported for gentoo. I'll try to write a live ebuild for that. But I think this will still happen
ok, I'm testing it: compiled in the way:
MARCH_NATIVE_GCC="-march=skylake -mabm --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=6144" MARCH_NATIVE_CLANG="-march=skylake -mpopcnt -mlzcnt" DEBUG_FLAGS_GCC="-g3 -ggdb" DEBUG_FLAGS_CLANG="-g -glldb -gdwarf-4 -fdebug-default-version=4" COMMON_FLAGS="-O3 -pipe" COMMON_FLAGS_LTO="-Werror=odr -Werror=strict-aliasing" LTO_CLANG="-flto=thin ${COMMON_FLAGS_LTO}" LTO_GCC="-Werror=lto-type-mismatch -flto=4 ${COMMON_FLAGS_LTO}" CFLAGS="${COMMON_FLAGS} ${MARCH_NATIVE_CLANG} ${DEBUG_FLAGS_CLANG} ${LTO_CLANG}" CXXFLAGS="${COMMON_FLAGS} ${MARCH_NATIVE_CLANG} ${DEBUG_FLAGS_CLANG} ${LTO_CLANG} -stdlib=libc++" MAKEOPTS="-j4" CC="clang" CXX="clang++" AR="llvm-ar" NM="llvm-nm" RANLIB="llvm-ranlib" LDFLAGS="${LDFLAGS} -fuse-ld=lld -rtlib=compiler-rt -unwindlib=libunwind -Wl,--as-needed" make VERBOSE=true STATIC=false STRIP=false
make install
if make it easy to read:
CC="clang"
CXX="clang++"
AR="llvm-ar"
NM="llvm-nm"
RANLIB="llvm-ranlib"
MARCH_NATIVE_GCC="-march=skylake -mabm --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=6144"
MARCH_NATIVE_CLANG="-march=skylake -mpopcnt -mlzcnt"
DEBUG_FLAGS_GCC="-g3 -ggdb"
DEBUG_FLAGS_CLANG="-g -glldb -gdwarf-4 -fdebug-default-version=4"
COMMON_FLAGS="-O3 -pipe"
COMMON_FLAGS_LTO="-Werror=odr -Werror=strict-aliasing"
LTO_CLANG="-flto=thin ${COMMON_FLAGS_LTO}"
LTO_GCC="-Werror=lto-type-mismatch -flto=4 ${COMMON_FLAGS_LTO}"
CFLAGS="${COMMON_FLAGS} ${MARCH_NATIVE_CLANG} ${DEBUG_FLAGS_CLANG} ${LTO_CLANG}"
CXXFLAGS="${COMMON_FLAGS} ${MARCH_NATIVE_CLANG} ${DEBUG_FLAGS_CLANG} ${LTO_CLANG} -stdlib=libc++"
LDFLAGS="${LDFLAGS} -fuse-ld=lld -rtlib=compiler-rt -unwindlib=libunwind -Wl,--as-needed"
MAKEOPTS="-j4"
make VERBOSE=true STATIC=false STRIP=false
make install
right now I'm waiting until SIGSEGV appears...
Okay, I have a live ebuild here: https://gitlab.com/nobounce/gentoo-overlay/-/raw/main/sys-process/btop/btop-9999.ebuild so the files are still managed by portage
Okay, I have a live ebuild here: https://gitlab.com/nobounce/gentoo-overlay/-/raw/main/sys-process/btop/btop-9999.ebuild so the files are still managed by portage
so, what run to test? your ebuild or mine manually compiled & installed stuff ?
running manually compiled stuff
As long as it's the git version it's fine
#0 0x00007ffff7f3d92b in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::uflow() ()
from /usr/lib64/libc++.so.1
#1 0x00007ffff7f4018f in std::__1::basic_istream<char, std::__1::char_traits<char> >::ignore(long, int)
() from /usr/lib64/libc++.so.1
#2 0x00005555556d98dc in Proc::collect (no_update=false) at src/linux/btop_collect.cpp:1868
#3 0x00005555555908c1 in Runner::_runner () at src/btop.cpp:570
#4 0x00007ffff7d0602c in start_thread (arg=<optimized out>) at pthread_create.c:444
#5 0x00007ffff7d8590c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I'll try your ebuild.
Ok this took some time. Did you experience this with -O2
as well? You said that you're going to try it with no optimizations at all. This would be interesting to know
Ok this took some time. Did you experience this with
-O2
as well? You said that you're going to try it with no optimizations at all. This would be interesting to know
haven't checked yet. wait plz
@Gerodote
Hi, would you mind trying to build with as similar as possible flags with gcc
and also try one of the musl
binaries from https://github.com/aristocratos/btop/releases/tag/v1.2.13.
If both of those runs fine if would be pretty clear indicator that it's clang related.
@Gerodote Hi, would you mind trying to build with as similar as possible flags with
gcc
and also try one of themusl
binaries from https://github.com/aristocratos/btop/releases/tag/v1.2.13.If both of those runs fine if would be pretty clear indicator that it's clang related.
I think I can try GCC with classic LTO ( not LTO-thin ). I guess, in my case, because I have glibc setup, I cannot try musl binary.
The musl binary is static, it works on all 2.6+ kernels.
The musl binary is static, it works on all 2.6+ kernels.
ok, after testing nobounce's ebuild and after nobounce's ebuild with GCC .
Okay, I can reproduces this when hitting the system with some load
nobounce's ebuild + clang no lto, no debug symbols, no native flags, maybe O2 :
#0 0x00007ffff7f3d92b in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::uflow() ()
from /usr/lib64/libc++.so.1
#1 0x00005555555cbec6 in ?? ()
#2 0x00005555556ef335 in ?? ()
#3 0x000055555558fc5b in ?? ()
#4 0x00007ffff7d0602c in start_thread (arg=<optimized out>) at pthread_create.c:444
#5 0x00007ffff7d8590c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
the reason there's no debug symbols at btop is because idk how to properly compile using your ebuild. I used manually ebuild , so it's used just clang profile ( I guess O2 , libc++, no lto, no debug symbols, no src installed, no native flags...)
@nobounce Can you give an instruction to add your repo please?
I tried this:
eselect repository add nobounce git https://gitlab.com/nobounce/gentoo-overlay/
emaint -r nobounce sync
but then emerge
seems not to see your repo.
I compiled manually using
cd /var/db/repos/nobounce/sys-process/btop
GENTOO_MIRRORS="" ebuild ./btop-9999.ebuild manifest clean unpack
ebuild btop-9999.ebuild clean install merge
also this doesn't work:
root@livecd /v/d/r/n/s/btop (main)# eselect repository enable nobounce
error: nobounce: repository already enabled, in /var/db/repos/nobounce
how it looks like when trying emerge btop using your ebuild:
root@livecd ~# emerge -av =sys-process/btop-9999::nobounce --autounmask
!!! Section 'nobounce' in repos.conf has name different from repository name 'local' set inside repository
* IMPORTANT: 1 news items need reading for repository 'librewolf'.
* IMPORTANT: 17 news items need reading for repository 'gentoo'.
* Use eselect news read to view new items.
These are the packages that would be merged, in order:
Calculating dependencies... done!
Dependency resolution took 1.13 s.
emerge: there are no ebuilds to satisfy "=sys-process/btop-9999::nobounce".
UPDATE: solved by readding your repo under name local
.
IDK how, I got even another SIGSEGV with clang lto and nobounce's ebuild:
#0 0x00007ffff7f3b92b in ?? ()
#1 0x00007ffff75fe710 in ?? ()
#2 0x00005555555cbec6 in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::sbumpc[abi:v160006]() (this=0x7ffff003b5a0) at /usr/include/c++/v1/streambuf:191
#3 std::__1::getline[abi:v160006]<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_istream<char, std::__1::char_traits<char> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, char) (__is=..., __str=..., __dlm=10 '\n')
at /usr/include/c++/v1/istream:1517
#4 0x000000000000000a in ?? ()
#5 0x00007ffff0058208 in ?? ()
#6 0x00000000000044c0 in ?? ()
#7 0x00007ffff0058120 in ?? ()
#8 0x00005555556ef335 in std::__1::getline[abi:v160006]<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_istream<char, std::__1::char_traits<char> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&) (__is=..., __str=...)
at /usr/include/c++/v1/istream:1559
#9 Proc::collect (no_update=false) at src/linux/btop_collect.cpp:1803
#10 0x0000000000000000 in ?? ()
I'll try get more packages with debug symbols.
#0 0x00007ffff7f3b96b in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::uflow (
this=0x7ffff75fe720) at include/c++/v1/streambuf:447
#1 0x00007ffff7f3e1cf in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::sbumpc (
this=0x7ffff002e110) at include/c++/v1/streambuf:191
#2 std::__1::basic_istream<char, std::__1::char_traits<char> >::ignore (this=0x7ffff75fe710,
__n=9223372036854775807, __dlm=32) at include/c++/v1/istream:962
#3 0x00005555556ef5ec in Proc::collect (no_update=false) at src/linux/btop_collect.cpp:1868
#4 0x000055555558fc5b in Runner::_runner () at src/btop.cpp:570
#5 0x00007ffff7d0402c in start_thread (arg=<optimized out>) at pthread_create.c:444
#6 0x00007ffff7d8390c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
gonna try compile with stdlib=libstdc++ ( aka compile using gcc's c++ lib )
It seems to happen if -stdlib=libc++ .
OK, please test the git version since not all clang compatibility patches were back ported for gentoo. I'll try to write a live ebuild for that. But I think this will still happen
All of the important fixes were backported. The only fixes from your PR that were left out were the ones for the std::move warnings.
@nobounce Did you find a consistent setup for this segfault? Currently trying to debug this.
Nope, it happened 2/4 times when running under high system load with Clang, O3, LTO and libc++.
The variable responsible for that is pread
in the Proc::collect()
function of src/linux/btop_collect.cpp
Don't know if it could be related to https://github.com/aristocratos/btop/blob/2c3ac4855de49563edd4ef199b0be74babc7ce32/src/btop_tools.hpp#L152
Since there's a lot of calls to ignore
with SSmax
as limit and can see reference to ::ignore
in at least one of the backtraces above.
Could try to replace all instances of SSmax
with std::numeric_limits<std::streamsize>::max()
directly, and see if that makes any difference.
@aristocratos
max()
doesn't seem to be the issue.
#0 0x00007ffff7f3b96b in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::uflow (
this=0x7ffff75fe720) at include/c++/v1/streambuf:447
#1 0x00007ffff7f3e1cf in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::sbumpc (
this=0x7ffff002e110) at include/c++/v1/streambuf:191
#2 std::__1::basic_istream<char, std::__1::char_traits<char> >::ignore (this=0x7ffff75fe710,
__n=9223372036854775807, __dlm=32) at include/c++/v1/istream:962
The call stack is
From cppreference of sbumpc:
Reads one character and advances the input sequence by one character. 1) If the input sequence read position is not available, returns uflow(). Otherwise returns Traits::to_int_type(*gptr()). 2) Same as (1), but discards the result.
Here is the implementation in LLVM: https://github.com/llvm/llvm-project/blob/4b0df112daa1d967abeca1473c5e81e21d9af36a/libcxx/include/istream#L972
Alright, I found a setup to consistently reproduce:
stress-ng --fork 100 & ( sleep 0.07 && killall stress-ng )
Seems to be caused by processes very quickly opening and being terminated. UBSan also triggers on the segfault, here's the output if it's of use:
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==5966==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7efe0fc9974b bp 0x7efe100750f0 sp 0x7efe10075070 T5967)
==5966==The signal is caused by a READ memory access.
==5966==Hint: address points to the zero page.
#0 0x7efe0fc9974b in std::__1::basic_streambuf<char, std::__1::char_traits<char>>::uflow() (/lib/libc++.so.1+0x9c74b)
#1 0x55793c8a46c7 in __sanitizer::ReportDeadlySignal(__sanitizer::SignalContext const&, unsigned int, void (*)(__sanitizer::SignalContext const&, void const*, __sanitizer::BufferedStackTrace*), void const*) (/usr/bin/btop+0x5c06c7)
#2 0x55793c8a49e8 in __sanitizer::HandleDeadlySignal(void*, void*, unsigned int, void (*)(__sanitizer::SignalContext const&, void const*, __sanitizer::BufferedStackTrace*), void const*) (/usr/bin/btop+0x5c09e8)
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV (/lib/libc++.so.1+0x9c74b) in std::__1::basic_streambuf<char, std::__1::char_traits<char>>::uflow()
==5966==ABORTING
@vimproved Huh.
Could try to add add a if (pread.good())
check before line:
https://github.com/aristocratos/btop/blob/2c3ac4855de49563edd4ef199b0be74babc7ce32/src/linux/btop_collect.cpp#L1829
and
https://github.com/aristocratos/btop/blob/2c3ac4855de49563edd4ef199b0be74babc7ce32/src/linux/btop_collect.cpp#L1931
and see if that solves anything.
@vimproved Huh.
Could try to add add a
if (pread.good())
check before line:and
and see if that solves anything.
Unfortunately nope.
This is really weird, are you supposed to manually check if the file still exists before every operation in the ifstream implementation of libc++?
Maybe a bug when reading from virtual files (/proc), since this should throw an exception or set a failbit right?
At least that is what is happening in libstdc++ ifstream.
@vimproved
Do you wanna try adding if (access(d.path().c_str(), R_OK) != 0) continue;
or if (not std::filesystem::is_directory(d.path())) continue;
before every pread.open
in btop_collect.cpp
between lines 1772 and 1949.
If that doesn't work, adding it before every read or ignore operation in that same section.
@vimproved Do you wanna try adding
if (access(d.path().c_str(), R_OK) != 0) continue;
orif (not std::filesystem::is_directory(d.path())) continue;
before everypread.open
inbtop_collect.cpp
between lines 1772 and 1949.If that doesn't work, adding it before every read or ignore operation in that same section.
Putting that before all the ignore calls works, but instead of segfaulting it just freezes for a couple seconds.
@vimproved Do you wanna try adding
if (access(d.path().c_str(), R_OK) != 0) continue;
orif (not std::filesystem::is_directory(d.path())) continue;
before everypread.open
inbtop_collect.cpp
between lines 1772 and 1949. If that doesn't work, adding it before every read or ignore operation in that same section.Putting that before all the ignore calls works, but instead of segfaulting it just freezes for a couple seconds.
Nevermind, I spoke prematurely.
The exact same error also?
Well, if the file disappears in the middle of a ignore or read operation and there isn't even an exception thrown, I don't know if there is much more we can do.
Anybody know any LLVM people to tag that might have any insight?
template<typename _CharT, typename _Traits>
basic_istream<_CharT, _Traits>&
basic_istream<_CharT, _Traits>::
ignore(streamsize __n, int_type __delim)
{
_M_gcount = 0;
sentry __cerb(*this, true);
if (__cerb && __n > 0)
{
ios_base::iostate __err = ios_base::goodbit;
__try
{
const int_type __eof = traits_type::eof();
__streambuf_type* __sb = this->rdbuf();
int_type __c = __sb->sgetc();
======================== HERE =====================
// See comment above.
bool __large_ignore = false;
while (true)
{
while (_M_gcount < __n
&& !traits_type::eq_int_type(__c, __eof)
&& !traits_type::eq_int_type(__c, __delim))
{
++_M_gcount;
__c = __sb->snextc();
}
if (__n == __gnu_cxx::__numeric_traits<streamsize>::__max
&& !traits_type::eq_int_type(__c, __eof)
&& !traits_type::eq_int_type(__c, __delim))
{
_M_gcount =
__gnu_cxx::__numeric_traits<streamsize>::__min;
__large_ignore = true;
}
else
break;
}
if (__n == __gnu_cxx::__numeric_traits<streamsize>::__max)
{
if (__large_ignore)
_M_gcount = __gnu_cxx::__numeric_traits<streamsize>::__max;
if (traits_type::eq_int_type(__c, __eof))
__err |= ios_base::eofbit;
else
{
if (_M_gcount != __n)
++_M_gcount;
__sb->sbumpc();
}
}
else if (_M_gcount < __n) // implies __c == __delim or EOF
{
if (traits_type::eq_int_type(__c, __eof))
__err |= ios_base::eofbit;
else
{
++_M_gcount;
__sb->sbumpc();
}
}
}
__catch(__cxxabiv1::__forced_unwind&)
{
this->_M_setstate(ios_base::badbit);
__throw_exception_again;
}
__catch(...)
{ this->_M_setstate(ios_base::badbit); }
if (__err)
this->setstate(__err);
}
return *this;
}
This is the comment from the other overload of ::ignore()
:
// N.B. On LFS-enabled platforms streamsize is still 32 bits // wide: if we want to implement the standard mandated behavior // for n == max() (see 27.6.1.3/24) we are at risk of signed // integer overflow: thus these contortions. Also note that, // by definition, when more than 2G chars are actually ignored, // _M_gcount (the return value of gcount, that is) cannot be // really correct, being unavoidably too small.
On the other hand libc++ omits this check, this is probably the issue
Passing std::numeric_limits<std::streamsize>::max() - 1
to all ignore functions "fixes" it partially.
But this also happens in std::getline
a few lines further down. I tried to replace it with operator>>
but this doesn't help either.
Also the opt-level or LTO don't matter, this is purely a libc++ issue.
Any updates here? Getting a (similar?) backtrace
#1 0x0000555555690c04 in std::__1::basic_streambuf<char, std::__1::char_traits<char> >::sbumpc[abi:ne180100]() (this=0x7ffff7a87a38) at /usr/include/c++/v1/streambuf:185
#2 0x000055555568a423 in std::__1::getline[abi:ne180100]<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_istream<char, std::__1::char_traits<char> >&, st
d::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, char) (__is=..., __str=..., __dlm=10 '\n') at /usr/include/c++/v1/istream:1260
#3 0x0000555555745e42 in std::__1::getline[abi:ne180100]<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_istream<char, std::__1::char_traits<char> >&, st
d::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&) (__is=..., __str=...) at /usr/include/c++/v1/istream:1294
#4 0x000055555576c750 in Proc::collect (no_update=false) at /var/tmp/portage/sys-process/btop-1.3.2/work/btop-1.3.2/src/linux/btop_collect.cpp:2627
#5 0x000055555560a289 in Runner::_runner () at /var/tmp/portage/sys-process/btop-1.3.2/work/btop-1.3.2/src/btop.cpp:692
#6 0x00007ffff7feaf1b in ?? () from /lib/ld-musl-x86_64.so.1
#7 0x0000000000000000 in ?? ()```
Describe the bug
SIGSEGV boundary error .
happens at machine with glibc Compiled with clang , LTO-thin , O3, native flags, debug symbols. Used libc++ , compiler-rt . e.g. gentoo clang-profile.
To Reproduce
run btop for several hours.
everything fine, until a SIGSEGV happens.
Expected behavior
not sigsegv
Screenshots
here should be photo of btop with braile style under tmux with text somewhere SIGSEGV ...
Info (please complete the following information):
btop -v
: btop version: 1.2.13Additional context
contents of
~/.config/btop/btop.log
: nothing related.(try running btop with
--debug
flag if btop.log is empty)GDB Backtrace
O3 + LTO-thin + debug symbols got me here: gdb btop ... wait ... bt :
I'll try glibc compile with debug symbols, wait a day-two for me to get the error again. Then I'll try btop under O0 , LTO-thin and debug symbols.