Closed marcin-github closed 3 days ago
Will take a look. It looks like this is an IO problem with a buffer copy, which is unexpected. I've never seen anything like this on x64. I assume this is on a Linux box?
It would also be very useful to know what kind of file is being read by the indexer (it may or may not be a compressed file). Is there a way for you to isolate this to a small set of files or to a single file that shows this issue, so we know what input file (or type of compressed file) causes this?
Yes, this is Linux x64, Ubuntu 22. It looks like this is independent of any given file. I run the indexer inside a directory that contains ~100000 directories, each of which contains from one to ~50 JPEG files. I ran the indexer under strace, and the last accessed file changes "randomly".
I checked ugrep-indexer compiled with clang address sanitizer, but could not find any problems when running it on large directories with mixed archives, compressed files, binaries etc.
The following command builds ugrep and ugrep-indexer with the address sanitizer when clang is the compiler:
./build.sh CXXFLAGS='-fsanitize=address -O1 -g'
Note: when building it this way for ugrep < v7, the ugrep test at the end reports a problem. But that problem is benign, because I'm using a memcmp (via std::string::compare) on a table where I don't care if the table strings compared are shorter (the overflow is in the table). But it's ugly to have this warning report, so I will correct this in the upcoming v7 release.
Then run ugrep-indexer again to find memory bugs.
Note that option -z uses compression libraries. If one of the compression libraries has a bug, then this would explain the crash.
Afterwards, don't forget to rebuild the tools without sanitizer (just ./build.sh), otherwise the tools will run slow!
Any chance to try out my suggestion to help locate the problem? I am unable to replicate this problem, which might be specific to some of the compressed files you have. It might be a single (malformed) compressed file that triggers this.
Closing this until we receive more details.
Hi,
I'll try to help with debugging. I can't build with -fsanitize=address because configure doesn't play well with it. I get this in config.log:
g++: error: unrecognized command-line option '-qversion'; did you mean '--version'?
g++: fatal error: no input files
compilation terminated.
configure:3997: $? = 1
configure:4017: checking whether the C++ compiler works
configure:4039: g++ -fsanitize=address -O1 -g conftest.cpp >&5
configure:4043: $? = 0
configure:4093: result: yes
configure:4096: checking for C++ compiler default output file name
configure:4098: result: a.out
configure:4104: checking for suffix of executables
configure:4111: g++ -o conftest -fsanitize=address -O1 -g conftest.cpp >&5
configure:4115: $? = 0
configure:4138: result:
configure:4160: checking whether we are cross compiling
configure:4168: g++ -o conftest -fsanitize=address -O1 -g conftest.cpp >&5
configure:4172: $? = 0
configure:4179: ./conftest
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
[flood of the above messages]
When config.log grew to 2 GB I cancelled it.
$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
Build with clang:
$ ./build.sh CXX='clang++ -std=gnu++11' CC=clang CXXFLAGS='-fsanitize=address -O1 -g'
Tests failed:
make test
*** SINGLE-THREADED TESTS ***
ugrep 7.0.3 x86_64-pc-linux-gnu +avx2; -P:pcre2jit; -z:zlib,bzip2,lzma,lz4,zstd,brotli,7z,tar/pax/cpio/zip
Have libpcre2? yes (recommended)
Have libz? yes (recommended)
Have libbz2? yes (recommended)
Have liblzma? yes (recommended)
./verify.sh: line 77: 1135710 Segmentation fault $UG -Fq 'HAVE_LIBLZ4 1' "$CONFIGH"
Have liblz4? no (optional)
Have libzstd? yes (optional)
Have libbrotli? yes (optional)
Have 7zip? yes (optional)
.--- - 2024-11-14 20:15:27.031711215 +0100
+++ out/dir.out 2024-04-30 10:19:49.148439432 +0200
@@ -0,0 +1,3 @@
+dir1/Hello.bat
+dir1/Hello.sh
+dir1/makefile
Error: ugrep --sort -rl Hello dir1 failed
make: *** [Makefile:1106: test] Error 1
Testing failed, please open an issue at:
https://github.com/Genivia/ugrep/issues
I couldn't get a core dump with the indexer compiled with those options. I only get this message in dmesg: ugrep-indexer[1148411] overflowed sigaltstack. So I ran it under gdb:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> index accuracy: 9 (3%~10% noise)
> decompress: yes (zmax=1)
> ignore binary: yes
> ignore files: no
> index hidden: no
[New Thread 0x7ffff5000640 (LWP 1151616)]
I 0 0% /home/logi/avki/xx/yyy.jpg
Thread 2 "ugrep-indexer" received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7ffff5000640 (LWP 1151616)]
__GI___libc_write (nbytes=57613, buf=0x631000014898, fd=6) at ../sysdeps/unix/sysv/linux/write.c:26
26 ../sysdeps/unix/sysv/linux/write.c: No such file or directory.
(gdb) thread apply all bt
Thread 2 (Thread 0x7ffff5000640 (LWP 1151616) "ugrep-indexer"):
#0 __GI___libc_write (nbytes=57613, buf=0x631000014898, fd=6) at ../sysdeps/unix/sysv/linux/write.c:26
#1 __GI___libc_write (fd=6, buf=0x631000014898, nbytes=57613) at ../sysdeps/unix/sysv/linux/write.c:24
#2 0x00005555555959c0 in write ()
#3 0x00005555556528ff in Zthread::decompress (this=0x7fffffffde50) at ./zthread.hpp:485
#4 0x0000555555661959 in std::__invoke_impl<void, void (Zthread::*)(), Zthread*> (__f=<optimized out>, __t=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74
#5 std::__invoke<void (Zthread::*)(), Zthread*> (__fn=<optimized out>, __args=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#6 std::thread::_Invoker<std::tuple<void (Zthread::*)(), Zthread*> >::_M_invoke<0ul, 1ul> (this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:259
#7 std::thread::_Invoker<std::tuple<void (Zthread::*)(), Zthread*> >::operator() (this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:266
#8 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (Zthread::*)(), Zthread*> > >::_M_run (this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211
#9 0x00007ffff7adc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007ffff7726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 1 (Thread 0x7ffff7c47100 (LWP 1151613) "ugrep-indexer"):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fffffffdf30) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7fffffffdf30) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7fffffffdf30, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007ffff7693a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fffffffde80, cond=0x7fffffffdf08) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x7fffffffdf08, mutex=0x7fffffffde80) at ./nptl/pthread_cond_wait.c:627
#5 0x000055555564d775 in Zthread::open_next (this=0x7fffffffde50, pathname=<optimized out>) at ./zthread.hpp:237
#6 0x000055555564ab70 in Stream::read_next_file (this=<optimized out>, pathname=<optimized out>, archive=<optimized out>) at ugrep-indexer.cpp:499
#7 0x000055555563f623 in index (stream=..., pathname=0x189 <error: Cannot access memory at address 0x189>, pathname@entry=0x1 <error: Cannot access memory at address 0x1>, hashes=0x7ffffffddb70 "", hashes@entry=0x7ffffffddb58 "", hashes_size=@0x7fffffffdfd0: 0, noise=@0x7fffffffdff0: 0, compressed=@0x7fffffffe020: false, archive=@0x7fffffffe010: false, binary=@0x7fffffffe000: false, size=@0x7fffffffe030: 0) at ugrep-indexer.cpp:827
#8 0x0000555555646da3 in indexer (pathname=<optimized out>) at ugrep-indexer.cpp:1700
#9 0x000055555564a8e9 in main (argc=6, argv=0x7fffffffe1f8) at ugrep-indexer.cpp:2065
(gdb)
This is very likely not the problem, because broken pipes are caught and ignored to continue processing. For example, when a file is binary we don't want to read the whole thing, so the receiving end of the pipe is closed after the file is detected as binary, which triggers a broken pipe on the sending side. GDB doesn't block this, so you're getting SIGPIPE in GDB and won't be able to make progress.
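To illustrate the point about broken pipes, here is a minimal C++ sketch. It is not ugrep's actual code, and the function name write_to_closed_pipe is made up for this example. It assumes a POSIX system and a writer that ignores SIGPIPE, so that write() to a pipe whose reading end has been closed fails with EPIPE instead of terminating the process:

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper: writes to a pipe whose read end is already closed.
// Returns the errno set by write(), 0 if the write unexpectedly succeeded,
// or -1 if the pipe could not be created.
int write_to_closed_pipe()
{
  // Ignore SIGPIPE process-wide so that write() reports EPIPE
  // instead of the default action (terminating the process).
  std::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (pipe(fds) != 0)
    return -1;

  // The reader gives up early, e.g. after deciding the input is binary.
  close(fds[0]);

  // The writer still tries to send data: this is the "broken pipe" case.
  const char data[] = "decompressed bytes";
  ssize_t n = write(fds[1], data, sizeof(data));
  int err = (n < 0) ? errno : 0;

  close(fds[1]);
  return err;
}
```

On Linux, calling write_to_closed_pipe() should return EPIPE, which the writer can treat as "the reader is done" and move on to the next file. GDB stops on SIGPIPE by default, as the session above shows; entering handle SIGPIPE nostop noprint pass before run should let the program continue past such writes.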
ugrep-indexer -9 -z -v (without -I) runs without issue, but with -I I get a segfault; the backtrace is below. ugrep-indexer is at HEAD.