ggreer / the_silver_searcher

A code-searching tool similar to ack, but faster.
http://geoff.greer.fm/ag/
Apache License 2.0

Issues when searching with '-z' #1280

Open wavexx opened 5 years ago

wavexx commented 5 years ago

I'm having a lot of trouble using ag with '-z' to search inside compressed files, even simple gzip-compressed ones. I have run into all the bugs reported here: segfaults (#1157), random messages (#1243), and inconsistent results due to early exits or plainly wrong hits.

I'm using ag 2.1.0 from Debian unstable.

One example: I'm in /usr/share/doc/ trying to find some documentation by searching through gzipped files with ag -z. I simply cannot use ag: it stops with "truncated file: Success" somewhere, generally after printing garbage results.

I've found one example:

$ file changelog.Debian.gz
changelog.Debian.gz: gzip compressed data, max compression, from Unix, original size 16367
$ ag -z test changelog.Debian.gz
168:  * Use a correct way to test thing in shell (Closes: #443715)

197:n
ag: truncated file: Not a directory

wat?? Here's what grep says:

$ zgrep -n test changelog.Debian.gz
168:  * Use a correct way to test thing in shell (Closes: #443715)
196:  * Add a note on what unison-latest-stable is supposed to do in
259:    binary to unison-latest-stable and use this binary in the alternative
343:     - it'd really annoy me if the version in woody wasn't the latest

so, huh, the 197 wasn't "too far" from a real hit..

Sadly, this makes searching through directories with -z simply impossible: ag will stop for random reasons anywhere in the tree. I can provide more details and test samples if needed (almost any gzipped file will do).
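
One way to narrow down which files are affected is to build a reference result without `-z`, by decompressing each file and searching the plain text. The sketch below does that with `gzip -dc` and `grep -n`. Note that `zref` is a hypothetical helper name, not something ag provides, and the sketch assumes plain gzip files.

```shell
# zref PATTERN FILE...: reference search for gzip files (hypothetical helper).
# Prints "file:line:text" for every matching line of the decompressed data.
zref() {
  pattern=$1
  shift
  for f in "$@"; do
    # gzip -dc streams the decompressed data; grep -n adds line numbers.
    gzip -dc < "$f" | grep -n -e "$pattern" |
      while IFS= read -r line; do
        printf '%s:%s\n' "$f" "$line"
      done
  done
}
```

Running `zref test changelog.Debian.gz` and comparing the result with `ag -z test changelog.Debian.gz` should show where the two diverge.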

$ ag --version
ag version 2.1.0

Features:
  +jit +lzma +zlib

wavexx commented 5 years ago

I'm providing the sample file, as it's just a 5.5k changelog without any special content.

changelog.Debian.gz

art-licis commented 5 years ago

I hit the same issue on Linux. Even on a very simple, small gzipped log file, search results are way off after the first hit or two. There are no issues on macOS.

This is not an issue in older versions; 2.0.0 works just fine.

I've analyzed the diff between 2.0.0 and 2.2.0, and I suspect it was this fix that broke searching in archives on some platforms:

https://github.com/ggreer/the_silver_searcher/pull/1106/commits/2ec37825e589591e01d17a01673b4d2ac2da980c

This change introduced compressed-file streaming via fopencookie (macOS doesn't support it, so it falls back to the original approach).

b0o commented 4 years ago

I'm having the same issue on Linux, trying to search within man pages:

mag () { # man + ag
  ag -z "$@" $(sed -e 's/:/ /g' <<<"$MANPATH")
}

ag fails with something along the lines of:

ag: truncated file: Success

or

ERR: Found mem/data error while decompressing zlib stream: data error

or sometimes just randomly exits in the middle of execution.
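
While `-z` misbehaves, a man page search can avoid it entirely. The following is a rough sketch, not a drop-in replacement for `mag`: `mgrep` is a hypothetical name, it only lists the gzipped files whose decompressed text matches, it uses `grep` rather than ag for the matching, and the `/usr/share/man` default for an unset `MANPATH` is an assumption.

```shell
# mgrep PATTERN: list gzipped man pages under $MANPATH that contain PATTERN.
# Hypothetical helper; decompresses with gzip instead of relying on `ag -z`.
mgrep() {
  pattern=$1
  # Split MANPATH on ':' one directory per line, so paths with spaces survive.
  printf '%s\n' "${MANPATH:-/usr/share/man}" | tr ':' '\n' |
    while IFS= read -r dir; do
      [ -d "$dir" ] || continue   # skip empty or missing entries
      find "$dir" -type f -name '*.gz' -exec sh -c '
        pat=$1; shift
        for f in "$@"; do
          # Print the file name if its decompressed text matches.
          gzip -dc < "$f" 2>/dev/null | grep -q -e "$pat" && printf "%s\n" "$f"
        done
      ' sh "$pattern" {} +
    done
}
```

For the case above, `mgrep getopts` would print the matching file names, which can then be opened with `man` or piped through `gzip -dc`.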

Here's a -D debug log for the command ag -Dz getopts /usr/local/man /usr/local/share/man /usr/share/man /usr/man: ag-debug.log

$ ag --version
ag version 2.2.0

Features:
  +jit +lzma +zli

wangqiaoqian commented 4 years ago

I came across the same problem. After downgrading to 2.0.0, it works!

mwagnell commented 4 years ago

I've had many of these problems too. Based on the comments above, I changed src/config.h to:

/* Define to 1 if you have the `fopencookie' function. */
#define HAVE_FOPENCOOKIE 0

And things seem much better. The code that uses fopencookie is not stable.

art-licis commented 4 years ago

Switching off fopencookie makes it work (I also built it that way in the past); however, it will fall back to the default strategy, which may cause OOM for larger files.

cemeyer commented 4 years ago

Inexplicably, it seemed to work pretty well when I wrote it three years ago. I don't have too much context anymore, and I don't know what has changed in ag since that time. If ag ends up using rewind on the compressed stream, that's broken in the following way:

--- a/src/zfile.c
+++ b/src/zfile.c
@@ -357,6 +357,7 @@ zfile_seek(void *cookie_, off64_t *offset_, int whence) {
         cookie->decode_offset = 0;
         cookie->logic_offset = 0;
         zfile_cookie_cleanup(cookie);
+        rewind(cookie->in);
         zfile_cookie_init(cookie);
     } else if ((uint64_t)new_offset > cookie->logic_offset) {
         /* Emulate forward seek by skipping ... */

But I don't know why it would rewind while streaming.

Some of the issues are related to unsupported compression formats: zip, at least. There is no good fallback path there and as a result ag segfaults. That doesn't explain the corrupt gzip search result, though.