Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License
2.56k stars, 109 forks

--recursive has unexpected interaction with piped input #404

Closed: AndydeCleyre closed this issue 2 months ago

AndydeCleyre commented 2 months ago

Hello again!

I don't know how intentional or (un)surprising this behavior is for others, but I found it surprising. I've been using an alias for ug that always adds --recursive, which revealed the following behavior (no aliases are used below). I mention the alias because it wouldn't make sense to add --recursive on purpose when searching piped output, but I thought it would be "safe" to have it always on.

ugrep 6.0.0

Setup:

$ mkdir tmp
$ cd tmp
$ echo matchcontent >f1
$ echo othercontent >matchname

Test:

$ ls | ug match
     2: matchname
$ ls | ug --recursive match
(standard input)
     2: matchname

f1
     1: matchcontent

I expected the second command to give the same result as the first, without searching any file contents at all.

That said, I'm realizing that was a bad assumption, given GNU grep's behavior. I mostly use ugrep as a ripgrep replacement, and this behavior differs across GNU grep, busybox grep, and ripgrep:

$ ls | grep -r match
f1:matchcontent
$ ls | busybox grep -r match
matchname
$ ls | rg match
matchname

I wanted to ask what you think about this, and whether you can recommend a configuration (an alias or a ugrep config; alias preferred) that more closely matches the behavior of ripgrep and busybox grep.

Thanks for any insight!


Actually, it's unclear to me whether the content match happens because the filename appears in the piped input, or because the file is in the current folder. So, one last test:

$ mkdir deeper
$ cd deeper
$ ls .. | grep -r match  # return code: 1
$ ls .. | ug --recursive match
(standard input)
     3: matchname
$ ls .. | grep match
matchname

genivia-inc commented 2 months ago

It's doing what it was designed to do: with option -r (--recursive) it searches recursively in all inputs, including standard input, since data is piped in.

GNU grep and BSD grep do not search standard input at all when option -r is specified, even when data is piped. They search the working directory instead, since option -r is intended to search recursively.

So there is no standard approach in this case. If you don't want to search recursively, then don't use option -r.

As a side note, you don't need to specify option -r when you omit search targets, because by default ugrep recursively searches the working directory unless you pipe data to it.

EDIT 2: for good measure I tried your example with GNU and BSD grep:

$ ls | grep -r match
f1:matchcontent

So it searches recursively as intended, but ignores standard input. (Edited, because I hadn't looked closely enough at what you had written.)

AndydeCleyre commented 2 months ago

OK, thanks! So an alias with --recursive isn't safe as a general-purpose invocation. If this is all as intended, I'm happy for this question to be closed.


If anyone else arrives here with the same confusion: to make ugrep behave more like busybox grep and ripgrep in these cases, I'm going to replace this in my .zshrc:

alias ug="=ug --recursive --smart-case --glob-ignore-case --hidden --ignore-binary"

with:

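ug () {
  emulate -L zsh

  local args=(--smart-case --glob-ignore-case --hidden --ignore-binary)
  if [[ -t 0 ]] {
    # stdin is a terminal: nothing is piped in, so search recursively
    args+=(--recursive)
  } else {
    # stdin is piped or redirected: search only that input, without line numbers
    args+=(--no-line-number)
  }

  # =ug runs the ug executable itself instead of this function; "$@" keeps empty arguments
  =ug $args "$@"
}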
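For bash or other POSIX sh users, the same stdin check might be sketched as below. This is a hypothetical port, not something posted in this thread: it assumes the ugrep `ug` command is installed on PATH, uses `command ug` in place of zsh's `=ug` (so the call skips the shell function and runs the binary), and uses `set --` to prepend the mode-specific flag to the caller's arguments. The flags, including `--no-line-number`, are copied from the zsh function above.

```shell
# Hypothetical POSIX sh/bash port of the zsh ug() wrapper.
# Assumes the real ugrep "ug" executable is on PATH.
ug() {
  if [ -t 0 ]; then
    # stdin is a terminal: nothing is piped in, so search recursively
    set -- --recursive "$@"
  else
    # stdin is piped or redirected: search only that input, without line numbers
    set -- --no-line-number "$@"
  fi
  # "command" bypasses this function and runs the ug binary found on PATH
  command ug --smart-case --glob-ignore-case --hidden --ignore-binary "$@"
}
```

Because `[ -t 0 ]` only checks whether standard input is a terminal, input redirected from a file (for example `ug match < list.txt`) also takes the non-recursive branch.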
genivia-inc commented 2 months ago

Well, if like-minded folks chime in asking for this change, then we might change it. On the other hand, there may be others who want to keep it this way, or at least replicate GNU grep's behavior. I just wanted to make sure ugrep doesn't ignore standard input when --recursive is used, even though GNU grep ignores it. Ignoring something that should be searched seems worse to me than searching more.

After all, what you don't see in the search results does not raise suspicion when you expect all inputs to be searched. Suspicion is only raised when you see more results than expected, so that action can be taken to correct it.

That's also why searching binary files is the default: some people want it and expect ugrep to behave like GNU grep. To turn it off, add --ignore-binary to a .ugrep config file. Then you'll remember why binary files are ignored by default.
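For example, a per-user config file could look like this. It is a minimal sketch assuming ugrep's config-file format, in which each line names a long option without its leading `--` (with `=VALUE` for options that take an argument) and lines starting with `#` are comments; `ug` loads `.ugrep` from the working directory first and otherwise from the home directory.

```
# ~/.ugrep: skip binary files by default
ignore-binary
```

With this file in place, `ug` parses `ignore-binary` before the command-line options, so every search behaves as if `--ignore-binary` had been given.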

UX design is hard and won't make everyone happy.

AndydeCleyre commented 2 months ago

Thanks for all the explanation and help, again!

That's also why searching binary files is the default . . .

Ha, thanks for pointing this out; I'm going to update my function again.