BurntSushi / ripgrep

ripgrep recursively searches directories for a regex pattern while respecting your gitignore
The Unlicense

Switch to avoid crossing filesystem boundaries #321

Closed wavexx closed 6 years ago

wavexx commented 7 years ago

It's often useful to search only files that reside on the same filesystem as the search paths, same as find -xdev or ag --one-device. I find ag's "one-device" switch to be misleading, as it's more correctly described by find as "do not cross filesystem boundaries", since the restriction applies to each search path in turn (which can span more than one device).
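To make the requested semantics concrete, here is a minimal Python sketch of "do not cross filesystem boundaries" applied to each search path in turn. It is only an illustration, not ripgrep's or walkdir's implementation: the function names `walk_same_filesystem` and `walk_roots` are hypothetical, and the sketch assumes that comparing `st_dev` (the device ID reported by `stat`) against the root's value is a good enough test for "same filesystem".

```python
import os
from typing import Iterable, Iterator


def walk_same_filesystem(root: str) -> Iterator[str]:
    """Yield file paths under `root` without descending into other filesystems.

    Hypothetical helper for illustration; assumes st_dev identifies a filesystem.
    """
    # Device ID of the filesystem that holds this particular search root.
    root_dev = os.stat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place: os.walk will not descend into directories that are
        # removed from `dirnames`, so mount points of other devices are skipped.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)


def walk_roots(roots: Iterable[str]) -> Iterator[str]:
    """Apply the restriction to each search path in turn.

    Each root is compared against its own device, so roots on different
    devices are all searched; only crossings *below* a root are skipped.
    """
    for root in roots:
        yield from walk_same_filesystem(root)
```

With this shape, a call like `walk_roots(["/dev1/", "/dev2/"])` would still visit both trees even if they live on different devices, which is why "do not cross boundaries" describes the behavior more accurately than "one device".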

BurntSushi commented 7 years ago

This needs to be implemented in walkdir (see: https://github.com/BurntSushi/walkdir/issues/8) and ignore's parallel iterator, and then exposed up through ripgrep as a command line flag.

I agree with the naming choice. Using find's flag name seems fine. It looks like GNU grep doesn't have a similar flag?

wavexx commented 7 years ago

On Wed, Jan 18 2017, Andrew Gallant wrote:

> I agree with the naming choice. Using find's flag name seems fine. It looks like GNU grep doesn't have a similar flag?

Indeed it doesn't, which is a shame.

If you're working with FUSE and/or remote filesystems, -xdev is very useful (to avoid crossing into them, and/or to avoid escaping the local fs).

BurntSushi commented 6 years ago

I think I'm liking the name --mount for this. It's supported by find, is relatively short, and feels a bit more generic than --xdev.

Does anyone know the provenance of the name --xdev? I guess "dev" means "device" (which seems consistent with the dev attribute in the stat structure) and I guess x means "cross" or "don't cross" in this case.

ssokolow commented 6 years ago

> Does anyone know the provenance of the name --xdev? I guess "dev" means "device" (which seems consistent with the dev attribute in the stat structure) and I guess x means "cross" or "don't cross" in this case.

I suspect you're correct there, given that tools like ncdu use the short option -x and describe it as follows:

-x Do not cross filesystem boundaries, i.e. only count files and directories on the same filesystem as the directory being scanned.

Also, I'm not so sure about --mount. It doesn't feel at all intuitive to me, which is probably why the manpage for GNU find lists -mount as being a compatibility alias for -xdev rather than the other way around.

okdana commented 6 years ago

> Does anyone know the provenance of the name --xdev?

In 2.11BSD, find uses a global variable called Xdev that is commented as "true if SHOULD cross devices"; the -xdev 'primary' flips it off. Probably the latter is named after the former? Seems like -noxdev or something would have made more sense; not that -mount is especially better.

> I suspect you're correct there, given that tools like ncdu use the short option -x

I think du -x comes from BSD's find -x, which was a later alias/replacement for -xdev. GNU du gave -x the long option name --one-file-system, which has also been borrowed into GNU cp, GNU rm, GNU tar, BSD tar, and rsync (... and ncdu, apparently)

BurntSushi commented 6 years ago

@okdana Ah nice, thanks! The prevalence of --one-file-system makes a compelling argument for that name. I guess we should probably go with that? It's also seemingly the most accurate. The only downside is that it's kind of long.

ssokolow commented 6 years ago

Another thing which just occurred to me is that I like the x-based names (-x, -xdev, etc.) because, in addition to "crossing device boundaries", they also associate it with options like --exclude in my mind, which is technically accurate. (You're eXcluding stuff outside the current filesystem.)

wavexx commented 5 years ago

I know it's late to comment on this, but if I explicitly pass two directories on different devices to perform searches on:

$ rg pat /dev1/ /dev2/

I do expect rg to scan and descend into both, since they were given explicitly. In this sense, I really preferred the original term "same device" or "do not cross boundaries" (aka xdev) over ag's --one-device, as I originally pointed out, since "one" in this context is just a special case.

This is just for the sake of discussion, I know the name has already been settled.