BurntSushi / ripgrep

ripgrep recursively searches directories for a regex pattern while respecting your gitignore
The Unlicense

How to **whitelist** specific file extensions for listing with '--files' #333

Closed n00bmind closed 6 years ago

n00bmind commented 7 years ago

Hi. I'm using ripgrep primarily inside vim/ctrlp. I'm currently working on a very big repository (tens of thousands of files) with a, let's say, "relaxed" policy about what should be in version control and what should not, meaning I have to filter out a lot of binaries, build byproducts and whatnot. I'm of course using a .ignore file at the root, which helps with speeding up the listings, but I need an additional layer of filtering to weed out so much "noise", so I'm trying to find an adequate syntax for the 'ctrlp_user_command' variable in vim which would whitelist just what I'm interested in (only certain types of source files). So far, I've obtained the best results with the '--type-add' switch, using the curly braces notation to include several file extensions, like this:

rg . --files --type-add "source:*.{h,cs,c}" -tsource

This almost works: most of the files listed have the extensions I specified, but there are also many unwanted files, like these two for instance:

tools\webscarab\src\org\owasp\webscarab\plugin\scripted\script.bsh
tools\win32\python27\Lib\site-packages\data\themes\tools\icons48.code.tga

For the first one I have no explanation. The second one would seem as if the '.code' part at the end was matching my '.c'.. Wild speculation, of course.

Any ideas?

BurntSushi commented 7 years ago

What happens if you try:

rg . --files --type-add "source:*.{h,cs,c}" -tsource --debug

and

rg . --files -g '*.{h,cs,c}' --debug

BurntSushi commented 7 years ago

Note that the --debug flag might cause a lot of output to stderr. If you can stick that in a gist or a pastebin, that would be great.

n00bmind commented 7 years ago

I don't feel too comfortable sharing details about this particular project, so I'll try to find a suitable example and provide the info you requested..

n00bmind commented 7 years ago

Ok, I'm using the Android SDK for this.. https://gist.github.com/chopsueysensei/bd0d7908f8d1b628cbb47b33a9b551b5

(I hope you can see all of it, it's pretty long) I can see that it whitelists several things like images, jars and other things. I'd say that paths that include a '.' somewhere are causing some trouble..

BurntSushi commented 7 years ago

@chopsueysensei Could you please include the command you ran? Could you also tell me how to clone the repo you're searching?

BurntSushi commented 7 years ago

I need enough information to reproduce the problem.

n00bmind commented 7 years ago

The commands are the ones you asked me to run. "rg_g_debug.txt" contains all the output from the command rg . --files -g '*.{h,cs,c}' --debug, while "rg_t_debug.txt" contains all the output from rg . --files --type-add "source:*.{h,cs,c}" -tsource --debug.

The tree is just my current Android SDK folder.

BurntSushi commented 7 years ago

The tree is just my current Android SDK folder.

Could you please tell me how to get it?

n00bmind commented 7 years ago

Just install Android SDK anywhere in your HD.. maybe also open "SDK Manager.exe" located in the root folder and download a couple platform versions / optional components to add some more content to it..

BurntSushi commented 7 years ago

@chocolateboy I'm not on Windows. I've never used the Android SDK before. Can you please link me to some instructions on how to acquire it? I need to be able to reproduce your problem.

n00bmind commented 7 years ago

https://developer.android.com/studio/index.html#downloads

However, if you're not on windows, you probably won't get the same output right?

n00bmind commented 7 years ago

Download the one under 'get just the command line tools'. It should be a matter of unzipping then running.

cheater commented 6 years ago

The Android SDK is a red herring. You should be able to just create a file called foo.code.tga (for example, an ASCII file with C inside it) and then check whether rg can be instructed not to find it.
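A contained reproduction along those lines might look like the sketch below. It is only an assumption about what a minimal test tree could be: every file name and the `repro_dir` variable are hypothetical, and only the extensions matter. It builds the tree in a temporary directory, uses `find` to compute which files a whitelist of `.h`, `.cs` and `.c` ought to keep, and, if `rg` happens to be installed, prints what the glob from this issue actually lists so the two can be compared by eye.

```shell
#!/bin/sh
# Build a tiny throwaway tree. All names here are hypothetical.
repro_dir=$(mktemp -d)
mkdir -p "$repro_dir/src" "$repro_dir/icons"
printf 'int main(void) { return 0; }\n' > "$repro_dir/src/main.c"
printf '#define FOO 1\n' > "$repro_dir/src/foo.h"
printf 'int x;\n' > "$repro_dir/icons/foo.code.tga"   # C text, but not a .c file
printf 'print("hi");\n' > "$repro_dir/src/script.bsh"

# What the whitelist should keep, computed independently of ripgrep.
expected=$(cd "$repro_dir" && \
    find . -type f \( -name '*.h' -o -name '*.cs' -o -name '*.c' \) | sort)
echo "expected:"
echo "$expected"

# What ripgrep lists for the glob from this issue, if rg is available.
if command -v rg >/dev/null 2>&1; then
    echo "rg --files -g '*.{h,cs,c}':"
    (cd "$repro_dir" && rg --files -g '*.{h,cs,c}' . | sort)
else
    echo "rg not found; skipping the ripgrep listing" >&2
fi

# Clean up the temporary tree.
rm -rf "$repro_dir"
```

If the ripgrep listing contains foo.code.tga or script.bsh, that would be the small, shareable test case asked for earlier in this thread.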

cheater commented 6 years ago

(by C i mean C source of course)

imo the perfect resolution would be to add something like gnu find syntax for specifying file names. At least -path, -name, -ipath, and -iname as well as -not, -o, -a, -(, and -).

Or at least -ipath and -iname for starters.

okdana commented 6 years ago

* The pattern functionality is mostly as described here: https://git-scm.com/docs/gitignore

cheater commented 6 years ago

Then I guess the issue can be closed as resolved? With the caveat that the original reporter should be able to reopen it if they are unhappy with the features rg provides.

n00bmind commented 6 years ago

Is --glob the same as -g? In that case, I already tried that as commented in my original post, and it still had some issues. It almost worked, but some files gave false positives.. My intention when opening this was more in the direction of bug catching & fixing.. I since have not used vim again in large codebases, so cannot attest as to how rg behaves currently in that scenario.

BurntSushi commented 6 years ago

I am going to close this because it's not reproducible. If someone can come up with a contained example that uses something more accessible than the entire Android code base, then I can take a look and re-open this.

alper commented 3 years ago

The manpage says:

Only search files matching TYPE.

That does not help me to figure out what TYPE can be. I figured out in my case I have to say -tgo but that's fairly counterintuitive.

BurntSushi commented 3 years ago

@alper How is -tgo counter intuitive? Please consider reading the guide's section on filtering with file types.

alper commented 3 years ago

Oh cool. The guide has examples so that makes it a lot easier. The man page does not.

Values concatenated onto the flag is not something I see a lot? I would expect: -t go but that could be just me.

BurntSushi commented 3 years ago

Values concatenated onto the flag is not something I see a lot? I would expect: -t go but that could be just me.

It's standard and idiomatic in UNIX command line since... forever. I don't know precisely when the convention started, but it is specified by POSIX, which likely suggests the convention pre-dated POSIX. So it's probably been around for at least 32 years.

And -t go works as well.
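As a small illustration of that convention (not ripgrep's own argument parser, which is written in Rust), the sketch below uses the POSIX shell builtin `getopts`. The `parse_type` helper is hypothetical; it accepts a single `-t` option that takes a value and prints that value. Both the attached and the separated spelling yield the same option argument.

```shell
#!/bin/sh
# parse_type: hypothetical helper that prints the value passed to -t.
parse_type() {
    OPTIND=1                      # restart option parsing for this call
    while getopts "t:" opt "$@"; do
        case "$opt" in
            t) printf '%s\n' "$OPTARG" ;;
            *) return 1 ;;
        esac
    done
}

attached=$(parse_type -tgo)       # value concatenated onto the flag
separate=$(parse_type -t go)      # value given as the next argument

echo "attached: $attached"
echo "separate: $separate"
```

Both lines print `go`, which is why `-tgo` and `-t go` are interchangeable for tools that follow this convention.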

disconnect3d commented 6 months ago

@BurntSushi Can we at least extend the "USAGE" displayed when rg is invoked with no arguments or with --help so that it shows the --type flag, like:

USAGE:
    rg [OPTIONS] PATTERN [PATH ...]
-    rg [OPTIONS] [-e PATTERN ...] [-f PATTERNFILE ...] [PATH ...]
+    rg [OPTIONS] [--type TYPE ...] [-e PATTERN ...] [-f PATTERNFILE ...] [PATH ...]
    rg [OPTIONS] --files [PATH ...]
    rg [OPTIONS] --type-list
    command | rg [OPTIONS] PATTERN

Ideally, it would be nice to provide an example like --type markdown but yeah.

BurntSushi commented 6 months ago

That's not really what the usage is for. The usage is there to show the forms of allowable commands, not just prominent flags. Notice how the second form indicates that the only positional arguments are file paths, whereas the first form indicates that the first positional argument is a pattern.

disconnect3d commented 6 months ago

That's not really what the usage is for. (...)

Yet, the usage/help fails to immediately show/explain to the user how to filter by filepaths/extensions.

I am pretty sure that many many people had the same issue and were annoyed that the usage doesn't show/explain -g or -t, but oh well.

It would be nice to address this somehow.

EDIT: I mean, sure, it's in --help, but I still feel it's hard to discover. Random thought: people may not search for 'glob' (or know what it does) and instead look for 'filepath', 'file extension' etc.

BurntSushi commented 6 months ago

I am pretty sure that many many people had the same issue and were annoyed that the usage doesn't show/explain -g or -t, but oh well.

There are a ton of things it doesn't show.

The --help page is very long, and if you start prioritizing things to fix issues like this, then you end up with the opposite problem: things that really do need to be prioritized end up getting de-prioritized. Therefore, your suggestion is not just "please prioritize this important feature," but it's also "please also de-prioritize this other thing at the same time." In other words, a balance must be struck.

The user guide has several prominent sections on filtering. I think that's probably good enough IMO.

cheater commented 6 months ago

As long as I can google for "ripgrep use type option" and there's something exactly about what I need in the first 3 hits, I'm fine. Currently nothing on the front page seems to be relevant. Obvious caveats about search bubbles notwithstanding.

BurntSushi commented 6 months ago

The first result for me is the GUIDE (https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md). And the GUIDE talks extensively about filtering. I don't understand how that isn't relevant. It is literally exactly the thing you would want to see.

cheater commented 6 months ago

it's the #1 result for me too. the search result snippet doesn't talk about it. note i said "seems to be relevant"

cheater commented 6 months ago

not just for that reason alone i'd suggest splitting that massive document up into smaller ones

BurntSushi commented 6 months ago

OK, well I don't control what snippet the search engine shows you.

I'm not splitting the guide into arbitrarily small pieces just so search engine results snippets are better.

The size of the document is large, which is why there is a table of contents. The table of contents is quick to scan and even includes the phrase "file types."

cheater commented 6 months ago

It's not just that. Long documents are overwhelming to people, especially ones with reading and focus disabilities, which means most people nowadays. It's really worthwhile to split this up into small pieces that just talk about one thing. For most of the things talked about in this document there isn't a really good reason to have them all in a single document.

BurntSushi commented 6 months ago

I think we'll have to agree to disagree.