beyondgrep / ack3

ack is a grep-like search tool optimized for source code.
https://beyondgrep.com/

Add an option to show first N lines (--head=N) #359

Open alexm opened 2 years ago

alexm commented 2 years ago

A common use of ack is filtering the output of commands like ps, kubectl, etc. that show column names on the first line:

$ ps -a
    PID TTY          TIME CMD
   8079 tty2     00:02:53 Xorg
   8256 tty2     00:00:00 gnome-session-b
  22672 pts/0    00:00:00 ps

Sometimes it would be useful to show the column names (i.e. usually the first line is enough) when the output meaning is not obvious. This can be easily achieved by adding a match for the first line, e.g.:

$ ps -a | ack '^\s*PID|gnome'
    PID TTY          TIME CMD
   8256 tty2     00:00:00 gnome-session-b

However, an option to always show the first line would be very useful, e.g.

petdance commented 2 years ago

This falls outside of what ack is intended to do. ack doesn't know anything about the text it's searching, and parsing that would certainly be the case. Ignore this, I thought you were talking about parsing the heading line.

For something like this, why even use ack vs. grep? There are no advantages to ack over grep, other than the Perl regexes.

If all you really want is the headings from ps, you could call ps twice and use head to get the headings the first time, and then grep your results from the second.

$ ps -a | head -n1 ; ps -a | grep cronolog
  PID TTY          TIME CMD
 8239 pts/14   00:00:00 cronolog
23339 pts/3    00:00:00 cronolog
31527 pts/2    00:00:00 cronolog

alexm commented 2 years ago

I'm not sure why ack should need to know anything about the text to always show the first line, i.e. print the first line (whatever it contains) and then proceed to filter from line 2 onwards.
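A minimal Python sketch of that behaviour, assuming line-oriented input: the first N lines are passed through untouched and the pattern is only applied from line N+1 onwards. The name `head_then_filter` is hypothetical and only models the idea; it is not ack code.

```python
import re
from typing import Iterable, Iterator


def head_then_filter(lines: Iterable[str], pattern: str, head: int = 1) -> Iterator[str]:
    """Yield the first `head` lines as-is, then only the lines matching `pattern`."""
    regex = re.compile(pattern)
    for lineno, line in enumerate(lines, start=1):
        # Lines 1..head are never searched; everything after is filtered.
        if lineno <= head or regex.search(line):
            yield line


ps_output = [
    "    PID TTY          TIME CMD",
    "   8079 tty2     00:02:53 Xorg",
    "   8256 tty2     00:00:00 gnome-session-b",
    "  22672 pts/0    00:00:00 ps",
]

for line in head_then_filter(ps_output, r"gnome"):
    print(line)
```

Because the input is read exactly once, the heading and the matches always come from the same run of the command.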

Since ack has so many more features than grep, I thought that this could be nice to have too, but I understand your reluctance to add this feature.

Cheers, and thanks for your excellent work!

petdance commented 2 years ago

Hmm, I think I've been mixing up two things. Let's explore this idea.

petdance commented 2 years ago

So say we have a call like

ack foo --head=1

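That means that ack will show the first line of the input, plus any lines that match /foo/. Questions:

Do we show the first N lines only if there is a match in the file?
Do we show matches in the header lines? If "foo" shows up in the header lines, what do we do?
Do we do the --head=1 rule on every file? The scenario you're describing is good for piped-in data, but ack does more than that.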

petdance commented 2 years ago

Also, back to my original question: Why use ack at all for this? Why not do:

ps -a | head -n1 ; ps -a | grep cronolog

n1vux commented 2 years ago

Why use ack at all for this?

because i want Perl RE not egrep RE ?

(I use ps -a | perl -nlE 'say if 1..1 or /cronolog/' for such cases, but that's me.)

petdance commented 2 years ago

because i want Perl RE not egrep RE ?

I know why you might, but I was asking @alexm.

alexm commented 2 years ago

Do we show the first N lines only if there is a match in the file?

Yes, only if there's a match in the file.

Do we show matches in the header lines? If "foo" shows up in the header lines, what do we do?

Nothing, --head=1 would mean that the first N lines are skipped from filtering as would happen with ps -a | head -n1 ; ps -a | ack regex, thus making --head=1 a convenient shortcut of that snippet.

Do we do the --head=1 rule on every file? The scenario you're describing is good for piped-in data, but ack does more than that.

Yes. For instance (see the notes below):

$ ps -a > ps.txt
$ kubectl get pods > kubectl.txt
$ ack --head=1 foobar *.txt
ps.txt
------
    PID TTY          TIME CMD
   6727 tty2     00:00:51 foobar

kubectl.txt
-----------
NAME                              READY   STATUS      RESTARTS   AGE
foobar-backend-7d8965b74b-wx76t   1/1     Running     0          2d20h

Notes:

Also, back to my original question: Why use ack at all for this?

I use ack more often than grep, even though grep now has the -P option that lets you use Perl regexes. I prefer ack for several reasons:

petdance commented 2 years ago

I get all those reasons for using ack over grep (I've preached them :-)). I was just asking about the case of filtering output from ps.

alexm commented 2 years ago

In ps -a | head -n1 ; ps -a | grep expr the output from both ps commands could theoretically be different.

Then, I imagined myself adding a new option to ack (which would be easier for me than doing it for grep). I'm even willing to send a pull request if I find enough round tuits.

Other than these, I can't say there's any other particular reason to prefer ack over grep to filter ps output.

petdance commented 2 years ago

The lines of dashes you've shown would be a new feature, right?

Right now if you don't want the grouping/line numbers, you have the -h. We don't yet have an option to just turn off line numbers, although it's a feature request that has been around a while and I wouldn't be opposed to. See #142

I wouldn't want --head=1 to change any behaviors on how things get output. If you were doing an ack of multiple files and you wanted --head=1, you would probably also have to have a --no-line-number argument as well.

Do we show matches in the header lines? If "foo" shows up in the header lines, what do we do?

Nothing, --head=1 would mean that the first N lines are skipped from filtering

When you say "skipped from filtering" here, do you mean "skipped from being searched"?

If so, then I'm not sure I'm OK with that, but will think. If not, please say more about what you mean by "filtering"?

(And thanks for taking the time to work through these questions. This is the tough part of figuring out features.)

petdance commented 2 years ago

Some things I'm thinking about: You're talking about using this to show the first line of a stream because you know it's a command like ps and you want the headings. I see the use of this being broader than just that.

For example, I might go acking through a tree of source and do

ack salestax src/ --head=5

because it's helpful to see the first 5 lines of the file that I'm getting results for, even though they aren't a "heading" like in the ps example. Maybe my results look like:

whatever.py
1: # whatever.py
2: # This program does the dingdong doodle.
3: # Created by ....
4: ...
5: ...
78: salestax = calculate_tax()
168: print(salestax)

Having those first 5 lines helps give me context for the actual matches. And that said, I think that if "salestax" appears in the first 5 lines, then it should be highlighted like any other ack match.
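A rough Python sketch of the semantics discussed so far, assuming they end up as: show the first N lines only when the file has at least one match, keep line numbers, and highlight matches inside the head lines like any other match. `head_with_matches` is a hypothetical helper, and the square brackets merely stand in for ack's colour highlighting.

```python
import re
from typing import List, Tuple


def head_with_matches(lines: List[str], pattern: str, head: int = 5) -> List[Tuple[int, str]]:
    """Return (line number, text) pairs for the head lines plus all matching lines."""
    regex = re.compile(pattern)
    # A file without any match shows nothing, not even its head lines.
    if not any(regex.search(text) for text in lines):
        return []
    shown = []
    for lineno, text in enumerate(lines, start=1):
        if lineno <= head or regex.search(text):
            # Head lines are searched too, so matches in them get marked.
            marked = regex.sub(lambda m: "[" + m.group(0) + "]", text)
            shown.append((lineno, marked))
    return shown


source = [
    "# whatever.py",
    "# Computes salestax for orders.",
    "import tax",
    "salestax = tax.calculate()",
    "print(salestax)",
]

for lineno, text in head_with_matches(source, r"salestax", head=2):
    print(f"{lineno}: {text}")
```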

Another thought: How does --head=N interact with --output?

alexm commented 2 years ago

(I use ps -a | perl -nlE 'say if 1..1 or /cronolog/' for such cases, but that's me.)

Wow! Didn't know that trick and I like it a lot :smile:

I'm assuming that the if 1..1 uses $. implicitly, but I can't find where this is documented. Any pointers?

Thanks, @n1vux!

n1vux commented 2 years ago

@alexm, yes: when an operand of the scalar range operator is a constant integer, it is implicitly compared against $. (the current input line number).
This goes back to the early days when Larry was blending the best of shell, libc, sed, and awk into one language, Perl 1 or 2ish, iirc.
Great for -e one-liners, a little too cryptic for a maintainable script and useless in a reusable module.

On the theory of making simple things simple and hard things easier, --head=9 is a good addition for ack.

(Perl Range Op is more flexible: either value can also be a RE /^start\b/i .. /^end\b/i or logical expression, a() .. b() meaning from first line where a() is true to first line where b() is true. And mix and match. The Range op in list context is DWIMish magic for list constants.)

https://perldoc.perl.org/perlop#Range-Operators
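For readers who don't know Perl, here is a small Python emulation of the two-dot scalar range ("flip-flop") behaviour described above. It is only a sketch: `FlipFlop` and `select` are made-up names, and the three-dot variant and other corner cases of the real operator are left out.

```python
import re
from typing import Callable, Iterable, List

Predicate = Callable[[int, str], bool]


class FlipFlop:
    """False until `start` is true, then true up to and including the line where `end` is true."""

    def __init__(self, start: Predicate, end: Predicate) -> None:
        self.start, self.end, self.active = start, end, False

    def __call__(self, lineno: int, line: str) -> bool:
        if not self.active:
            if not self.start(lineno, line):
                return False
            self.active = True
        # Two-dot form: `end` is also tested on the line that opened the range.
        if self.end(lineno, line):
            self.active = False
        return True


def select(lines: Iterable[str], cond: Predicate) -> List[str]:
    return [line for n, line in enumerate(lines, start=1) if cond(n, line)]


# Rough equivalent of: say if 1..1 or /cronolog/
first_line = FlipFlop(lambda n, _: n == 1, lambda n, _: n == 1)
keep = lambda n, s: first_line(n, s) or bool(re.search(r"cronolog", s))

# Rough equivalent of: /^start\b/i .. /^end\b/i
block = FlipFlop(
    lambda _, s: re.match(r"start\b", s, re.I) is not None,
    lambda _, s: re.match(r"end\b", s, re.I) is not None,
)
```

A constant operand such as the `1` in `1..1` corresponds to the `n == 1` predicates here; in Perl that comparison against `$.` is implicit.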

alexm commented 2 years ago

The lines of dashes you've shown would be a new feature, right?

Right, but it was just an example of what could be done to highlight the filename without breaking the column layout for commands like ps and the like.

I wouldn't want --head=1 to change any behaviors on how things get output. If you were doing an ack of multiple files and you wanted --head=1, you would probably also have to have a --no-line-number argument as well.

Makes sense. What I'm sensing is that --head=1 has its own place and that some other option --column-names (or whatever) could use --head=1 and -h et al. to achieve what I really was looking for in the beginning.

When you say "skipped from filtering" here, do you mean "skipped from being searched"?

Yes, I meant that, but after reading the case you made later about showing the first N lines of the source files that match a pattern, I guess it makes more sense to search there too.

alexm commented 2 years ago

because it's helpful to see the first 5 lines of the file that I'm getting results for, even though they aren't a "heading" like in the ps example

Agreed.

Having those first 5 lines helps give me context for the actual matches. And that said, I think that if "salestax" appears in the first 5 lines, then it should be highlighted like any other ack match.

I changed my mind, you're right.

Another thought: How does --head=N interact with --output?

Good question. My feeling is that when somebody combines both options, it is because they expect both to be performed. Otherwise, one of them should be removed. Taking the example of the first 5 lines to add context:

petdance commented 2 years ago

I just realized, maybe --output and --head should be mutually exclusive, and it solves that problem. If you're specifying your own output, then you probably don't want the --head option anyway.

alexm commented 2 years ago

I just realized, maybe --output and --head should be mutually exclusive, and it solves that problem. If you're specifying your own output, then you probably don't want the --head option anyway.

That was my first thought :smile:

Is there any other option that is mutually exclusive with --output? i.e. to be coherent regarding its intent.

petdance commented 2 years ago

Yes, many mutually exclusive options. See mutex_options function.

n1vux commented 2 years ago

I don't see a statement of the default N; maybe I missed it skimming through. I would suggest that --head without a specific N should mean N=1 (with, e.g., --head=7 for N=7), as that's the single most common depth of headers. (And of course --no-head is the default.)
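As an illustration of the option handling being proposed (a bare --head meaning N=1, no head lines by default, and --head being mutually exclusive with --output), here is a sketch using Python's argparse. It is a model only: ack is written in Perl and has its own option parsing and mutex_options function, and the `ack-sketch` program and `build_parser` below are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ack-sketch")
    parser.add_argument("pattern")
    # --head and --output may not be combined, as suggested in this thread.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--head",
        nargs="?",   # the value N is optional
        type=int,
        const=1,     # bare --head means N=1
        default=0,   # option absent: no head lines
        metavar="N",
    )
    group.add_argument("--output", metavar="EXPR")
    return parser


parser = build_parser()
# A bare --head is given last here to avoid any ambiguity with the positional pattern.
print(parser.parse_args(["foo", "--head"]).head)
print(parser.parse_args(["foo", "--head=7"]).head)
print(parser.parse_args(["foo"]).head)
```

Passing both `--head=2` and `--output` to this parser makes argparse exit with a usage error, which is the kind of rejection a mutually exclusive pair implies.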

n1vux commented 2 years ago

Can we set flags in .ackrc for only certain file types? I could see value in type=csv → head=1 as a personal option. I might even set it so, were it possible. (Getting a line of bad data instead of a header would provide a nasty, implicit warning when a CSV does NOT have a header line!) (It would be wrong as a drop-in replacement for grep, of course. GNU grep 3.7 does not have this feature. Yet.)