Closed wheany closed 5 years ago
Ack should already be ignoring minified JS files. What JS files is your ack finding?
Minified js was maybe a bad example because they usually have .min.js extension, but there are minified js files that don't have the extension. I think the Vaadin/GWT UI toolkit for one.
Also some tools produce minified html and xml files, which also cause problems with ack.
What happens when you grep these files?
I don't understand the question.
If I grep (or 'ack' or 'ack-grep', doesn't matter) these files and they have a match, I get several screenfuls of text with the matches highlighted somewhere in the mess.
Not a trick question. Just wanted to know what grep did. For the most part, I try to keep ack and grep behaving the same.
Well, in that case, both grep and ack fill the terminal with useless amounts of text. Depending on where I run the command, that can mean thousands of rows of scrollback, if I'm unlucky enough to have multiple minified files that match.
One possibility could be being able to define characters that work as line breaks depending on file format. E.g. if you find a match in a .js file with long lines, treat semicolons like linefeeds (for the purposes of the -A, -B and -C switches).
That's a level of source awareness that we don't want to get into.
It wouldn't have to be language aware code, it could be an option, just like --type-set or --ignore-dir
This is probably only a problem with languages that can be minified in the first place, and those have to have some other statement separator, so it could be like --statement-separator or --record-separator or something similar.
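As a rough illustration of the separator idea, here is a minimal Python sketch, not ack code. It splits one long line on a separator and returns the matching statements plus neighbouring statements, the way -A/-B/-C would for lines. The function name `statement_context` and its parameters are hypothetical.

```python
import re

def statement_context(line, pattern, sep=";", before=1, after=1):
    """Return (index, statement) pairs for matching statements and their neighbours.

    Hypothetical helper: treats `sep` as if it were a line break.
    """
    statements = line.split(sep)
    regex = re.compile(pattern)
    keep = set()
    for i, stmt in enumerate(statements):
        if regex.search(stmt):
            lo = max(0, i - before)
            hi = min(len(statements), i + after + 1)
            keep.update(range(lo, hi))
    return [(i, statements[i]) for i in sorted(keep)]

minified = "var a=1;var b=2;call(a);var c=3;var d=4"
for index, stmt in statement_context(minified, r"call"):
    print(index, stmt)
```

A naive split like this ignores separators inside string literals, which is part of why real source awareness gets complicated.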
A co-worker just IM'd me with the same complaint about *.js. Since not all minifiers follow the .min.js quasi-convention, there's much noise. grep doesn't do any DWIM magic, we do, so expectations are higher of ack.
; delimited: (?m:^|;)[^;]* before the pattern and [^;]*(?m:;|$) after it (which is equivalent, aside from highlighting, to ack --output '$1' "($precontext$pattern$postcontext)").
Added:
Another coworker suggests
precontext='(?m:^|;)[^;]{0,20}'; postcontext='[^;]{0,20}(?m:;|$)';
will emulate KWIC (but for indent when less than 20) and
bold='^[[7m'; unbold='^[[0m'
with
ack --output "\$1$bold\$2$unbold\$3" "($precontext)($pattern)($postcontext)"
will even highlight the workaround. Ugly but possible!
Does grep not have an option to trim lines? I'm not seeing one.
which grep ? IDK. If gnu grep has one it would be good to be compatible, but we can be better. (added: i don't see one https://www.gnu.org/software/grep/manual/grep.html#Output-Line-Prefix-Control nor in FreeBSD's FreeGrep)
Adding this kind of option gets things pretty ugly what with highlighting the matches etc.
It sounds to me that the solution should probably be more towards people excluding files to not search files they know they want to ignore. Truncating result lines so they don't explode your screen is just saying "Let's work hard to make things more palatable that we don't even want to see anyway."
I agree they should say --perl or --type=clojure if that's what they mean, which ignores all JS whether we can tell it's .min.js or not.
And if both the minified and full JS are in the tree, they should arrange .ackrc to ignore the minified directories. We detect .min.js if that is in use. Adding ignore .js in .ackrc in a dir containing minified only may help sometimes. If we consulted .gitignore it might help DWIM otherwise. Classifying files with average line length() > 1024 as binary might help. But allowing users to say ignore lines > 1024 or 256 or whatever is good too.
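The two heuristics mentioned here (average line length and a per-line cutoff) could look roughly like the following Python sketch. The function names and thresholds are assumptions for illustration only; ack does not expose anything by these names.

```python
def average_line_length(text):
    """Mean number of characters per line, 0.0 for empty input."""
    lines = text.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)

def looks_minified(text, threshold=1024):
    """Heuristic from the thread: a very high average line length suggests minified output."""
    return average_line_length(text) > threshold

def drop_long_lines(lines, max_len=256):
    """Keep only lines at or below a user-chosen length."""
    return [line for line in lines if len(line) <= max_len]
```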
If only the minified is available -- e.g. not shipped, or compiled from Clojure -- and they want to see where the JS calls the back end, maybe specifically having asked for --type=js or looking for all mentions of a domain-specific word, seeing statements instead of lines would help them with minified files.
(Maybe that's setting $/ aka $INPUT_RECORD_SEPARATOR aka $RS? I don't think we support that ... nor can we? Might require a preprocess filter co-routine to expand and give statement numbers as faux line numbers? That's less ugly and modular but still invasive.)
In some cases it's actually desirable to see results within minified js (assuming it's all that's available) to put back-end code into context of a front-end call, for instance. Having the option to truncate excessively long lines at a specified limit or otherwise provide limited contextual results would be flexible and useful in a number of common use cases, rather than just excluding the files outright.
I was (am?) a fan of the old KWOC/KWIC formats. (I say 'was' because who really needs a lineprinter corpus index (concordance) in the 21st century! But a context index is still plausibly useful for text searching online.) That --output lets me generate KWOC and nearly-KWIC thrills me. I don't think we need --kwic-.... options. Maybe I can write up a KWOC/KWIC idiom or wrapper for documentation ...
What are KWOC/KWIC?
On Tue, Mar 14, 2017 at 12:46 AM, Andy Lester notifications@github.com wrote:
What are KWOC/KWIC?
https://en.wikipedia.org/wiki/Key_Word_in_Context
-- Bill Ricker bill.n1vux@gmail.com https://www.linkedin.com/in/n1vux
Closed and moved to wiki. https://github.com/petdance/ack2/wiki/Feature-requests
Could you reconsider this? It's been an issue in ack for 10+ years. I can't imagine anyone considers scrolling through pages of the following to be desired functionality, and it's a common occurrence in codebases these days.
The simple way to do it is to add an .ackrc-compatible option that consists of a boolean flag and/or a width max limit. You don't have to get fancy with the trimming: put the matches in the center of the buffer when the line exceeds the width. This gives context on both sides and it's OK if the default buffer width results in a few lines of visual output (instead of thousands).
It's been an issue in ack for 10+ years.
It's been an issue with grep since the beginning of time.
I don't understand what you mean by "put the matches in the center of the buffer when the line exceeds the width."
line: the long matched line in a file
match: the text that is highlighted (substring of line)
buffer: the truncated line storage
The issue with truncating lines is that it isn't clear how to display a partial line as opposed to a full line. By making the buffer at least as large as the match, then you can find the middle of the buffer by dividing the buffer length by two (and the middle of the match by dividing its length by two), and then you can put the middle of the match in the middle of the buffer. This gives equal context on either side of the match. This is a simple and good enough way to do it, though it is not the only way.
(When the line beginning or line end would be present in the buffer, then left or right align instead, in turn.)
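The centering rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not ack's implementation; `center_window` is a hypothetical name, and match positions are given as half-open offsets into the line.

```python
def center_window(line, m_start, m_end, width):
    """Return (start, end) slice bounds of a window of `width` chars around a match.

    The middle of the match is placed at the middle of the buffer; near the
    line's beginning or end the window is left- or right-aligned instead.
    """
    if len(line) <= width:
        return 0, len(line)                    # nothing to truncate
    width = max(width, m_end - m_start)        # buffer at least as large as the match
    mid_match = (m_start + m_end) // 2
    start = mid_match - width // 2
    start = max(0, min(start, len(line) - width))  # clamp to the line
    return start, start + width

line = "x" * 50 + "HIT!" + "y" * 46            # 100 characters, match at 50..54
start, end = center_window(line, 50, 54, 20)
print(line[start:end])
```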
Let's not get bogged down in the details of how it would be implemented internally, and keep it to the user interface.
It sounds like you're suggesting that in the case of overrunning --maxwidth that ack print out some portion of the line that has the match on it, right? Something like this?
47: ... stuff that is from the middle of the line **MATCHED TEXT** more but not to the end...
How do we handle multiple matches per line? What if acked on a comma and there are 1000 matches on the line in your minified javascript?
How do we handle lines that are longer than --maxwidth that show up in the context lines when using -A, -B and -C?
I have ideas for output that I don't want to put out here yet, but I don't see a way to handle the two scenarios above and still display matches.
It sounds like you're suggesting that in the case of overrunning --maxwidth that ack print out some portion of the line that has the match on it, right? Something like this?
Yes
I think the way to think about an option like --maxwidth is as a UI nicety instead of as something that plays nicely with rigorous output parsing scripts. The way I (and I presume from the Google hits, a lot of other people) use ack is as a nicer grep: I want to know what files trigger, primarily, and then secondarily it's nice to see what line numbers and what context, but ultimately, if it looks good, I'm going to open that file in my editor and jump to the matched keyword.
So for
How do we handle multiple matches per line? What if acked on a comma and there are 1000 matches on the line in your minified javascript?
my answer would be: we could highlight/include only those matches that fit in the buffer starting with the first match. Now the objection to this is that you drop some valid matches, but for the stated use case, this is OK---it's only not OK if you're doing some kind of piped scripting or something.
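One possible reading of "only those matches that fit in the buffer starting with the first match" is sketched below in Python. It assumes the window simply begins at the first match, which is a simplification; the name `matches_in_window` is hypothetical.

```python
import re

def matches_in_window(line, pattern, width):
    """Return (window_text, spans): spans are matches lying wholly inside the window.

    Assumption: the window starts at the first match and is `width` chars long.
    Span offsets are relative to the start of the window.
    """
    spans = [m.span() for m in re.finditer(pattern, line)]
    if not spans:
        return "", []
    start = spans[0][0]
    end = min(len(line), start + width)
    kept = [(s - start, e - start) for s, e in spans if e <= end]
    return line[start:end], kept

window, spans = matches_in_window("a,b,c,d,e,f", ",", 4)
print(window, spans)
```

Matches beyond the window are silently dropped, which is the trade-off acknowledged above.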
One way to signal this to the user is to make the name obviously not script-friendly, e.g. --pretty-maxwidth or somesuch. Another way to make it play nicely with scripts is, possibly, to detect when you're outputting to a terminal and only do it then, say --terminal-output-maxwidth, and this functionality is, I think, already built-in for the coloring.
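For illustration, terminal detection of this kind is straightforward in most languages. A minimal Python sketch follows, assuming a hypothetical `effective_max_width` helper; it says nothing about how ack itself decides when to colorize.

```python
import io
import shutil
import sys

def effective_max_width(requested=None, stream=sys.stdout):
    """Return a width limit when writing to a terminal, or None for pipes and files."""
    if not stream.isatty():
        return None                      # scripts get full, untruncated lines
    if requested is not None:
        return requested
    return shutil.get_terminal_size().columns

# An in-memory stream is not a terminal, so no limit applies.
print(effective_max_width(80, stream=io.StringIO()))
```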
For -A, -B, -C I think the answer is to truncate the other lines as well, and that it would be fine to left-align and right-truncate, though I'm not as experienced with these options.
Realistically, the default buffer widths that people are going to use will be orders of magnitude smaller than the largest untruncated line, meaning that if you're displaying untruncated lines there, it's not like the output is going to be readable anyway.
Is it really meaningful to show the match on the line? Vs. just saying "There are 14 matches in this 47,320 character line", for example?
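The summary-only alternative asked about here could be produced along these lines. This is a hedged Python sketch; `summarize_long_line` and its `max_width` default are made up for the example.

```python
import re

def summarize_long_line(line, pattern, max_width=1024):
    """Return a one-line summary for an over-long line, or None if the line is short."""
    if len(line) <= max_width:
        return None
    count = sum(1 for _ in re.finditer(pattern, line))
    verb, noun = ("is", "match") if count == 1 else ("are", "matches")
    return f"There {verb} {count} {noun} in this {len(line):,} character line"

print(summarize_long_line("a," * 1000, ","))
```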
For me, yes. I use the context to determine whether it was a "desirable" match or not. The group of people I've known over the years that use ack is fairly large and all developers, and the typical use case is "someone mentioned such and such string, or it came up somehow in my work, and now I need to know where all instances of 'isInitialBlankNavigation' occur in the codebase." I'm not going to be interested in docs (say), or matches where it's a substring of a longer word I'm not interested in, etc., and that filtering happens visually in the terminal.
So what does the sample output look like? How do we denote it's a partial line?
If this is our normal output:
t/illegal-regex.t
33-
34: return subtest "test_ack_with( $testcase: @args )" => sub {
35- my ( $stdout, $stderr ) = run_ack_with_stderr( @args );
maybe the partials look like
t/illegal-regex.t
33-
34* ... whatever this that other subtest this other thing that goes very ....
35- my ( $stdout, $stderr ) = run_ack_with_stderr( @args );
With actual ... at the front and the end, and a * instead of : as the divider against the numbers.
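A small Python sketch of that output convention, purely to make the proposal concrete. The function `format_result` is hypothetical, and the spacing after the divider follows the sample above rather than ack's real output.

```python
def format_result(lineno, text, is_match, cut_left=False, cut_right=False):
    """Format one output line: ':' for a full match, '*' for a partial match, '-' for context."""
    if not is_match:
        divider = "-"
    elif cut_left or cut_right:
        divider = "*"
    else:
        divider = ":"
    left = "... " if cut_left else ""
    right = " ..." if cut_right else ""
    return f"{lineno}{divider} {left}{text}{right}"

print(format_result(34, "whatever this that other subtest", True, True, True))
print(format_result(35, "my ( $stdout, $stderr ) = run_ack_with_stderr( @args );", False))
```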
Kevin,
I'm planning to include in Ack3's cookbook section hints on displaying selective context and may even get Andy to include features to do better at it too.
Workarounds for context in Ack 2 for KWIC/KWOC keyword indexes (with short input lines):
- ack2 --output can sorta do KWOC/KWIC with evil before/after vars
- --output '$&^I$'"'"'^I|| $`' # *KWOC*
- --output '$`^I$&^I$'"'" # *pseudo KWIC*
- but they’re nasty from Shell since they mix quote and dollar
- and tabs don’t truly line up if width variation exceeds a tab width
For your purpose, monsterline uglified JS/html/etc, I'll make a long line version of ack-standalone and ack for perl 'use' statements.
perl -pE 's/\n$//' ack-standalone | ack2 --output '$1 $2 $3' '(.{0,20})(\buse \w+(?:::\w+)*[^;]{0,40};?)(.{0,20})' | less
( Those are ^V^I tabs )
Note that it steps over 'use warnings;' when it immediately follows 'use strict' which may be ok because it finds each cluster. But can get each this way
perl -pE 's/\n$//' ack-standalone | ack2 --output '$1 $2 $3' '(.{0,20})(\buse \w+(?:::\w+)*[^;]{0,40};?)((?=\buse)|.{0,20})' | less
Bill
Use three dots on any side that's elided. Potentially color the dots. (Putting these "outside" is fine or you can put them inside and do the more complicated string math.) If you really want to get fancy you can put the number of dropped chars in brackets outside any elided side.
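A sketch of the "dots plus dropped-character count" idea in Python, with the markers placed outside the kept text. The name `elide` and the exact bracket layout are assumptions, not an agreed format.

```python
def elide(line, start, end):
    """Show line[start:end], marking each elided side with '...' and a dropped-char count."""
    left_dropped = start
    right_dropped = len(line) - end
    out = line[start:end]
    if left_dropped:
        out = f"[{left_dropped}]...{out}"
    if right_dropped:
        out = f"{out}...[{right_dropped}]"
    return out

print(elide("abcdefghij", 3, 7))
```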
More than 2 years later, what's the status on this? I'm also having the same issue.
No more work is being done on ack2, but a related request came in the other day on ack3, and I think it might be helpful here if it were implemented.
I welcome input on that ticket: https://github.com/beyondgrep/ack3/issues/234
I too would like some sort of feature to deal with suppressing or truncating matches from long lines. My workaround is to filter the output of ack using grep to remove results that are 300 characters or longer.
ack my-search-string | grep -vE '.{300,}'
but because using ack with a pipe turns off color by default, I usually turn that back on with a flag:
ack --color my-search-string | grep -vE '.{300,}'
It would be nice to be able to put something in my .ackrc to ignore or truncate long lines by default that I could then override on the command line if I needed to.
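One caveat with the grep filter is that, with --color, the escape sequences count toward the 300 characters. A Python filter along the following lines could measure only the visible text. It is a sketch under the assumption that the color codes are ordinary ANSI CSI sequences; `drop_long` and `visible_length` are hypothetical names.

```python
import re

# Matches ANSI CSI sequences such as "\x1b[1;32m" or "\x1b[0m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def visible_length(text):
    """Length of the text as displayed, ignoring color escape sequences."""
    return len(ANSI_RE.sub("", text))

def drop_long(lines, limit=300):
    """Yield only lines whose visible length is below `limit` (like grep -vE '.{300,}')."""
    for line in lines:
        if visible_length(line.rstrip("\n")) < limit:
            yield line

# To use as a pipe filter, one could call:
#   sys.stdout.writelines(drop_long(sys.stdin))
```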
@stephenostermiller Please go comment on the current ticket at https://github.com/beyondgrep/ack3/issues/325
If ack finds a match in e.g. minified js file, or other files with just one or few very long lines, it will flood the output with the contents of the file and likely push all useful matches off-screen.
I would like an option to either ignore such matches altogether (maybe list the file name), or possibly to only show some number of characters worth of context around the match.