Open ggl opened 8 years ago
This is by design. The regex engine (PCRE) can't handle files that large.
You can find here that the maximum subject (file) length is INT_MAX, which is 2147483647 for a signed 32-bit int. Therefore the maximum file size is INT_MAX bytes.
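To make the limit concrete, here is a minimal sketch of the kind of size check involved. It is not ag's actual code: fits_in_pcre_subject and to_pcre_length are hypothetical names. The only fact relied on is that pcre_exec() in PCRE 1 takes the subject length as an int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from ag: true when a subject of `len`
 * bytes can be described by an int length, as pcre_exec() requires. */
static bool fits_in_pcre_subject(uint64_t len) {
    return len <= (uint64_t)INT_MAX;
}

/* Hypothetical helper: narrow a 64-bit file size to the int that
 * pcre_exec() expects. The caller must check the size first. */
static int to_pcre_length(uint64_t len) {
    assert(fits_in_pcre_subject(len));
    return (int)len;
}
```

With a 32-bit int, a file of exactly 2147483647 bytes still fits, while a 7GB file does not and would have to be skipped or handled some other way.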
Most uses of grep/ack/ag are line-by-line searches. You would only reach the maximum subject length on multiline searches or if a single line is over 2GB. So for most other uses ag would only need to match a single line at a time.
You are right in saying most searches are single line only. Nevertheless, ag does multi-line searching by default (as far as I know, it matches newlines with \s in a regex).
I found that files greater than 2GB can be searched with a literal (not regex) pattern. In theory ag could make a case-by-case decision and only raise that error for a single line greater than INT_MAX bytes or for multiline searching.
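A rough sketch of what such a case-by-case decision could look like, under my own assumptions: longest_line and can_search are hypothetical names, not functions from ag, and the limit parameter stands in for INT_MAX so the logic can be tried on small inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the longest line in buf, newline excluded. */
static size_t longest_line(const char *buf, size_t len) {
    size_t longest = 0;
    size_t start = 0;
    while (start < len) {
        const char *nl = memchr(buf + start, '\n', len - start);
        size_t end = nl ? (size_t)(nl - buf) : len;
        if (end - start > longest) {
            longest = end - start;
        }
        start = end + 1; /* skip past the newline */
    }
    return longest;
}

/* Hypothetical decision function. `limit` plays the role of INT_MAX. */
static bool can_search(const char *buf, size_t len,
                       bool literal, bool multiline, size_t limit) {
    assert(buf != NULL || len == 0);
    if (literal) {
        return true;                 /* literal search does not go through PCRE */
    }
    if (multiline) {
        return len <= limit;         /* whole file is one subject */
    }
    return longest_line(buf, len) <= limit; /* each line is one subject */
}
```

Scanning for the longest line costs one extra pass over the file, so a real patch would probably only do it for files that are already over the limit.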
Maybe @ggreer could mention whether he wants this or not. I'm not sure how much work it would be to patch this to support the above case-by-case choice.
pcre has a new version called pcre2 with a backwards-incompatible new API. The new API uses size_t instead of int to refer to lengths, so it can handle strings larger than 2GB. Maybe ag should update to require pcre2 instead.
@jschpp how do you do literal patterns?
@njt1982 From ag --help:
-Q --literal Don't parse PATTERN as a regular expression
I hit this limitation today while trying to grep my 7GB mailbox. It would be really great to handle large files.
I hit these errors today when running ag from my home directory, and then ag dumped core. Too bad I didn't have ulimit -c unlimited enabled.
Is there maybe a chance for a command line argument that would mean something like "process first 2GB of data, then give up"?
You could add a flag to split the file into 2GB parts and then merge the results of every run.
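One way the splitting could work, as a sketch only: cut each part just after the last newline that fits, so no line is split between two runs. next_chunk_end is a hypothetical helper, not an existing ag function, and max_chunk would be INT_MAX in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the end offset (exclusive) of the chunk that
 * begins at `start` and holds at most `max_chunk` bytes. The cut is placed
 * just after the last newline in that window. If the window contains no
 * newline (a single line longer than max_chunk), fall back to a hard cut. */
static size_t next_chunk_end(const char *buf, size_t len,
                             size_t start, size_t max_chunk) {
    assert(max_chunk > 0 && start <= len);
    if (len - start <= max_chunk) {
        return len; /* the rest of the buffer fits in one chunk */
    }
    size_t end = start + max_chunk;
    for (size_t i = end; i > start; i--) {
        if (buf[i - 1] == '\n') {
            return i;
        }
    }
    return end;
}
```

Calling this in a loop, with start set to the previous return value, walks the whole file. Note that a multi-line match spanning a chunk boundary would still be missed, so this only really suits line-by-line searches.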
If I try to search a file bigger than 2GB, I get the following error:
ERR: Skipping system.log: pcre_exec() can't handle files larger than 2147483647 bytes
Grep and ack both work fine (although for ack it takes like forever).