ggreer / the_silver_searcher

A code-searching tool similar to ack, but faster.
http://geoff.greer.fm/ag/
Apache License 2.0

ag cannot search files > 2 GB #975

Open · ggl opened this issue 8 years ago

ggl commented 8 years ago

If I try to search a file bigger than 2 GB, I get the following error: ERR: Skipping system.log: pcre_exec() can't handle files larger than 2147483647 bytes

Grep and ack both work fine (although ack takes forever).

jschpp commented 8 years ago

This is by design. The regex engine (PCRE) can't handle files that large. You can find here that the maximum subject (file) length is INT_MAX, which is 2147483647 for a signed 32-bit int. The maximum file size is therefore INT_MAX bytes.
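
For reference, a minimal sketch of the classic PCRE calling sequence (error handling omitted); the fourth argument of pcre_exec() is the subject length as a plain int, which is where the limit comes from:

```c
/* Minimal sketch of the classic PCRE calling sequence.
 * pcre_exec() takes the subject length as a plain int, so a
 * buffer larger than INT_MAX (2147483647) bytes cannot even be
 * described to the library. Error handling omitted. */
#include <pcre.h>
#include <stdio.h>

int main(void) {
    const char *err;
    int erroffset;
    pcre *re = pcre_compile("hello", 0, &err, &erroffset, NULL);

    const char *subject = "say hello to PCRE";
    int ovector[30];
    /* 4th argument: subject length as int -- the 2 GB bottleneck. */
    int rc = pcre_exec(re, NULL, subject, 17, 0, 0, ovector, 30);
    if (rc >= 0)
        printf("match at offset %d\n", ovector[0]);
    pcre_free(re);
    return 0;
}
```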

ggl commented 8 years ago

Most uses of grep/ack/ag are line-by-line searches. You would only hit the maximum subject length on a multiline search, or if a single line is over 2 GB. So for most uses, ag would only need to match a single line at a time.
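
To illustrate, a standalone sketch of line-at-a-time matching (not ag's actual code, which maps whole files; this is just a demonstration of the idea):

```c
/* Standalone sketch of line-at-a-time matching: each pcre_exec()
 * call only ever sees one line, so the int length limit applies
 * per line rather than per file. Not ag's actual code. */
#define _GNU_SOURCE
#include <pcre.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    if (argc != 3) return 1;                /* usage: prog PATTERN FILE */
    const char *err;
    int erroffset;
    pcre *re = pcre_compile(argv[1], 0, &err, &erroffset, NULL);
    FILE *fp = fopen(argv[2], "r");
    if (!re || !fp) return 1;

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long lineno = 0;
    while ((len = getline(&line, &cap, fp)) != -1) {
        lineno++;
        int ovector[30];
        /* len fits in an int unless a single line exceeds INT_MAX */
        if (pcre_exec(re, NULL, line, (int)len, 0, 0, ovector, 30) >= 0)
            printf("%ld:%s", lineno, line);
    }
    free(line);
    fclose(fp);
    pcre_free(re);
    return 0;
}
```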

jschpp commented 8 years ago

You are right that most searches are single-line only. Nevertheless, ag does multiline searching by default (as far as I know, it matches newlines with the \s regex). I found that files larger than 2 GB can be searched with a literal (not regex) pattern. In theory, ag could make a case-by-case decision and only raise that error when a single line exceeds INT_MAX bytes or when multiline searching is in effect.

Maybe @ggreer could say whether he wants this or not. I'm not sure how much work it would be to patch ag to support the case-by-case choice above.

netheril96 commented 7 years ago

PCRE has a newer version, PCRE2, with a backwards-incompatible API. The new API uses size_t instead of int for lengths, so it can handle subjects larger than 2 GB. Maybe ag should be updated to require PCRE2 instead.
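
For comparison, a minimal sketch of the same kind of match under the PCRE2 API (error handling omitted); the subject length parameter of pcre2_match() is PCRE2_SIZE, i.e. a size_t:

```c
/* Sketch of the PCRE2 API: pcre2_match() takes the subject
 * length as PCRE2_SIZE (a size_t), so subjects over 2 GB are
 * representable on 64-bit platforms. Error handling omitted. */
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    int errcode;
    PCRE2_SIZE erroffset;
    pcre2_code *re = pcre2_compile((PCRE2_SPTR)"hello", PCRE2_ZERO_TERMINATED,
                                   0, &errcode, &erroffset, NULL);

    const char *subject = "say hello to PCRE2";
    pcre2_match_data *md = pcre2_match_data_create_from_pattern(re, NULL);
    /* 3rd argument is PCRE2_SIZE, not int. */
    int rc = pcre2_match(re, (PCRE2_SPTR)subject, strlen(subject),
                         0, 0, md, NULL);
    if (rc >= 0) {
        PCRE2_SIZE *ov = pcre2_get_ovector_pointer(md);
        printf("match at offset %zu\n", (size_t)ov[0]);
    }
    pcre2_match_data_free(md);
    pcre2_code_free(re);
    return 0;
}
```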

njt1982 commented 7 years ago

@jschpp how do you do literal patterns?

jschpp commented 7 years ago

@njt1982 From ag --help: -Q --literal Don't parse PATTERN as a regular expression
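
For example, to search for a string full of regex metacharacters (hypothetical file name):

```
ag -Q 'pcre_exec(' system.log
```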

monperrus commented 7 years ago

This limitation hit me today when trying to grep my 7 GB mailbox. It would be really great if ag could handle large files.

jeffythedragonslayer commented 6 years ago

I hit these errors today when running ag from my home directory, and then ag dumped core. Too bad I didn't have ulimit -c unlimited enabled.

dimaqq commented 1 year ago

Is there maybe a chance of a command-line argument that would mean something like "process the first 2 GB of data, then give up"?

pdelteil commented 1 year ago

You could add a flag that splits the file into 2 GB parts and then merges the results of each run.
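
Roughly, a hypothetical helper along those lines (invented names search_in_chunks and chunk_fn; not ag code) could walk a mapped buffer in windows of at most INT_MAX bytes, cutting each window at the last newline so single-line matches are never split across runs. Multiline matches spanning a boundary would still be missed, which is the known trade-off of this approach:

```c
/* Hypothetical chunking strategy (not implemented in ag):
 * scan a huge mapped buffer in windows of at most INT_MAX bytes,
 * cutting each window at the last newline so single-line matches
 * are never split. Multiline matches spanning a boundary would
 * still be missed. */
#define _GNU_SOURCE
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-chunk callback supplied by the caller. */
typedef void (*chunk_fn)(const char *buf, int len, size_t file_off);

void search_in_chunks(const char *buf, size_t total, chunk_fn search) {
    size_t off = 0;
    while (off < total) {
        size_t len = total - off;
        if (len > (size_t)INT_MAX) {
            /* Back up to the last newline inside the window; if a
             * single line exceeds INT_MAX, accept the split. */
            const char *nl = memrchr(buf + off, '\n', (size_t)INT_MAX);
            len = nl ? (size_t)(nl - (buf + off)) + 1 : (size_t)INT_MAX;
        }
        search(buf + off, (int)len, off);
        off += len;
    }
}
```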