ggreer / the_silver_searcher

A code-searching tool similar to ack, but faster.
http://geoff.greer.fm/ag/
Apache License 2.0

ag stalls in the middle of search (but ack works) #462

Open: scor opened this issue 10 years ago

scor commented 10 years ago

I wanted to use ag to search for strings in the code fetched by https://www.drupal.org/sandbox/greggles/1481160. I installed the tool, downloaded all the repositories following the "Gotta Download Them All" instructions, and ran this command from inside the allmodules directory:

ag "[^a-zA-Z_]*xml_parse[ ]*\("

The output consistently stops at the same point:

activitystream/activitystream_feed/SimplePie.compiled.php:15600:                        if (!xml_parse($xml, $data, true))
anatoa/library/nusoap.php:1179:             if(!xml_parse($this->parser,$xml,true)){
anatoa/library/nusoap.php:4719:        if (!xml_parse($this->parser, $wsdl_string, true)) {
anatoa/library/nusoap.php:6457:                 if(!xml_parse($this->parser,$xml,true)){
...
emf/plugins/emf_campaign_monitor/CMBase.php:548:* safely use xml_parse() and other related functions (the alternative is to use
feed_field/feed_field.module:405:  if (!xml_parse($xml_parser, $data, 1)) {
filemakerform/includes/FX/FX.php:1432:                $xmlParseResult = xml_parse($xml_parser, $data, true);
filemakerform/includes/FX/FX.php:1464:                $xmlParseResult = xml_pa

CPU usage stays at 100%, but no further output appears.

ggreer commented 10 years ago

Can you give me a tarball of a directory that triggers this? I started trying to reproduce the issue, but that meant going down a rabbit hole of installing dependencies: drush, composer, drupal, and php.

scor commented 10 years ago

I've narrowed it down to this archive: http://ftp.drupal.org/files/projects/gm3-7.x-1.x-dev.tar.gz. The problem seems to come from the gm3_region/region_data/ directory, which contains some binary files.

ggreer commented 10 years ago

The files in that directory are ASCII and UTF-8 text, not binaries. Some of their lines are very long, and on those lines your regex has pathological performance. I'm pretty sure the [^a-zA-Z_]* part is what's causing the trouble.
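To illustrate why a long line can make this pattern crawl, here is a simplified model of a naive backtracking matcher, written in Python. It is not ag's actual engine (ag uses PCRE, which applies optimizations this sketch ignores), and count_steps is a hypothetical helper. It assumes the slow lines consist mostly of digits and punctuation, which all fall inside [^a-zA-Z_]. An unanchored search retries the pattern at every offset, and at each offset the greedy star swallows the rest of the non-letter run and then gives it back one character at a time while looking for "xml_parse", so the work grows with the square of the run length.

```python
import re


def count_steps(line: str, literal: str = "xml_parse") -> int:
    """Count steps a naive backtracking matcher spends on [^a-zA-Z_]*<literal>.

    Simplified model: one step per character the greedy star consumes, plus one
    step per position where the literal is tried during backtracking.
    """

    def outside_class(ch: str) -> bool:
        # Mirrors [^a-zA-Z_]: anything except ASCII letters and underscore.
        return not (ch.isascii() and (ch.isalpha() or ch == "_"))

    steps = 0
    n = len(line)
    for start in range(n + 1):  # an unanchored search retries at every offset
        # Greedy phase: swallow the whole run of non-letter characters.
        end = start
        while end < n and outside_class(line[end]):
            end += 1
            steps += 1
        # Backtracking phase: give characters back one at a time, trying the
        # literal after each shorter run (longest run first, down to empty).
        for pos in range(end, start - 1, -1):
            steps += 1
            if line.startswith(literal, pos):
                return steps
    return steps


# A long run of non-letters costs (n + 1)**2 steps; a line of letters costs n + 1.
digits = "1" * 1000
letters = "a" * 1000
print(count_steps(digits), count_steps(letters))  # prints 1002001 1001

# Because [^a-zA-Z_]* may match zero characters, dropping it does not change
# which lines match; it only changes where the reported match starts.
original = re.compile(r"[^a-zA-Z_]*xml_parse[ ]*\(")
trimmed = re.compile(r"xml_parse[ ]*\(")
for text in ["if (!xml_parse($xml, $data, true))", "123,456,789", digits]:
    assert bool(original.search(text)) == bool(trimmed.search(text))
```

In this model, a 1,000-character run of digits costs about a thousand times more than a line of letters of the same length. The equivalence check suggests that searching for xml_parse[ ]*\( without the leading class should report the same lines while avoiding the quadratic backtracking, though this sketch has not been checked against ag itself.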