ggreer / the_silver_searcher

A code-searching tool similar to ack, but faster.
http://geoff.greer.fm/ag/
Apache License 2.0

Matching whole words fails in some cases #563

Open dideler opened 9 years ago

dideler commented 9 years ago

ag finds these matches as expected:

↪ ag -w '\.deliver'
spec/mailers/notification_mailer_spec.rb
41:      spam_notification_email.deliver
100:      failure_email.deliver
152:      route_limit_email.deliver

I'm using word match so it doesn't match on strings like .deliveries. I then changed those instances of .deliver to .deliver_now.

Now ag doesn't find any more matches:

↪ ag -w '\.deliver'

But git grep finds remaining matches as expected:

↪ git grep -w '\.deliver'
app/controllers/webhooks/mailgun_controller.rb:      NotificationMailer.route_limit_notification(num_routes).deliver
app/controllers/webhooks/mailgun_controller.rb:    NotificationMailer.failed_delivery(params_and_message_headers).deliver
app/models/inbound_email.rb:      NotificationMailer.spam_notification(self).deliver

It seems that ag is failing to find matches when there's a ) before the pattern I'm searching for.

okdana commented 9 years ago

The -w option simply wraps your search pattern in \b (so \.deliver becomes \b\.deliver\b). \b matches a word boundary, which is the 'space' between a word character (letters, numbers, and underscores) and a non-word character (anything that's not a letter, number, or underscore).

So ag -w '\.deliver' will never match something like (self).deliver, because the character to the left of the . is another non-word character (and therefore there is no word boundary).
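To make that concrete, here is a minimal sketch using Python's re module, whose \b behaves the same way as PCRE's for these ASCII inputs. It is only an illustration of the boundary rule, not ag's actual code, and the sample lines are shortened from the output above.

```python
import re

# What -w effectively turns '\.deliver' into, per the explanation above.
pattern = re.compile(r"\b\.deliver\b")

lines = [
    "spam_notification_email.deliver",                     # 'l' then '.': boundary exists
    "NotificationMailer.spam_notification(self).deliver",  # ')' then '.': no boundary
    "NotificationMailer.deliveries",                       # 'r' then 'i': no trailing boundary
]

# True where the wrapped pattern finds a match in the line.
matches = [bool(pattern.search(line)) for line in lines]
print(matches)
```

Only the first line matches. The second is rejected by the leading \b, and the third by the trailing one.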

This matches the behaviour of (e.g.) pcregrep, but it differs from more traditional greps such as git grep and GNU grep, both of which use custom boundary-matching code rather than relying on the regex \b assertion.

Ack also uses \b, but you might say it's 'smarter' about it than ag: it only adds the surrounding \b if the first/last character in the search pattern is a word character (so if it's deliver it will add \b to both sides, but if it's \.deliver it will only add it to the end). That seems like a neat way to do it to me, although how convenient it is in C (vs Ack's two lines of Perl), I don't know.
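A rough sketch of that conditional wrapping, written in Python rather than Perl or C. The helper name wrap_word_boundaries is made up for illustration; this is an approximation of the idea described above, not Ack's or ag's actual source.

```python
import re

def wrap_word_boundaries(pattern: str) -> str:
    """Hypothetical helper: add \\b only next to a word character.

    It inspects the raw first/last character of the pattern text, so it is
    a heuristic: a pattern ending in an escape such as \\d ends in the
    letter 'd' and would still get a trailing \\b.
    """
    prefix = r"\b" if re.match(r"\w", pattern) else ""
    suffix = r"\b" if re.search(r"\w$", pattern) else ""
    return prefix + pattern + suffix

# '\.deliver' starts with a backslash, so only the trailing \b is added.
wrapped = wrap_word_boundaries(r"\.deliver")
print(wrapped)

line = "NotificationMailer.spam_notification(self).deliver"
print(bool(re.search(wrapped, line)))
print(bool(re.search(wrapped, "ActionMailer::Base.deliveries")))
```

With this wrapping the (self).deliver case is found, while .deliveries is still excluded by the trailing \b.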