ggreer / the_silver_searcher

A code-searching tool similar to ack, but faster.
http://geoff.greer.fm/ag/
Apache License 2.0

ag doing multi-line matching by default #682

Open mat813 opened 9 years ago

mat813 commented 9 years ago

I was looking for files in the FreeBSD ports tree that contain github.com on a line that does not start with a W. So I wrote:

# ag '^[^W].*github.com' 
archivers/hs-zlib-enum/pkg-descr
3:
4:WWW: http://github.com/maltem/zlib-enum

archivers/nwreckdum/pkg-descr
3:
4:WWW: https://github.com/danfe/nwreckdum
....

And it seems .* means "any char", including newline. Is there a way around that?

decaff commented 9 years ago

Ag utilizes pcre(3) for regex matching, with the PCRE_MULTILINE feature enabled. In multiline mode, \n is just another character. Quoting from an online man page:

By default, for the purposes of matching "start of line" and "end of line", PCRE treats the subject string as consisting of a single line of characters, even if it actually contains newlines.....

When PCRE_MULTILINE is set, the "start of line" and "end of line" constructs match immediately following or immediately before internal newlines in the subject string, respectively, as well as at the very start and end.

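Nothing unexpected there, but from the point of view of traditional line-at-a-time grepping, your Ag search results are surprising. Here's my theory of what's occurring: ^ matches at the start of the empty line 3, [^W] then matches that line's newline (it is not a W), and .*github.com matches the rest on line 4.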

@nodakai in #459 has the solution. Change the regex like so:

^[^W\n].*github.com

Looking through #459, a negated character class caused issues there as well. The Ag man page probably needs to be amended to discuss the pitfalls of multiline matching and negated character classes.
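
For anyone who wants to reproduce the effect outside of Ag, here is a minimal sketch using Python's re module. It rests on two assumptions: that re.MULTILINE treats ^ and negated character classes the same way PCRE_MULTILINE does for this pattern, and that the sample text (invented for illustration) resembles a pkg-descr file with an empty line 3 and a WWW line 4.

```python
import re

# Hypothetical stand-in for a pkg-descr file: two text lines, an empty
# line 3, and a WWW line 4, mirroring the shape of the reported output.
text = "line one\nline two\n\nWWW: http://github.com/maltem/zlib-enum\n"

# Original pattern: [^W] is free to consume the newline of the empty line.
original = re.compile(r"^[^W].*github.com", re.MULTILINE)
# Suggested fix: also exclude the newline from the negated class.
fixed = re.compile(r"^[^W\n].*github.com", re.MULTILINE)

m = original.search(text)
original_match = m.group(0) if m else None
fixed_match = fixed.search(text)

print(repr(original_match))  # the match starts with the newline of line 3
print(fixed_match)           # no match once \n is excluded
```

In this sketch the original pattern matches a span that begins on the empty line and runs into the WWW line, while the pattern with \n added to the class finds nothing.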

decaff commented 9 years ago

A man page commit is now available that discusses multi-line matching.

ldong commented 9 years ago

I would love to see this multi-line feature (-M) built into ag.

amagura commented 5 years ago

There seems to be a --nomultiline option, but it does not appear to work as intended. It definitely does something (I ran a diff on the output of ag with and without --nomultiline, and there were differences), but .* still seems to match across newlines:

ag --nomultiline 'class="modal.*(@keydown.esc){0}' returns this:

<div id="addThingModal" class="modal" :class="{'d-block': showAddThingModal}" @keydown.esc.prevent="showAddThingModal = false">
  <div class="modal-dialog modal-lg" role="document">
    <div class="modal-content">
      <div class="modal-header d-flex">
        <h5 class="modal-title align-self-center">Add Thing</h5>
        <div class="modal-body">
          <div class="modal-footer">

When it should have returned this:

<div class="modal-dialog modal-lg" role="document">
    <div class="modal-content">
      <div class="modal-header d-flex">
        <h5 class="modal-title align-self-center">Add Thing</h5>
        <div class="modal-body">
          <div class="modal-footer">
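
One possible reading of the output above, offered as a sketch rather than a diagnosis of Ag: a group quantified with {0} matches zero repetitions, so it always succeeds on the empty string, and every line containing class="modal would match on its own with no multiline matching involved. The Python example below uses shortened, hypothetical versions of the lines from the report. The negative lookahead is an assumed alternative for expressing "does not contain", not something confirmed in this thread.

```python
import re

# Shortened, hypothetical versions of two lines from the report.
lines = [
    '<div id="addThingModal" class="modal" @keydown.esc.prevent="x">',
    '  <div class="modal-dialog modal-lg" role="document">',
]

# (...){0} means "exactly zero repetitions", which always matches empty.
zero_quantifier = re.compile(r'class="modal.*(@keydown.esc){0}')
# Assumed alternative: a negative lookahead rejects lines with that text.
lookahead = re.compile(r'^(?!.*@keydown\.esc).*class="modal')

zero_hits = [line for line in lines if zero_quantifier.search(line)]
lookahead_hits = [line for line in lines if lookahead.search(line)]

print(len(zero_hits))
print(lookahead_hits)
```

Searching line by line, the {0} pattern accepts both lines, while the lookahead pattern keeps only the line without @keydown.esc.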