Can't search across lines with .* regex

izuzak commented 10 years ago

Originally reported by @Cydrobolt over at https://github.com/atom/atom/issues/3892

Neither regex nor normal find can search across lines. E.g.:


something="ex
ists"

<p>blablablabla
blablablabla</p>

For the second example, a regex of .* should have matched the text. However, it does not work, because it is spread across two lines.

burabure commented 10 years ago

If you try to do something like

<p>(.*|\n)*?</p>

on the current buffer, it actually crashes (at least if the content is more than a couple lines long)

Ubuntu 14.04 atom 0.141.0

redfellow commented 9 years ago

Ubuntu 14.04 crash still exists.

harai commented 9 years ago

It still crashes.

Ubuntu 15.04
Atom 1.0.0
find-and-replace 0.174.1

dead-claudia commented 9 years ago

To be honest, I don't think that crash is easily fixable. Try running that regex through grep and see what happens. If it doesn't hang on a file of about 30 lines, then there is likely a very difficult perf bug in V8 or (highly unlikely) Atom's text editor. I would be surprised if that's the case, though, considering that is literally "any number of a group of the least number of characters consisting of either a newline or the largest group of non-line-ending characters you can get". That's a lot of work to do, and it's not the easiest to even statically compile that regex to infer that it's matching any set of characters that don't include any line-ending character other than a line feed. That means the regex doesn't match other line-breaking characters, i.e. carriage return, (the obscure line-breaking code points) U+2028 and U+2029, etc. Another thing is that even Sublime, etc. tend to choke a little on regular expressions.
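As an illustration of the point about line terminators: in a JavaScript regex without the `s` flag, `.` skips LF, CR, U+2028 and U+2029, so an alternation of `.` and `\n` still stops at a bare carriage return. The helper name `dotMatches` below is made up for this sketch.

```javascript
// The four code points that `.` does not match in a JavaScript regex
// when the `s` flag is absent.
const lineTerminators = {
  "LF (U+000A)": "\n",
  "CR (U+000D)": "\r",
  "LINE SEPARATOR (U+2028)": "\u2028",
  "PARAGRAPH SEPARATOR (U+2029)": "\u2029",
};

// Hypothetical helper: true when a lone `.` matches the whole one-character string.
function dotMatches(ch) {
  return /^.$/.test(ch);
}

for (const [name, ch] of Object.entries(lineTerminators)) {
  console.log(name, "matched by `.`:", dotMatches(ch)); // false for all four
}

// "Either any non-line-ending character or a line feed" covers LF only.
const dotOrLineFeed = /^(?:.|\n)*$/;
console.log(dotOrLineFeed.test("a\nb")); // true
console.log(dotOrLineFeed.test("a\rb")); // false: the carriage return is not covered
```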

Regular expression engines are extremely slow to begin with, and V8's Irregexp engine is one of the few that isn't atrociously slow. (It's faster than most POSIX-based regex implementations, and it's faster than Perl's highly optimized, highly flexible one.)

I would say one way, probably the best way, to curb the crashes is to add a delay between the last character being typed and the regex finally being executed, even as little as 200 milliseconds. I couldn't tell you how many times I've had Atom crash in the middle of me typing out a regex, simply because the incomplete one happened to match a third of the code. The other thing is that most editors don't run regular expressions as they're typed; they run them via a dialog or similar. Atom is rather unique in having this problem.
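The delay described above is commonly called debouncing. A minimal sketch follows, assuming a hypothetical `debounce` helper and a stand-in search callback; it is not how find-and-replace is actually implemented. The optional `timers` argument exists only so the behavior can be exercised without waiting on real time.

```javascript
// Hypothetical helper: call `search` only once `delayMs` milliseconds have
// passed without another keystroke. `timers` is injectable for testing.
function debounce(
  search,
  delayMs,
  timers = {
    setTimeout: (fn, ms) => setTimeout(fn, ms),
    clearTimeout: (id) => clearTimeout(id),
  }
) {
  let pending = null;
  return function debounced(...args) {
    // A newer keystroke cancels the search scheduled by the previous one.
    if (pending !== null) timers.clearTimeout(pending);
    pending = timers.setTimeout(() => {
      pending = null;
      search(...args);
    }, delayMs);
  };
}

// Stand-in for the editor's search; only the last pattern typed is run.
const onFindInput = debounce((pattern) => {
  console.log("searching for", pattern);
}, 200);

onFindInput("<p>(");
onFindInput("<p>(.|\\n)*?</p>");
```

With this in place, the half-typed pattern is discarded and only the final one reaches the regex engine, roughly 200 ms after typing stops.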

acusti commented 9 years ago

I don’t know if this is worth its own issue, but the general problem of not being able to do a multiline search without converting your search to a regular expression is a painful one. It seems like the need to use “replace in project” to modify every instance of a multiline chunk of code in a project is the kind of thing that comes up frequently enough that it would be great if the editor could handle it. There have been a few times that I just wanted to paste in a code chunk to the find field and a different one in the replace field.
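Until an editor handles this directly, a pasted chunk can be converted to a literal multi-line pattern by hand or by script. The sketch below assumes two hypothetical helpers, `escapeRegExp` and `literalMultilinePattern`; it escapes every regex metacharacter and joins the lines with `\r?\n` so both LF and CRLF files match.

```javascript
// Escape every regex metacharacter so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a regex that matches the pasted chunk verbatim,
// tolerating either LF or CRLF between its lines.
function literalMultilinePattern(chunk) {
  const lines = chunk.split(/\r?\n/).map(escapeRegExp);
  return new RegExp(lines.join("\\r?\\n"), "g");
}

// The file uses CRLF, the pasted chunk uses LF; the pattern matches anyway.
const source = "<p>blablablabla\r\nblablablabla</p>\n";
const pattern = literalMultilinePattern("<p>blablablabla\nblablablabla</p>");
console.log(JSON.stringify(source.replace(pattern, "<p>replaced</p>")));
// prints "<p>replaced</p>\n"
```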

dbolton commented 8 years ago

@burabure If you delete the asterisk inside the parentheses, you'll get the same matches without the crash ((.|\n)*?). Better yet is the following regular expression which matches newlines regardless of platform (e.g. carriage return or newline)

[\s\S]*?

\s matches white space including line breaks, and \S matches anything that is not a white space. Unlike some languages, JavaScript doesn't have a way to flag that you want dots to match a newline. So maybe Atom can replace dots with [\s\S] under the hood to match newlines.
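A rough sketch of what "replace dots under the hood" could look like, assuming a hypothetical `dotAll` rewrite function. It only handles plain escapes and character classes, so treat it as an illustration rather than a complete regex parser.

```javascript
// Hypothetical rewrite: replace each unescaped `.` that sits outside a
// character class with `[\s\S]`, leaving `\.` and `[.]` untouched.
function dotAll(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy the escape sequence (backslash plus next character) as-is.
      out += ch + (source[i + 1] || "");
      i++;
    } else if (inClass) {
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === ".") {
      out += "[\\s\\S]";
    } else {
      out += ch;
    }
  }
  return out;
}

const text = "<p>blablablabla\r\nblablablabla</p>";
console.log(new RegExp("<p>.*?</p>").test(text));         // false
console.log(dotAll("<p>.*?</p>"));                        // <p>[\s\S]*?</p>
console.log(new RegExp(dotAll("<p>.*?</p>")).test(text)); // true
```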

guillochon commented 8 years ago

This bug is really bad. I typed (.*|\n)*? into my find field and it crashed Atom, but find still has that pattern entered, so it crashes every time I launch a new search now! How do I clear the search history?

Edit: Looks like it's working now after restarting Atom a few times, not sure what changed.

menocomp commented 8 years ago

@dbolton (.|\n)*? only works in one file!!! I tried it in folders and did not work!

dead-claudia commented 8 years ago

Should there be a multiline option? I think that's probably the best resolution, since there may be cases when you don't intend to match across lines.

winstliu commented 7 years ago

VSCode and Atom are two different projects, so both issues should remain open.

sekmo commented 7 years ago

No news after three years? :-)

steviesama commented 7 years ago

I don't know if it's a complete solution but I got something working. Pretty strange I thought, and I'm not sure about the limitations because it's hackity, but here's how I refactored a chunk of code I had in more places than I should have.

The first snip shows what I was matching: it always matches within the same window but never across multiple files. If you hit enter with the pattern as I have it here, though, it matches all instances of the text.

Below is a snip of the search matching in all 40 places.

This is pretty strange. But I noticed that the first line is always fine. Then to get to the next as well as every line thereafter, you need to start doing a pattern, at least the way I'm doing it. Shown below:

\s*[text to match]*

\s* for all the upcoming space, though I should mention, I did (\s)* or (\s*) in mine as what I wanted to also do was match whatever indentation was present. Put your search text inside a character class, always terminate it with *, and your search will be found.

I found it strange that the character class worked, but I figured it had something to do with how it was finding them so I tried * after each character on lines after the first...and that worked too. Snip below.

var style = \{*\s*w*i*d*t*h*

Well, I hope that was helpful to someone. I was able to use sub-grouping to change the followup matching and everything without a hitch.

steviesama commented 7 years ago

@isiahmeadows As for the multiline option, since it doesn't let you search across multiple lines, I think with what I found above, that seems to basically make multiline an explicit option.

dead-claudia commented 7 years ago

@steviesama Good point. Maybe better to add an option to, short-term, transform . to [^], and long-term, use /s (which is currently an ES proposal, but V8 has recently started shipping it by default).

dead-claudia commented 7 years ago

And maybe make that option ". matches newlines" or something like that.
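For comparison, here is how the two approaches behave in an engine that supports the ES2018 `s` (dotAll) flag. The function `buildFindRegex` and its boolean option are hypothetical names for this sketch, not part of find-and-replace.

```javascript
// Hypothetical option handling: add the `s` flag when ". matches newlines" is on.
function buildFindRegex(pattern, dotMatchesNewlines) {
  return new RegExp(pattern, dotMatchesNewlines ? "gs" : "g");
}

const text = "<p>blablablabla\nblablablabla</p>";

// Default: `.` stops at the newline, so nothing is found.
console.log(text.match(buildFindRegex("<p>.*</p>", false))); // null

// Short-term workaround: `[^]` is a JavaScript-specific "any character" class.
console.log(text.match(buildFindRegex("<p>[^]*</p>", false)).length); // 1

// Long-term: the `s` flag makes `.` itself match line terminators.
console.log(text.match(buildFindRegex("<p>.*</p>", true)).length); // 1
```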

ghost commented 6 years ago

What's the status on this?

winstliu commented 6 years ago

I'm not aware of any attempts to fix this issue, however we would be interested in reviewing PRs addressing this issue that don't regress in terms of performance. The current library we use for searching files is atom/scandal, where I believe files were intentionally broken up into chunks to improve search performance.
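To see why splitting input into pieces defeats a multi-line pattern, compare a scanner that tests each line separately with one that sees the whole text. Both functions below are hypothetical and written only for this comparison; they are not atom/scandal's actual code, and the real chunking strategy may differ.

```javascript
// Hypothetical scanner that only ever sees one line at a time.
function searchPerLine(text, source) {
  const regex = new RegExp(source);
  return text.split("\n").filter((line) => regex.test(line));
}

// Hypothetical scanner that runs the same pattern over the full text.
function searchWholeText(text, source) {
  return text.match(new RegExp(source, "g")) || [];
}

const text = "<p>blablablabla\nblablablabla</p>\n";
const source = "<p>[\\s\\S]*?</p>";

console.log(searchPerLine(text, source).length);   // 0: no single line holds both tags
console.log(searchWholeText(text, source).length); // 1
```

However the pattern is written, a match that straddles a boundary cannot be found if each piece is tested in isolation, which is why the fix has to consider both correctness and the performance reasons for chunking.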

dead-claudia commented 6 years ago

Found an issue there, but no PR.

g3ar commented 6 years ago

Have the same issue. We need to have "Multiline" find option.

artheus commented 6 years ago

👍 I agree that this is something that is needed.

jinglesthula commented 6 years ago

Although this has been painful enough for long enough, I think we're nearly out of the woods. The proposal went to Stage 4 seven months ago, and the kangax tables list it as an ES2018 feature http://kangax.github.io/compat-table/es2016plus/. I don't know the guts of Atom to know even what JS engine it's running or what ES features are supported, but I suspect we're either at the point (or will be very soon) where we could just have a button added on the Atom find UI to include the s flag.

dead-claudia commented 6 years ago

@jinglesthula Not sure how much the /s flag would help, if my analysis is correct.

jinglesthula commented 6 years ago

Mmm.. yeah. We're all probably naively thinking "how hard could it be?", but the realities of performance and scaling when dealing with large files aren't trivial. I wonder if other editors' approaches could be looked at to see how they accomplish it. For now, remembering to use [^] or \s* may be the easiest workaround.

dead-claudia commented 6 years ago

@jinglesthula This very issue has prompted me to start an ESDiscuss thread about what would be required to fix this.

But most certainly, the more intuitively simple something is conceptually, the more complex it really becomes behind the scenes to do correctly, ironically enough.

ghost commented 5 years ago

I know it has been a long time, but I've been trying to find a solution for this issue for a while. So, here are my 2¢:

blablablabla
blablablabla

Find: (.*\n.+?)
Replace: New content:$1

Result:

New content:blablablabla
blablablabla

Atom: 1.35.1 x64, macOS Mojave 10.14.4

Does it help?

g3ar commented 5 years ago

No.

ghost commented 5 years ago

@g3ar Could you give more details, please?

g3ar commented 5 years ago

I'm not using Atom right now. Your solution works for simple files. I have tried it on complicated sources and it fails. I think the problem is incorrect parsing of \n.

ghost commented 5 years ago

@g3ar I understand. I've tested it in a file (html+javascript+json) with 14,448 lines and it worked fine. However, I'm using Atom. I believe that different regex flavors require different regex structures.

I don't know if you already did it but, if not, you could try to identify which flavor/engine you're using and then try another solution.

Here's a list of them: https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

Good luck and thank you for the details.

DigitalLeaves commented 4 years ago

My two cents. It works for single files, but not for multifiles.

I have plenty of files with this code (sidebar, HTML static):

<li class="nav-item">
   <a class="nav-link" href="./employees.html">
      <i class="ni ni-badge text-primary"></i>
      <span class="nav-link-text" data-i18n="employees_and_salaries"></span>
   </a>
</li>

I want to add a new class (let's call it newclass) to the <li> element, but only when the link links to employees.html, so my regexp:

<li class="nav-item">([.|\n|\s|\t]*)<a class="nav-link" href="\.\/employees\.html">

And replacement:

<li class="nav-item newclass">$1<a class="nav-link" href="./employees.html">

Works for single files (finds the expression), but fails to find a single match if I look for multi-files (Shift+Option+F).
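A side note on the pattern itself: inside a character class, `.` and `|` are literal characters, so `[.|\n|\s|\t]` is the same set as `[.|\s]`, and for this markup plain `\s*` is enough. The sketch below runs both forms against an in-memory string in JavaScript; it does not address the multi-file limitation.

```javascript
const html =
  '<li class="nav-item">\n' +
  '   <a class="nav-link" href="./employees.html">\n';

// The pattern as posted, and a simplified equivalent for this input.
const original = /<li class="nav-item">([.|\n|\s|\t]*)<a class="nav-link" href="\.\/employees\.html">/g;
const simplified = /<li class="nav-item">(\s*)<a class="nav-link" href="\.\/employees\.html">/g;

const replacement = '<li class="nav-item newclass">$1<a class="nav-link" href="./employees.html">';

const viaOriginal = html.replace(original, replacement);
const viaSimplified = html.replace(simplified, replacement);

console.log(viaOriginal === viaSimplified); // true
console.log(viaOriginal);
```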

svennd commented 2 years ago

The "find all" works fine in a single file, but multi-file doesn't work. Is there a workaround available (other than opening hundreds of files to run this manually)?

I want to remove double lines:

thumbnail:(.|\r?\n)*?thumbnail:(.*?)$

with

thumbnail:$2
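One possible workaround outside the editor is to apply the same replacement with a small script. The sketch below shows only the string transformation, using a hypothetical function `dropEarlierThumbnail`. It is a variant of the pattern above, not the exact one: `[\s\S]*?` stands in for `(.|\r?\n)*?`, so the kept value is group 1 instead of group 2, and the `m` flag is assumed so that `$` matches at line ends.

```javascript
// Variant of the pattern above. Note that, like the original, it removes
// everything between the first "thumbnail:" and the next one.
const duplicateThumbnail = /thumbnail:[\s\S]*?thumbnail:(.*?)$/gm;

// Hypothetical helper: keep only the later of two thumbnail entries.
function dropEarlierThumbnail(contents) {
  return contents.replace(duplicateThumbnail, "thumbnail:$1");
}

const page =
  "title: post\n" +
  "thumbnail: old.png\n" +
  "thumbnail: new.png\n" +
  "layout: default\n";

console.log(dropEarlierThumbnail(page));
```

A script would read each file's contents, pass them through the function, and write the result back; that file handling is left out here, and it is worth running on a copy of the project first.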

atom / find-and-replace

Can't search across lines with .* regex #303
