Open izuzak opened 10 years ago
If you try to do something like
<p>(.*|\n)*?</p>
on the current buffer, it actually crashes (at least if the content is more than a couple of lines long)
Ubuntu 14.04 atom 0.141.0
Ubuntu 14.04 crash still exists.
It still crashes.
To be honest, I don't think that crash is easily fixable. Try running that regex through grep and see what happens. If it doesn't hang on a file of about 30 lines, then there is likely a very difficult perf bug in V8 or (highly unlikely) Atom's text editor. I would be surprised if that's the case, though, considering that the pattern literally means "any number of groups, as few as possible, each consisting of either a newline or the longest run of non-line-ending characters you can get". That's a lot of work to do, and it's not easy to statically compile that regex and infer that it matches any sequence of characters containing no line-ending character other than a line feed. That also means the regex doesn't match across other line-breaking characters, i.e. carriage return, the obscure line-breaking code points U+2028 and U+2029, etc. Another thing is that even Sublime and other editors tend to choke a little on regular expressions.
Regular expression engines are extremely slow to begin with, and V8's Irregexp engine is one of the few that isn't atrociously slow. (It's faster than most POSIX-based regex implementations, and it's faster than Perl's highly optimized, highly flexible one.)
I would say one way, probably the best way, to curb the crashes is to add a delay between the last typed character and the regex actually being executed, even as little as 200 milliseconds. I couldn't tell you how many times I've had Atom crash in the middle of me typing out a regex, simply because the incomplete one happened to match a third of the code. The other thing is that most editors don't run regular expressions as they're typed - they run via a dialog or similar. Atom is rather unique in this problem.
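The delay described above is usually called debouncing. Here is a minimal sketch of the idea, not Atom's actual code: `debounce`, `runSearch`, and `onPatternChanged` are hypothetical names, and the 200 ms figure is simply the one suggested in the comment.

```javascript
// Hypothetical helper: postpones `fn` until `delayMs` milliseconds have
// passed without another call.
function debounce(fn, delayMs) {
  let timer = null;
  return function debounced(...args) {
    // A newer keystroke cancels the search that was still pending.
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn(...args);
    }, delayMs);
  };
}

// Hypothetical stand-in for the editor's search routine; it only records
// which patterns would actually have been executed.
const executed = [];
function runSearch(pattern) {
  executed.push(pattern);
}

const onPatternChanged = debounce(runSearch, 200);

// Simulate typing a pattern piece by piece. The incomplete patterns are
// never executed because each call resets the timer.
onPatternChanged("<p>(");
onPatternChanged("<p>(.|\\n)");
onPatternChanged("<p>(.|\\n)*?</p>");

// After the delay has elapsed, only the final pattern has been run.
setTimeout(() => console.log(executed), 300);
```

This would not make a pathological pattern any cheaper to run; it only avoids executing the half-typed ones along the way.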
I don’t know if this is worth its own issue, but the general problem of not being able to do a multiline search without converting your search to a regular expression is a painful one. It seems like the need to use “replace in project” to modify every instance of a multiline chunk of code in a project is the kind of thing that comes up frequently enough that it would be great if the editor could handle it. There have been a few times that I just wanted to paste a code chunk into the find field and a different one into the replace field.
@burabure If you delete the asterisk inside the parentheses, you'll get the same matches without the crash (`<p>(.|\n)*?</p>`). Better yet is the following regular expression, which matches newlines regardless of platform (e.g. carriage return or newline):
<p>[\s\S]*?</p>
`\s` matches white space including line breaks, and `\S` matches anything that is not white space. Unlike some languages, JavaScript doesn't have a way to flag that you want dots to match a newline. So maybe Atom can replace dots with `[\s\S]` under the hood to match newlines.
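To make the difference between these patterns concrete, here is a small sketch run in plain JavaScript rather than in Atom's find panel. The sample text with a Windows-style `\r\n` line ending is made up for illustration.

```javascript
// Made-up sample: a paragraph split across two lines with a CRLF ending.
const text = "<p>first line\r\nsecond line</p>";

// `.` does not match line terminators, so it cannot bridge the two lines.
const dotOnly = /<p>.*?<\/p>/.exec(text);

// `(.|\n)` adds the line feed but still cannot consume the carriage return.
const dotOrLineFeed = /<p>(.|\n)*?<\/p>/.exec(text);

// `[\s\S]` (whitespace or non-whitespace) matches every character.
const anyCharacter = /<p>[\s\S]*?<\/p>/.exec(text);

console.log(dotOnly === null);         // true
console.log(dotOrLineFeed === null);   // true
console.log(anyCharacter[0] === text); // true
```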
This bug is really bad. I typed `(.*|\n)*?` into my find and it crashed Atom, but find still has that pattern entered so it crashes every time I launch a new search now! How do I clear the search history?
Edit: Looks like it's working now after restarting Atom a few times, not sure what changed.
@dbolton `<p>(.|\n)*?</p>` only works in one file!!! I tried it in folders and it did not work!
Should there be a `multiline` option? I think that's probably the best resolution, since there may be cases when you don't intend to match across lines.
VSCode and Atom are two different projects, so both issues should remain open.
No news after three years? :-)
I don't know if it's a complete solution but I got something working. Pretty strange I thought, and I'm not sure about the limitations because it's hackity, but here's how I refactored a chunk of code I had in more places than I should have.
The first snip shows what I was matching: it always matches in the same window but never across multiple files. If you hit enter with the pattern written the way I have it here, though, it will match across all instances of the text.
Below is a snip of the search matching in all 40 places.
This is pretty strange. But I noticed that the first line is always fine. Then to get to the next as well as every line thereafter, you need to start doing a pattern, at least the way I'm doing it. Shown below:
\s*[text to match]*
`\s*` covers all the upcoming whitespace, though I should mention that I used `(\s)*` or `(\s*)` in mine, because I also wanted to match whatever indentation was present. Put your search text inside a character class, always terminate it with `*`, and your search will be found.
I found it strange that the character class worked, but I figured it had something to do with how it was finding them, so I tried `*` after each character on lines after the first...and that worked too. Snip below.
var style = \{*\s*w*i*d*t*h*
Well, I hope that was helpful to someone. I was able to use sub-grouping to change the followup matching and everything without a hitch.
@isiahmeadows As for the multiline option, since it doesn't let you search across multiple lines, I think with what I found above, that seems to basically make multiline an explicit option.
@steviesama Good point. Maybe better to add an option to, short-term, transform `.` to `[^]`, and long-term, use `/s` (which is currently an ES proposal, but V8 has recently started shipping it by default). And maybe make that option "`.` matches newlines" or something like that.
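Both ideas can be sketched in a few lines of JavaScript. The function name `dotToAnyChar` is hypothetical and the rewrite below is a naive illustration, not anything Atom ships: it skips escaped characters and dots inside character classes, and it makes no attempt to handle every corner of the regex grammar.

```javascript
// Hypothetical short-term transform: rewrite each unescaped `.` that sits
// outside a character class into `[^]`, which matches any character.
function dotToAnyChar(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy the escape sequence (e.g. `\.`) untouched.
      out += ch + (source[i + 1] || "");
      i++;
    } else if (ch === "[" && !inClass) {
      inClass = true;
      out += ch;
    } else if (ch === "]" && inClass) {
      inClass = false;
      out += ch;
    } else if (ch === "." && !inClass) {
      out += "[^]";
    } else {
      out += ch;
    }
  }
  return out;
}

const text = "<p>one\ntwo</p>";

// Plain dot: no match across the line break.
const plain = new RegExp("<p>.*</p>").test(text);

// Short-term idea: the transformed pattern.
const transformed = new RegExp(dotToAnyChar("<p>.*</p>")).test(text);

// Long-term idea: the `s` (dotAll) flag.
const dotAll = new RegExp("<p>.*</p>", "s").test(text);

console.log(plain, transformed, dotAll); // false true true
```

The `s` flag requires an engine that implements it, so the last line assumes a reasonably recent V8.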
What's the status on this?
I'm not aware of any attempts to fix this issue, however we would be interested in reviewing PRs addressing this issue that don't regress in terms of performance. The current library we use for searching files is atom/scandal, where I believe files were intentionally broken up into chunks to improve search performance.
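To illustrate why chunking would defeat multi-line patterns, here is a sketch that compares a line-at-a-time scan with a scan over the whole text. This is not scandal's code; how scandal actually splits files is an assumption here, and `searchLineByLine` and `searchWholeText` are hypothetical names.

```javascript
const fileContents = "<p>blablablabla\nblablablabla</p>\n";
const pattern = /<p>[\s\S]*?<\/p>/;

// Hypothetical scanner that only ever sees one line at a time.
function searchLineByLine(contents, regex) {
  return contents.split("\n").filter((line) => regex.test(line));
}

// Hypothetical scanner that runs the regex over the complete text.
function searchWholeText(contents, regex) {
  const match = regex.exec(contents);
  return match === null ? [] : [match[0]];
}

// No single line contains both the opening and the closing tag.
console.log(searchLineByLine(fileContents, pattern).length); // 0

// The whole text does contain the two-line paragraph.
console.log(searchWholeText(fileContents, pattern).length);  // 1
```

The same effect would occur at any chunk boundary that falls inside a would-be match, which is consistent with patterns working in the open buffer but not in project-wide search.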
Found an issue there, but no PR.
Have the same issue. We need to have a "Multiline" find option.
👍 I agree that this is something that is needed.
Although this has been painful enough for long enough, I think we're nearly out of the woods. The proposal went to Stage 4 seven months ago, and the kangax tables list it as an ES2018 feature (http://kangax.github.io/compat-table/es2016plus/). I don't know the guts of Atom to know even what JS engine it's running or what ES features are supported, but I suspect we're either at the point (or will be very soon) where we could just have a button added to the Atom find UI to include the `s` flag.
Mmm.. yeah. We're all probably naively thinking "how hard could it be?", but the realities of performance and scaling when dealing with large files aren't trivial. I wonder if other editors' approaches could be looked at to see how they accomplish it. For now, remembering to use `[^]` or `\s*` may be the easiest workaround.
@jinglesthula This very issue has prompted me to start an ESDiscuss thread about what would be required to fix this.
But most certainly, the more intuitively simple something is conceptually, the more complex it really becomes behind the scenes to do correctly, ironically enough.
I know it has been a long time, but I've been trying out a solution for this issue for a while. So, here are my 2¢:
<p>blablablabla
blablablabla</p>
Find: <p>(.*\n.+?)</p>
Replace: <p>New content:$1</p>
Result:
<p>New content:blablablabla
blablablabla</p>
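The same find and replace can be reproduced in plain JavaScript, which uses the same regex flavor as Atom's single-buffer search. This is a sketch outside the editor; the three-line sample is an added assumption, included to show where the pattern stops working.

```javascript
const findPattern = /<p>(.*\n.+?)<\/p>/;
const replacement = "<p>New content:$1</p>";

// The two-line paragraph from the example above.
const twoLines = "<p>blablablabla\nblablablabla</p>";
const replaced = twoLines.replace(findPattern, replacement);
console.log(replaced === "<p>New content:blablablabla\nblablablabla</p>"); // true

// Assumed extra case: the pattern allows exactly one line break, so a
// paragraph spanning three lines is left untouched.
const threeLines = "<p>a\nb\nc</p>";
console.log(threeLines.replace(findPattern, replacement) === threeLines); // true
```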
Does it help?
No.
@g3ar Could you give more details, please?
I'm not using Atom right now. Your solution works for simple files. I have tried this for complicated sources and it fails. I think the problem is in wrong parsing of `\n`.
@g3ar I understand. I've tested it in a file (html+javascript+json) with 14,448 lines and it worked fine. However, I'm using Atom. I believe that different regex flavors require different regex structures.
I don't know if you already did it but, if not, you could try to identify which flavor/engine you're using and then try another solution.
Here's a list of them: https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
Good luck and thank you for the details.
My two cents. It works for single files, but not for multifiles.
I have plenty of files with this code (sidebar, HTML static):
<li class="nav-item">
<a class="nav-link" href="./employees.html">
<i class="ni ni-badge text-primary"></i>
<span class="nav-link-text" data-i18n="employees_and_salaries"></span>
</a>
</li>
I want to add a new class (let's call it newclass) to the <li> element, but only when the link links to employees.html, so my regexp:
<li class="nav-item">([.|\n|\s|\t]*)<a class="nav-link" href="\.\/employees\.html">
And replacement:
<li class="nav-item newclass">$1<a class="nav-link" href="./employees.html">
Works for single files (finds the expression), but fails to find a single match if I look for multi-files (Shift+Option+F).
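As a side note on the pattern itself: inside a character class, `.` and `|` are literal characters, and `\s` already covers `\n` and `\t`, so `[.|\n|\s|\t]*` behaves like `[.|\s]*`. When the gap contains only whitespace, `\s*` is enough. The sketch below checks this in plain JavaScript; the indentation in the sample markup is an assumption, and none of this addresses the multi-file limitation.

```javascript
// Sample sidebar markup; the indentation is assumed for illustration.
const html = [
  '<li class="nav-item">',
  '  <a class="nav-link" href="./employees.html">',
  '    <span class="nav-link-text" data-i18n="employees_and_salaries"></span>',
  '  </a>',
  '</li>',
].join("\n");

// The pattern from the comment, and a simplified equivalent for this input.
const original = /<li class="nav-item">([.|\n|\s|\t]*)<a class="nav-link" href="\.\/employees\.html">/;
const simplified = /<li class="nav-item">(\s*)<a class="nav-link" href="\.\/employees\.html">/;
const replacement = '<li class="nav-item newclass">$1<a class="nav-link" href="./employees.html">';

const viaOriginal = html.replace(original, replacement);
const viaSimplified = html.replace(simplified, replacement);

console.log(viaOriginal === viaSimplified);                         // true
console.log(viaOriginal.startsWith('<li class="nav-item newclass">')); // true
```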
The "find all" works fine in a single file, but multi-file doesn't work. Is there a workaround available (other than opening hundreds of files to run this manually)?
I want to remove double lines:
thumbnail:(.|\r?\n)*?thumbnail:(.*?)$
with
thumbnail:$2
Originally reported by @Cydrobolt over at https://github.com/atom/atom/issues/3892
Both regex and normal find can't search across lines. E.g
For the second example, a regex of `<p>.*</p>` should have matched the text. However, it does not work, because the text is spread across two lines.