No support for lookbehind

ghost commented 8 years ago

Implementing JavaScript's native RegExp engine for a find/replace tool in a text editor seems like a dramatic oversight.

As find-and-replace leans on JS's RegExp, of course the following find expressions are invalid:

(?<=hello)world
(?<!hello)world

Errors

Invalid regular expression: /(?<=hello)world/: Invalid group
Invalid regular expression: /(?<!hello)world/: Invalid group

This module is unusable without such features
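For readers stuck on an engine without lookbehind, a common workaround is to capture the prefix and write it back. The following is only a minimal sketch: `replaceAfterPrefix` and `replaceUnlessPrefix` are hypothetical helper names, not part of find-and-replace, and they assume the prefix fragment cannot match the empty string.

```typescript
// Hypothetical helper: emulate `(?<=prefix)target` with a capturing group.
// `prefix` and `target` are regex source fragments supplied by the caller.
function replaceAfterPrefix(
  text: string,
  prefix: string,
  target: string,
  replacement: string
): string {
  const pattern = new RegExp(`(${prefix})${target}`, "g");
  // Re-emit the captured prefix so that only the target is effectively replaced.
  return text.replace(pattern, (_match: string, kept: string) => kept + replacement);
}

// Hypothetical helper: emulate `(?<!prefix)target` with an optional group.
// When the optional prefix group took part in the match, the text is left alone.
function replaceUnlessPrefix(
  text: string,
  prefix: string,
  target: string,
  replacement: string
): string {
  const pattern = new RegExp(`(${prefix})?${target}`, "g");
  return text.replace(pattern, (match: string, kept: string | undefined) =>
    kept === undefined ? replacement : match
  );
}

const afterHello = replaceAfterPrefix("helloworld, world", "hello", "world", "planet");
const notAfterHello = replaceUnlessPrefix("helloworld, world", "hello", "world", "planet");
```

In the find-and-replace UI the same idea amounts to searching for (hello)world and replacing with $1planet. Unlike a true lookbehind, the prefix becomes part of the match, so overlapping matches can behave differently.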

Epigene commented 8 years ago

I agree, lookaround is crucial!

Using JS's RegExp is also weird considering Oniguruma is already used within Atom to parse grammar files as mentioned in this exchange.

abe33 commented 8 years ago

As I already mentioned on Discuss, using Oniguruma for grammars made sense as it allows reusing TextMate and Sublime Text grammars out of the box, but AFAIK that's the only context where Oniguruma is used. OTOH I agree it's a shame that lookbehind is not part of JS regexp (while lookahead is).

jesseleite commented 8 years ago

I don't know much about the subject, apart from lookbehinds not being part of JS regex. However, as a PHP dev who regular-ily (pun intended) regexes via PHP regex engine with lookbehinds, it sucks not being able to lookbehind in Atom's find and replace. Curious about this Oniguruma?

alexchandel commented 8 years ago

Oniguruma has this feature and many, many more. There is absolutely no reason to use JS's garbage regex engine over Oniguruma.

jesseleite commented 8 years ago

@alexchandel I'm unfamiliar with Oniguruma. I'm all for more features, but is the syntax standard and testable on a site like regex101.com?

50Wliu commented 8 years ago

@JesseLeite It's what Atom uses for languages, and I've never had a problem testing any of those regexes on regex101.com.

blackbaud-jakespirek commented 8 years ago

Bummed that lookbehind doesn't work.

alexchandel commented 8 years ago

@bb-jakespirek It would if #698 happened

alexchandel commented 8 years ago

Since #698 is closed as a duplicate, is this issue now for using Oniguruma for regexes?

ghost commented 8 years ago

relevant: https://github.com/atom/find-and-replace/issues/570

Oniguruma would fix this too.

bric3 commented 8 years ago

@JesseLeite https://regex101.com/r/dO0yG2/1

It depends on the regex engine, but yes it's almost de facto standard

lzkelley commented 8 years ago

Would this also allow better handling of indentation errors (e.g. https://github.com/atom/language-python/issues/22)? Which really shouldn't need special packages to solve (e.g. https://atom.io/packages/python-indent)...

50Wliu commented 8 years ago

@lzkelley No, as find-and-replace has nothing to do with language grammars. Those are already using Oniguruma.

MattSturgeon commented 8 years ago

Those are already using Oniguruma.

Wait.. Atom already uses Oniguruma, but not for Find? Surely it should be a simple fix then? ;)

ErikCorryGoogle commented 7 years ago

Latest V8 releases have lookbehind available. It's behind the experimental JavaScript flag.

lee-dohm commented 7 years ago

@alexchandel your comment was deleted as a violation of the Atom Code of Conduct as it is insulting or derogatory. You may consider this an official warning.

MattSturgeon commented 7 years ago

For those interested, @alexchandel was saying that while V8 has added experimental lookbehind support, it is still inferior to many alternative regex libraries in his opinion:

V8 still doesn't have async callbacks, and catastrophic backtracking and misspelt regexes are the primary source of hangs in find-and-replace, for example #557 and #856.

He then went on to suggest using an alternative like Oniguruma (and rather rudely implied that the atom team should be prioritising this issue higher — hence the deleted comment).

MartinBonner commented 7 years ago

I was trying to find the documentation of the Atom regex syntax (given that they are all different), and was pleased to find that it used Oniguruma (with the lookbehind I wanted), and then was gutted to discover this bug.

It does seem rather extraordinary that Atom uses Oniguruma, but not for actual search-and-replace!

Stanzilla commented 7 years ago

VS Code recently switched to ripgrep for search and it might be worth investigating doing the same in Atom. It is really fast. ⚡️

MartinBonner commented 7 years ago

ripgrep is great for a "find in files" solution, but it achieves that speed (at least in part) by not supporting certain functions (the other part is achieved by not looking in .gitignore'd files, which is also only relevant to "find in files"). In particular, I think it doesn't support look-around (which is what I was after). I don't think raw speed is nearly so important for a complex search-and-replace (which is what I wanted this feature for).

ai-danno commented 7 years ago

As a regex nerd looking for the ultimate editor, I was drawn to this conversation.

Question- does Atom now support lookbehinds? Thanks in advance!

MartinBonner commented 7 years ago

@ai-danno : No it doesn't. The lack of lookbehind is what drew me to this thread in the first place.

My comment was because switching to ripgrep would give us speed, but not lookaround (and I think that would be a poor tradeoff).

ai-danno commented 7 years ago

@MartinBonner 👍 Thanks for the fast reply. It seems like it might bubble to the surface here in not too long, so I'll keep checking back.

ghost commented 7 years ago

I agree 100% JS's RegExp implementation is severely lacking. No lookbehind or reset is such a pain!

Doing a search like this is a MOST BASIC requirement of a code editor https://regex101.com/r/YjWwqp/1/

What will it take for Find/Replace to use Oniguruma instead of native JS RegExp? There's no point in waiting for new V8's lookbehind features. Oniguruma supports stuff like reset \K and it's working today, IN ATOM. It just needs to be USED for find/replace.

So who do we have to hassle or bribe to get this?

ai-danno commented 7 years ago

As an end-user, lookbehind is the holy grail of nerdy text-editing. Kate is free and has decent regex, but only lookahead. EditPadPro has great regex... for $50!!!

And I feel like I'm not the only one watching the capabilities of the various editors. To have this functionality in Atom, IMHO, would be a best-kept-secret advertisement for Atom. As the user, when you commit to a text-editor you do it fully, for the long-haul- you don't want to commit to a mostly-functional editor. Missing features like this become important in that decision-making process for the user.

IMHO, Atom is almost there. Also, it's encouraging to see discussion like this, makes me want to use it anyway.. keep up the good work!! :)

If my comments overstepped the bounds of this conversation or forum, I apologize.

50Wliu commented 7 years ago

@lope and others: Please feel free to submit a PR! The Atom team is primarily focusing on startup time and performance improvements at the moment, so we (currently) don't have the manpower to implement this, but we would definitely be open to a well-thought-out PR that switches to an alternative regex engine.

lee-dohm commented 7 years ago

As this issue stands it isn't actionable. Here's why:

There is no definition of "better"
There is disagreement on which features are "necessary"
There is no list of which features people require
There is no definition of performance expectations for both speed and memory (because I'm certain that will be the first complaint even if we select an engine that has all the features everyone wants)

Any work the maintainer team would do along these lines would ultimately be wasted effort. Even if someone submitted a PR using a different regular expression engine, we couldn't accept it because we don't have a meaningful, community accepted definition of what "done" looks like here. I refuse to sign the maintainer team up for trying to hit a moving target.

I recommend that everyone that is interested in this take the discussion to this Discuss topic and work out a definition of what a regular expression engine has to have to be acceptable. If a consensus can be reached, I will even create an Atom blog post to request further comments from the community to ensure that we are getting broad-based feedback and not leaving anyone out of the conversation. If that process results in consensus, then I would encourage someone to submit a PR that solves that problem and will work with the development team to get it accepted.

I want us to improve Atom and if we can be sure that the change won't just be making one group happy while antagonizing another, I'll stand behind it and advocate for it.

ghost commented 7 years ago

@lee-dohm if you are concerned that a better RegExp engine like Oniguruma would be slow enough to upset users who don't need the features it provides, then I think the ideal upgrade to Atom's RegExp find/replace would be an option to select which RegExp engine to use.

@lee-dohm I think it's great that you've raised the performance issue of RegExp engines. And it's great that you don't want to make a change that will upset some users and make other users happy.

I think it's wrong that we should all discuss what RegExp features we want and collectively compromise on features and performance. That is pointless. You can't please everyone that way. But by letting people select their RegExp engine, everyone can be happy 😄

Ideally clicking the .* find/replace icon, it could pop-up a select-box that disappears after moving the mouse away, defaulting to the last used RegExp engine. Similar to how you can select a RegExp engine on RegEx101.com.

So the fastest JS RegExp engine can be the default, and if they enter for example a lookbehind expression then it can let them know they're using the JS engine but they can switch to Oniguruma (or whatever) if they wish.

ghost commented 7 years ago

@50Wliu I respect the Atom team's preference to work on startup time and performance. But personally the startup time is plenty good enough for my liking. I don't reboot every 5 minutes so I don't care if atom opens in 1 second or 10 seconds. I've got a decent machine (dual core i7 2###, 4 threads) I've scripted the startup of my whole dev environment. I make coffee while it all loads up.

The only time I've run into performance problems with Atom is when as an experiment, I've pasted 250KB of base64 encoded file into my Node.js script because I was too lazy to slap it into a separate file, then fs.readFileSync it. But the fact that it doesn't handle massive non-code chunks of data doesn't bother me. I don't expect or need it to.

Spending time to write convoluted RegExp expressions regularly throughout the day, because there is no lookbehind in JS RegExp is a much greater issue for me.

MartinBonner commented 7 years ago

@lope Noooooo! Options are bad. Collective compromise is a much better solution. I don't have time to learn about all the different engines and which one provides the features/performance I need. (By all means have a fast engine which offers limited features and then automatically switch to a slower engine if a more complex RE is used.)
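The "automatically switch" idea could be sketched roughly as follows. This is an illustration only: `usesLookbehind`, `pickEngine` and the engine labels are hypothetical names, and nothing here reflects how find-and-replace is actually structured.

```typescript
// Labels only; no real engine is loaded in this sketch.
type EngineName = "native" | "oniguruma";

// Scan a regex source for an unescaped `(?<=` or `(?<!` outside a character class.
function usesLookbehind(source: string): boolean {
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      i++; // skip the escaped character
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
      continue;
    }
    if (ch === "[") {
      inClass = true;
      continue;
    }
    if (ch === "(") {
      const opener = source.slice(i, i + 4);
      if (opener === "(?<=" || opener === "(?<!") return true;
    }
  }
  return false;
}

// Keep the fast default unless the pattern needs a feature it lacks.
function pickEngine(source: string): EngineName {
  return usesLookbehind(source) ? "oniguruma" : "native";
}

const chosen = pickEngine("(?<=hello)world");
```

A real implementation would need to detect every feature the default engine lacks (for example \K), not just lookbehind, which is part of why a feature list would have to be agreed on first.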

(RegEx101.com is a very different case.)

MartinBonner commented 7 years ago

@lope: There are obviously limits. In my view Atom is too far towards the "knobs and buttons" end of the spectrum - I just want an editor that works, not one I have to program myself (if I wanted that, I know where to find Emacs). Feel free to have the last word...

ghost commented 7 years ago

If you guys want a workaround to enable lookbehind right now in Atom, first close all atom windows, then run atom --js-flags="--harmony_regexp_lookbehind"

I see searching the entire project with a lookbehind is spectacularly slow. It's a pity the engine doesn't somehow rewrite or optimize the search.

Slower than slow (but it's quick if you only search in one file): (?<!O)\.foo[^A-Za-z]
Near instant: [A-NP-Za-z]\.foo[^A-Za-z]

So that's my feedback, for what it's worth. Harmony on 1.14.3 (will upgrade now)
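One caveat about the two patterns above: they are not strictly interchangeable. The character-class version requires a letter other than O before .foo, while the lookbehind version also matches after a digit, after punctuation, or at the very start of the input. A small sketch of the difference, using a non-capturing group as a closer stand-in for the lookbehind (the sample strings are made up for illustration):

```typescript
// The character-class rewrite from the comment above.
const charClassRewrite = /[A-NP-Za-z]\.foo[^A-Za-z]/;
// A closer emulation of (?<!O): start of input, or any character except "O".
const groupRewrite = /(?:^|[^O])\.foo[^A-Za-z]/;

const samples = ["x.foo(", "O.foo(", "1.foo(", ".foo("];

// For each sample, record which pattern finds a match.
const report = samples.map((sample) => ({
  sample,
  charClass: charClassRewrite.test(sample),
  group: groupRewrite.test(sample),
}));
```

Both patterns accept "x.foo(" and reject "O.foo(", but only the group version accepts "1.foo(" and ".foo(". Note that the group version includes the preceding character in the match, which matters when replacing.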

lee-dohm commented 7 years ago

@lope your comment was deleted as a violation of the Atom Code of Conduct as it is insulting or derogatory. You may consider this an official warning.

lee-dohm commented 7 years ago

@lope This isn't up for debate. You have a couple options here:

You can work with others and compromise on what works best across all Atom users
You can fork the find-and-replace package and install whatever regular expression engine you want that works best for you

The Atom maintainer team is not going to be creating a pluggable architecture for regular expression engines. And we're going to continue to work on performance of various types including startup performance.

laike9m commented 7 years ago

NEED this feature!!! PLEASE!!!

lee-dohm commented 7 years ago

@laike9m Please read the call to action above if you'd like to help define what "better" means.

ghost commented 7 years ago

@lee-dohm From now on I won't express any disagreement or point out the flaws in people's logic because you interpret that as an insult or derogatory. So I'll just say have a nice day! Doesn't seem constructive, but you're the boss.

jeancroy commented 7 years ago

There is no definition of performance expectations for both speed and memory (because I'm certain that will be the first complaint even if we select an engine that has all the features everyone wants)

If you'd like to help define what "better" means.

As far as my issue is concerned (#557), better means preventing a prolonged active hang (100% CPU) of the editor, regardless of user input. Regexes are hard and it's easy to create a mini program with exponential complexity.

This criterion would be met by running the JS engine in a worker thread, for example, although that may require a similar amount of work to the current Oniguruma request.
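To illustrate the exponential-complexity point, here is a minimal sketch comparing a nested quantifier with an equivalent flat one. The patterns and input sizes are chosen for illustration; actual timings depend on the engine and machine, so none are claimed here.

```typescript
// Two patterns that accept exactly the same strings (one or more "a").
const nested = /^(a+)+$/; // nested quantifier: many ways to split a run of "a"
const flat = /^a+$/; // single quantifier: only one way to match

// Time a single test() call in milliseconds (coarse Date.now() resolution).
function timeMatch(pattern: RegExp, input: string): { matched: boolean; ms: number } {
  const start = Date.now();
  const matched = pattern.test(input);
  return { matched, ms: Date.now() - start };
}

// A run of "a" followed by "!" can never match, so a backtracking engine
// may try every way of splitting the run before giving up.
const timings = [12, 16, 20].map((n) => {
  const input = "a".repeat(n) + "!";
  return { n, nested: timeMatch(nested, input), flat: timeMatch(flat, input) };
});
```

On a backtracking engine the work for the nested pattern typically grows roughly by a factor of two per extra character, so a slightly longer input on the UI thread is enough to produce the kind of hang described above.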

lee-dohm commented 7 years ago

@jeancroy You make a good point. I don't think fixing that problem needs to wait for people to decide what regular expression features are most important, especially since it is something that I believe all regex implementations are going to face. So I'd like to keep that as a separate issue.

MattSturgeon commented 7 years ago

It's my interpretation that the general consensus seems to be that "better" or "fixed" could simply mean use Oniguruma for all regex stuff in Atom.

I believe the two main arguments for this have been 1) Atom already uses Oniguruma elsewhere, so it makes sense to be consistent and 2) Oniguruma adds requested features (#571, #667, #698, etc).

I would propose that if Oniguruma is advanced enough (and fast enough) for other Atom components, then it is likely to be good enough for Find, too.

Does anyone know of any reasons not to use Oniguruma for find and replace - or any reasons why it wouldn't be suitable as a universal regex engine for Atom?

ErikCorryGoogle commented 7 years ago

Lope, I tested your regexps in plain V8 and I get the lookbehind version to be 2-3 times slower. That doesn't fit very well with your "instant" vs. "slower than slow", but it's not satisfying.

I have submitted a patch to fix the performance: https://codereview.chromium.org/2777583003

MartinBonner commented 7 years ago

@MattSturgeon The obvious reason not to use Oniguruma for find and replace is that it involves work to stop using the Javascript regex and use Oniguruma instead. In particular, if the Javascript regex is going to acquire the requested features anyway, it may be better to just wait for that to happen.

I'm not convinced that is a good enough reason, but I want to remind people it exists.

lee-dohm commented 7 years ago

@MattSturgeon Here's what I see happening if we just accept Oniguruma:

Atom maintainers do a bunch of work to migrate everything to Oniguruma
People involved in this conversation all cheer
Some other group comes along and says it is too slow or doesn't have X feature
The whole argument starts all over again

We're not going to play Whack-a-Mole. I want to see a list of features and performance targets. I want to get broad-based community consensus that the list of features and performance targets is acceptable. I want us to find a regex engine that meets those features and targets. Then when someone comes along saying that isn't good enough I can point to all the work we did and the rationales everyone had for the choices made. "We couldn't think of a reason not to at the time," isn't a convincing rationale.

Stanzilla commented 7 years ago

What is wrong with ripgrep then? grep is a pretty accepted standard usually?

jeancroy commented 7 years ago

There are two engines to consider: one for search/replace in a buffer, one for search/replace across the whole project. Ripgrep applies to the second case, for which Atom uses scandal.

In general "what's wrong with new solution" is not a very useful question if we cannot pinpoint what's wrong with current solution.

bric3 commented 7 years ago

I have been monitoring this issue for a while and I'm sad about the state of its progress. I'm trying to give honest feedback and to be constructive. I am myself a developer of an open-source framework, and I know there are features wanted by some users that are not on the priority list for various reasons. While priorities are important, let's not ignore features that are useful for the users. As a library developer, several times users had usages I never imagined, and limitations in my library hit them, so I am inclined to listen about usage and help them if it is possible; sometimes it can take a long time, but we do it.

The information on this issue is very limited: it only has the enhancement tag. Whatever the Atom team chooses, it should at least indicate the state of this issue, be it delayed, stopped, re-scoped, low-priority, never, to be redefined...

Feedback :

I am a heavy regex user, and I am usually disappointed by tools that don't support powerful enough editing. Especially when there are complicated patterns to find (with big data it matters). Regarding regex I usually expect the features found in perlRE. Now I usually code with JetBrains tools that are very very good as IDEs. This lack of search support drives me and my colleague to use JetBrains IDEs and as such away from Atom. So we have a solution that works for us, but it is disruptive, and instead of starting with Atom we avoid it.

Some ideas :

Why not the silver searcher (ag), which seems to follow the fairly common ack standard? I don't know if this approach can fit in the Atom design, as it requires running a separate process, but the benefit of this approach is that the UI does not hang.
Regex engines vary in features and performance, yet I think a fairly good scope (for a better regex engine https://github.com/atom/find-and-replace/issues/571#issuecomment-288859566) would be the Perl regex, which has become quite popular outside the Perl language (Java for example follows this engine in features).

jeancroy commented 7 years ago

I am a heavy regex user, and I am usually disappointed by tools that don't support powerful enough editing. Especially when there are complicated patterns to find (with big data it matters).

Can you share a real-life example where the JavaScript engine fails your needs? I think at this point this would be the most useful input to tip the balance.

IMO big data is a separate problem. With large enough data, it becomes more practical to write your own small tool (or use specialized big data tools) than to try to fit within the limitations of a general-purpose text editor.

Why not the silver searcher (ag).

In general, grep-like tools lack the ability to do multiline matches. This is because their speed-usability compromise is different. See for example https://github.com/BurntSushi/ripgrep/issues/176 or https://github.com/atom/scandal/issues/5.

Ag in particular is free of this issue, but seems to be less compatible with Windows (?). Also, Atom has an instant preview feature that needs to test the regexp on only some part of a file, or on files that are not yet saved to disk. (Finally, the above issue of real-life limitations remains.)
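The multiline limitation can be shown with a small sketch that contrasts searching a whole buffer with searching it one line at a time, the way a line-oriented tool would. The helper names and the sample text are made up for illustration.

```typescript
// Count matches of a regex source against the whole text at once.
function countInBuffer(text: string, patternSource: string): number {
  const matches = text.match(new RegExp(patternSource, "g"));
  return matches === null ? 0 : matches.length;
}

// Count matches when each line is searched in isolation (grep-style).
function countLineByLine(text: string, patternSource: string): number {
  return text
    .split("\n")
    .reduce((total, line) => total + countInBuffer(line, patternSource), 0);
}

// Find empty brace pairs, possibly with whitespace (including newlines) between them.
const emptyBraces = "\\{\\s*\\}";
const sampleText = "start {\n}\nstart { }";

const wholeBufferCount = countInBuffer(sampleText, emptyBraces); // sees both pairs
const perLineCount = countLineByLine(sampleText, emptyBraces); // misses the pair split across lines
```

Here the whole-buffer search finds two matches while the line-by-line search finds one, because the first brace pair spans a line break.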

ErikCorryGoogle commented 7 years ago

My performance fix for lookbehinds (and lookaheads) in V8 has landed. Of course it will take some time to percolate through to Atom.

bric3 commented 7 years ago

@jeancroy Sorry for delayed answer.

Can you share a real-life example where the JavaScript engine fails your needs? I think at this point this would be the most useful input to tip the balance.

When dealing with a lot of heterogeneous data, it is useful to inspect the data. There are powerful tools on the command line. But when you have to discover the data, and something is wrong or you just want to look at the structure of the data, I quite often use regular expressions, and more than rarely I use (positive / negative) look-ahead or look-behind expressions to identify specific parts of these files. Actions may vary for me, e.g. sometimes it is just looking something up in a file. Sometimes I really want to fix it. What is disruptive is having to switch tools. Disclaimer: I understand what it is to use the right tool for the job, but for exploring data for any reason I'd like to stay in my editor.

Ah yes, you are correct, it seems complicated to have ag on Windows. There's yet another alternative to ag; it's written in Go and called the platinum searcher (pt): https://github.com/monochromegane/the_platinum_searcher As it is written in Go, it is multiplatform.

jinglesthula commented 7 years ago

I tried the atom --js-flags="--harmony_regexp_lookbehind" approach and now using lookbehind doesn't show the error, but it does sit and spin forever with "0 paths searched". Searching w/o lookbehind in the regex still works. Anyone else try this with better results?

More on topic, does anyone know why lookbehind didn't land in ES6? That seems like the root problem. Tried googling but didn't find much discussion.

atom / find-and-replace

No support for lookbehind #571