firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

Waterfox crashes when #1919

Closed 4evermaat closed 1 year ago

4evermaat commented 1 year ago

Waterfox G5.0.1 (64-bit)

I notice that whenever I try to clear the sample text from the test string field, RAM usage grows from 200-300MB to 1400MB. Then, after about 30 seconds, it grows to nearly 3000MB before crashing.

The sample text is approximately 73,060,684 characters and 519,911 lines.

Brave [Version 1.44.108 Chromium: 106.0.5249.103 (Official Build) (64-bit)](https://brave.com/latest/) has the same problem. It's actually a little bit worse.
CPU usage remained high and 3000MB+ of RAM stayed in use even after the Ctrl+A highlighting finished (which took about 40-60 seconds). Then I pressed Delete, and after another 2 minutes a separate Brave process was using 1300MB of RAM (4400MB of RAM in use in total).

And all this was reattempted with the RegEx text field blank. Same results.


As a side note, I do think the page should have copy/paste/delete buttons to copy/clear text from the RegEx Test String and Substitution text boxes. I assume these buttons would provide a more efficient way to copy/paste/clear text in these fields.

I think that when I highlight everything (Ctrl+A) and then delete the sample text, regex101 might be doing some unnecessary calculations. Dedicated buttons could force a break or pause in those calculations so the text can be highlighted and manipulated properly.

Or maybe increase the delay between when text is manipulated and when a recalculation occurs.

The highlighting and deleting used to be slow (about 10-15 seconds to complete), but now it is unusable. I'm not sure what changed.

working-name commented 1 year ago

I was going to ask if you have a link to demo the issue but just realized you have 500K lines of text 🤩

Not sure how much can be done to help with performance when such a huge amount of text is handled. My Brave browser will kill the tab as out of memory if I try to select all text that's 2 million lines, 75 million characters (similar to yours).

Have you tried doing the regex in a programming language instead of using the site?
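
As a rough sketch of what that could look like (TypeScript here; `countMatches` is a hypothetical helper, and in practice the lines would be read from a file rather than passed in as an array):

```typescript
// Hypothetical helper: count regex matches one line at a time.
function countMatches(lines: readonly string[], pattern: RegExp): number {
  // Work on a global copy so exec() advances through each line.
  const flags = pattern.global ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  let total = 0;
  for (const line of lines) {
    re.lastIndex = 0; // start each line from its beginning
    let m: RegExpExecArray | null;
    while ((m = re.exec(line)) !== null) {
      total++;
      // Step past empty matches so the loop cannot stall.
      if (m[0].length === 0) re.lastIndex++;
    }
  }
  return total;
}
```

Nothing is rendered or highlighted here, so the cost is mostly the regex engine itself rather than the editor.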

4evermaat commented 1 year ago

> Not sure how much can be done to help with performance when such a huge amount of text is handled. My Brave browser will kill the tab as out of memory if I try to select all text that's 2 million lines, 75 million characters (similar to yours).

That's why I suggested text box buttons for select all/delete/copy/paste that perform those operations without doing any regex calculations (insert an xxxx ms delay before calculation). I'm thinking some pause or delay could be introduced so that the text can reasonably be copied/pasted/deleted BEFORE any calculations are done.

Or skip recalculation whenever one of these specific actions is detected: select all, delete, copy, or paste.

In fact, there could be an additional box where users can edit the delay (in ms) before recalculation. That would help crowdsource the optimal delay.
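
For illustration, the delay idea above is usually called debouncing. Here is a minimal TypeScript sketch of it. The names `debounce`, `Scheduler`, and `timerScheduler` are made up for this example, and none of it is taken from regex101's or CodeMirror's actual code:

```typescript
// Hypothetical names throughout; a sketch of the idea, not the site's code.
type Cancel = () => void;
type Scheduler = (fn: () => void, ms: number) => Cancel;

// Default scheduler backed by the standard setTimeout/clearTimeout timers.
const timerScheduler: Scheduler = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

// Wrap `fn` so it only runs once no new call has arrived for `delayMs`.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
  schedule: Scheduler = timerScheduler,
): (...args: A) => void {
  let cancel: Cancel | undefined;
  return (...args: A) => {
    // A new edit arrived: drop the pending run and restart the wait.
    if (cancel) cancel();
    cancel = schedule(() => {
      cancel = undefined;
      fn(...args); // runs with the arguments of the latest call only
    }, delayMs);
  };
}
```

With something like `const onEdit = debounce(recalculate, 500)`, where `recalculate` stands in for whatever reruns the regex, a select-all followed by a delete within half a second would trigger a single recalculation on the final text. Making `delayMs` user-editable would correspond to the extra box suggested above. Whether this would help with the memory growth is a separate question, since the editor still has to handle the selection itself.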

> Have you tried doing the regex in a programming language instead of using the site?

I'm not the programming type. I would normally use EmEditor, which uses boost/PCRE2, or TextCrawler (.NET). TextCrawler has its own tester with highlighting and a replacement preview, which lets "beginners" see whether their regex matches only what they want. TextCrawler also has batch jobs, so you can run several smaller regexes sequentially to reshape the final result, and you can run those batch jobs from the command line.

But it is good to use an independent tool like regex101, so I can more easily see why matches occur the way they do and play around with the regex until I get the exact results I need.

EmEditor is fast, but when the syntax is wrong I scratch my head wondering why. And to answer your question, EmEditor completed the search over the same text (once I got the syntax correct) in 0.031 seconds using 1 thread.

firasdib commented 1 year ago

I think this might be a limitation of what CodeMirror can do. I assume the same thing happens when you try it on https://codemirror.net/ ?

The site was never intended to handle such large strings.