bobbylight / RSyntaxTextArea

A syntax highlighting, code folding text editor for Java Swing applications.
BSD 3-Clause "New" or "Revised" License
1.11k stars 258 forks source link

Slow replace all for long lines with many hits #412

Open siggemannen opened 2 years ago

siggemannen commented 2 years ago

Hello, i'm using RSyntaxTextArea, a bit of an older version but suspect the problem exist in the new one as well. I have a long comma-separated line of something like 100k characters which i wanted to replace , with "," or something like that. But this replace never finishes!

When i traced the code, it seems that every replace (and there were probably like 10k of hits) triggers a "getTokenListForLine", which of course is not super-fast with 100k of characters parsing. From what i understand, every replace triggers a a event that triggers some listener which wants to parse the changed line, so i wonder if there's a way of either disabling this call until the absolutely last replace or perform some sort of bulk replace operation. I'm considering to make a patch that does a bulk replace but this probably have some unwanted side-effects =)

bobbylight commented 2 years ago

@siggemannen - there may well be performance issues for very long lines (certainly there is for rendering) but you might consider checking out the latest 3.1.4-SNAPSHOT on Sonatype. I've fixed this bug:

https://github.com/bobbylight/RSyntaxTextArea/issues/401

where if the searchWrap flag is set, a replace-all operation is an infinite loop. A workaround before the 3.1.4 release to Maven Central (which will definitely be before the end of the year) is to call setSearchWrap(false) before doing the replaace opeation.

If that's the bug you're hitting we can close this, but if it's just a performance issue, or something else, we can keep this open. Let me know!

siggemannen commented 2 years ago

Hi, i downloaded the master repo and wrote two tests that replaces a string. The only difference is that one test had one long line of text. In the second test i split it into new lines. I timed the actual replace operation and the long line test took 30 secs, and the new line test took 3 secs. Which seem to confirm that there's a a perf issue with long lines in particular.

The tests:

https://gist.github.com/siggemannen/3cc6797cf743f02247bc6d2499dde5f8

bobbylight commented 2 years ago

Yeah for sure, RSTAa is slower with long lines, I was just hoping it was addressed with that recent fix :P. I'll take a look.

bobbylight commented 2 years ago

A single long line with many matches for a replaceAll() operataion is a canonical worst-case, unfortunately. replaceall() calls replace() in a loop until the document is entirely searched, and the current visual "offset" (x-offst) in the current line isn't remembered from one replace() call to the next, if that makes sense. This menas the same (long) line is reparsed over and over, at least up to the caret position, so this is O(n^2) for a single-long-line document.

There's a lot of other stuff going on here - document locking, aggregating multiple individual undo/redo actions, selecting and (possibly) centering replaced text, but the time spent in these areas is miniscule, relatively speaking. Right now I think the best "fix" here is a rewrite of replaceAll() to not call replace() in a loop, but rather do its own thing

siggemannen commented 2 years ago

Yeah, it's tricky. My first idea was to actually perform a string replace separately and then do a setText(newString) once. It ain't pretty of course and i'm not sure it plays well with document filters.

Alternatively that one can suspend token generation while a replace operation is in progress and raise the event manually. Perhaps via a new flag in RTextArea. At least this way you can see your cursor fluctuate :D