Open inkydragon opened 8 months ago
I would guess this is related to https://github.com/dail8859/NotepadNext/issues/192
It is probably restarting the search after the replace in the incorrect spot, thus an infinite loop.
The regex implementation is...quite rough to be honest, and I figured there were quite a few corner cases. My gut feeling is that the QRegularExpression class is not a long term solution.
Currently there are 3 options:
I think the problem here (and with #192) is actually not QRegularExpression itself; it handles this case fine. I think the issue is with the way it is used in NotepadNext, as it seems to rely on .captured() returning a non-zero-length string. If you use QRegularExpression more directly to do the replacements, it seems to work. Example:
#include <QtGlobal>
#include <QtCore>
#include <QDebug>
#include <QTextStream>
#include <QRegularExpression>
#include <iostream>
int main(int argc, char *argv[])
{
    QTextStream qout(stdout);
    // MultilineOption lets ^ match at the start of every line, not only at the start of the string.
    auto options = QRegularExpression::MultilineOption | QRegularExpression::UseUnicodePropertiesOption;
    QRegularExpression re("^", options);
    QString replacement = "//";
    QString inputString = "1\n2\n3\n";
    QRegularExpressionMatch match = re.match(inputString, 0, QRegularExpression::NormalMatch, QRegularExpression::NoMatchOption);
    qDebug() << "input string: ";
    qDebug() << inputString;
    // QString::replace() replaces every match and modifies inputString in place.
    QString newString = inputString.replace(match.regularExpression(), replacement);
    qDebug() << " --------------------- ";
    qDebug() << newString;
    qDebug() << " --------------------- ";
}
Outputs the following:
input string:
"1\n2\n3\n"
---------------------
"//1\n//2\n//3\n"
---------------------
Which (unless I'm missing something entirely) suggests that the library is fine for this case (tested with Qt 5.15 on Debian 12).
@mintsoft Thanks for the info. You are correct that purely using QRegularExpression does work. However, to integrate with the editing component you can't simply call .replace() on a string.
Each search/replace needs a starting location and an ending location. Since the substring is pulled out of the editor and turned into a QString, ^ always matches the beginning of the string, no matter where the search is "started" in the document. Even if you do replace the first valid ^ match with //, when the search starts again after the previously replaced // it finds another match...and gets stuck.
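The loop described above can be reproduced with a small toy model. This is only a sketch in standard C++ with std::regex, not NotepadNext's or Scintilla's actual code: the function name `replaceAllViaSubstring` and the `maxIterations` cap are made up for illustration, and the cap exists only so the example terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Toy model (hypothetical, not NotepadNext's code): every search copies
// doc[start, end) into a fresh string, so the regex engine cannot see what
// precedes `start`, and "^" matches at offset 0 of every copy.
std::string replaceAllViaSubstring(std::string doc, const std::string &pattern,
                                   const std::string &replacement,
                                   int maxIterations)
{
    const std::regex re(pattern);
    std::size_t start = 0;

    for (int i = 0; i < maxIterations; ++i) {
        // The text before `start` is lost in this copy.
        const std::string window = doc.substr(start);
        std::smatch m;
        if (!std::regex_search(window, m, re))
            break;

        const std::size_t pos = start + static_cast<std::size_t>(m.position(0));
        doc.replace(pos, static_cast<std::size_t>(m.length(0)), replacement);

        // Resume right after the inserted text; the next copy starts here,
        // so "^" matches again and the document just keeps growing.
        start = pos + replacement.size();
    }
    return doc;
}
```

With the input "1\n2\n", pattern "^" and replacement "//", five iterations produce ten slashes in front of the first line and never reach the second line, which would be consistent with the slowly rising memory usage in the report.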
> Each search/replace needs a starting location and an ending location. Since the substring is pulled out of the editor and turned into a QString, ^ always matches the beginning of the string, no matter where the search is "started" in the document. Even if you do replace the first valid ^ match with //, when the search starts again after the previously replaced // it finds another match...and gets stuck.
Ahh I see. It seems like we'd need to change that behaviour to fix the problem, no matter which regex implementation we used.
I assume it's implemented like that so that we are able to dynamically update the document as it goes along and avoid converting the entire document into a QString and duplicating everything?
> It seems like we'd need to change that behaviour no matter which Regex implementation we used to fix the problem.
That is partially correct. If I recall correctly, if you use something like PCRE2 directly (which I believe QRegularExpression uses under the hood), then there are more flags you can pass to tell it that the beginning of the string you are providing isn't actually the start. But that was a while ago...so maybe I'm misremembering. QRegularExpression doesn't provide that fine a level of control.
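As an illustration of the kind of flag meant here, std::regex has `std::regex_constants::match_not_bol`, which tells the engine that the first position of the searched range is not a line start (PCRE2 has a comparable `PCRE2_NOTBOL` match option). The sketch below uses standard C++ only; the helper `matchesFrom` and its `offsetIsLineStart` parameter are hypothetical, and this is not how NotepadNext or Scintilla call the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Returns true if `re` matches somewhere in text[offset, end).
// When `offsetIsLineStart` is false, match_not_bol tells the engine that
// the first position of the range must not be treated as a line start,
// so "^" cannot match there.
bool matchesFrom(const std::string &text, std::size_t offset,
                 const std::regex &re, bool offsetIsLineStart)
{
    std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default;
    if (!offsetIsLineStart)
        flags = flags | std::regex_constants::match_not_bol;

    return std::regex_search(text.cbegin() + static_cast<std::ptrdiff_t>(offset),
                             text.cend(), re, flags);
}
```

For the text "abc" searched from offset 1 with the pattern "^b", the call matches when the offset is declared a line start and does not match otherwise. How "^" behaves after embedded newlines differs between standard library implementations, so the example deliberately stays on a single line.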
That's the whole reason why, in the long run, I suspect Notepad++'s implementation might be worthwhile, since it uses Boost regex and they've already worked out all the corner cases :)
> I assume it's implemented like that so that we are able to dynamically update the document as it goes-along and avoid converting the entire document into QString and duplicating everything?
That is a big part of it. The other part is to let Scintilla properly know about each individual change so that it can properly manage the undo stack along with other internals.
On a side note, Scintilla by default supports the standard library regex std::regex, but I recall looking into that at one time and found some limitations...not sure if things have improved since then.
> On a side note, Scintilla by default supports the standard library regex std::regex but I recall looking into that at one time and found some limitations...not sure if things have improved since then.
Yeah, the docs point towards that being quite limited (no ?, or lookaheads/lookbehinds etc.). I think that's actually the regex engine that Notepad++ used to have about 10 years ago; I remember it was very limited for a while.
> Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc)
That is its own very basic implementation of a regex engine. But it can also use std::regex, and I'm not sure how well that matches up to something like Boost.
> > Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc)
>
> That is it's own very basic implementation of a regex engine. But it can also use std::regex which I'm not sure how well it matches up to something like boost.
It's probably on par with Scintilla's own one; however, if you want PCRE (i.e. "proper regex") then the Boost library is certainly the best bet.
Another way to reproduce this bug is any find/replace that does not change the actual content, or where the replacement text still matches the "find" pattern, e.g.:
Find: (.*)
Replace: $1
This will also loop forever.
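For comparison, a replace-all loop can be written so that it always makes forward progress, even for empty matches or replacements that match the pattern again. The following is a minimal sketch in standard C++ with std::regex, under the assumption that the whole text is available as one buffer; `replaceAllWithProgress` is a hypothetical name, and it builds a new string instead of notifying an editor component about each individual change.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replace every match of `re` in `doc`, guaranteeing forward progress:
// the search always continues in the ORIGINAL text after the previous match,
// so replacement text is never rescanned, and an empty match advances the
// cursor by one character.
std::string replaceAllWithProgress(const std::string &doc, const std::regex &re,
                                   const std::string &replacement)
{
    std::string out;
    auto it = doc.cbegin();
    const auto end = doc.cend();
    std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default;
    std::smatch m;

    while (std::regex_search(it, end, m, re, flags)) {
        out.append(m.prefix().first, m.prefix().second); // text before the match
        out += m.format(replacement);                    // expands $1, $2, ...
        it = m[0].second;                                // continue after the match

        if (m.length(0) == 0) {
            if (it == end)
                break;          // empty match at the very end: done
            out += *it++;       // copy one character and step over it
        }
        // From here on `it` is past the start of the buffer, so the engine
        // may look at the preceding character instead of assuming a new start.
        flags = std::regex_constants::match_default |
                std::regex_constants::match_prev_avail;
    }
    out.append(it, end);        // remaining unmatched tail
    return out;
}
```

With Find (.*) and Replace $1 this returns the input unchanged and terminates. An integration with the editor would still need to apply each change through the editing component rather than on a copy, for the undo-stack reasons mentioned earlier in the thread.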
> the Boost library is the best bet

I am going to add the original PCRE2 after finishing my current two PRs.
> the Boost library is the best bet
Notepad++ has used an implementation of the boost regex library with the Scintilla editor and it has been proven to be quite robust over the years (not saying it doesn't have issues).
I've looked at the code a little bit before, but it is not just a drop-in replacement since it was made for Windows and Win32: https://github.com/notepad-plus-plus/notepad-plus-plus/tree/master/boostregex
txt file
Replace ^ => //
Notepad Next CPU load: 5%~8%. Memory usage: rises slowly.
When debugging with WinDbg, I see this line repeat forever:
const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
Full log
```log
[ 0.094] I: =============================
[ 0.094] I: Notepad Next v0.6.4.0
[ 0.094] I: Build Date/Time: Oct 18 2023 16:53:08
[ 0.095] I: Qt: 6.5.3
[ 0.095] I: OS: Windows 11 Version 23H2
[ 0.095] I: Locale: zh_CN
[ 0.095] I: CPU: x86_64
[ 0.095] I: File Path: D:/env/Notepad Next/NotepadNext.exe
[ 0.095] I: Arguments: D:\env\Notepad Next\NotepadNext.exe
[ 0.095] I: =============================
[ 0.095] I: bool __cdecl NotepadNextApplication::init(void)
ModLoad: 00007ffc`d42b0000 00007ffc`d42be000 D:\env\Notepad Next\imageformats\qgif.dll
ModLoad: 00007ffc`d1440000 00007ffc`d144e000 D:\env\Notepad Next\imageformats\qico.dll
ModLoad: 00007ffc`87ee0000 00007ffc`87f7a000 D:\env\Notepad Next\imageformats\qjpeg.dll
ModLoad: 00007ffc`d1230000 00007ffc`d123c000 D:\env\Notepad Next\imageformats\qsvg.dll
ModLoad: 00007ffc`d11d0000 00007ffc`d122a000 D:\env\Notepad Next\Qt6Svg.dll
[ 0.231] I: void __cdecl NotepadNextApplication::loadTranslation(class QLocale)
[ 0.231] I: Loaded zh_CN translation :/i18n/NotepadNext_zh_CN.qm for Notepad Next
[ 0.232] I: zh_CN translation not found for Qt components
[ 0.232] I: __cdecl LuaState::LuaState(void)
[ 0.245] I: __cdecl MacroManager::MacroManager(class QObject *)
[ 0.245] I: void __cdecl MacroManager::loadSettings(void)
[ 0.245] I: __cdecl MainWindow::MainWindow(class NotepadNextApplication *)
[ 0.380] I: setupUi Completed
[ 0.464] I: __cdecl FileListDock::FileListDock(class MainWindow *)
[ 0.465] I: void __cdecl MainWindow::setupLanguageMenu(void)
[ 0.496] I: void __cdecl MainWindow::restoreSettings(void)
[ 0.496] I: AutoUpdates: 1
[ 0.496] I: void __cdecl NotepadNextApplication::openFiles(const class QList