dail8859 / NotepadNext

A cross-platform, reimplementation of Notepad++
GNU General Public License v3.0
9.1k stars 552 forks source link

Replace with regex cause stucks #500

Open inkydragon opened 8 months ago

inkydragon commented 8 months ago

txt file

1
2
3

Replace ^ => //

Notepad Next CPU load 5%~8% Memory usage: rise slowly

When debug with WinDbg I see those lines repeat forever: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)

Full log ```log [ 0.094] I: ============================= [ 0.094] I: Notepad Next v0.6.4.0 [ 0.094] I: Build Date/Time: Oct 18 2023 16:53:08 [ 0.095] I: Qt: 6.5.3 [ 0.095] I: OS: Windows 11 Version 23H2 [ 0.095] I: Locale: zh_CN [ 0.095] I: CPU: x86_64 [ 0.095] I: File Path: D:/env/Notepad Next/NotepadNext.exe [ 0.095] I: Arguments: D:\env\Notepad Next\NotepadNext.exe [ 0.095] I: ============================= [ 0.095] I: bool __cdecl NotepadNextApplication::init(void) ModLoad: 00007ffc`d42b0000 00007ffc`d42be000 D:\env\Notepad Next\imageformats\qgif.dll ModLoad: 00007ffc`d1440000 00007ffc`d144e000 D:\env\Notepad Next\imageformats\qico.dll ModLoad: 00007ffc`87ee0000 00007ffc`87f7a000 D:\env\Notepad Next\imageformats\qjpeg.dll ModLoad: 00007ffc`d1230000 00007ffc`d123c000 D:\env\Notepad Next\imageformats\qsvg.dll ModLoad: 00007ffc`d11d0000 00007ffc`d122a000 D:\env\Notepad Next\Qt6Svg.dll [ 0.231] I: void __cdecl NotepadNextApplication::loadTranslation(class QLocale) [ 0.231] I: Loaded zh_CN translation :/i18n/NotepadNext_zh_CN.qm for Notepad Next [ 0.232] I: zh_CN translation not found for Qt components [ 0.232] I: __cdecl LuaState::LuaState(void) [ 0.245] I: __cdecl MacroManager::MacroManager(class QObject *) [ 0.245] I: void __cdecl MacroManager::loadSettings(void) [ 0.245] I: __cdecl MainWindow::MainWindow(class NotepadNextApplication *) [ 0.380] I: setupUi Completed [ 0.464] I: __cdecl FileListDock::FileListDock(class MainWindow *) [ 0.465] I: void __cdecl MainWindow::setupLanguageMenu(void) [ 0.496] I: void __cdecl MainWindow::restoreSettings(void) [ 0.496] I: AutoUpdates: 1 [ 0.496] I: void __cdecl NotepadNextApplication::openFiles(const class QList &) [ 0.496] I: void __cdecl MainWindow::newFile(void) [ 0.497] I: void __cdecl EditorManager::setupEditor(class ScintillaNext *) [ 0.498] I: void __cdecl MainWindow::addEditor(class ScintillaNext *) [ 0.498] I: void __cdecl MainWindow::detectLanguage(class ScintillaNext *) [ 0.498] I: void __cdecl MainWindow::setLanguage(class ScintillaNext *,const class QString &) [ 0.498] I: Language Name: Text [ 0.498] I: void __cdecl DockedEditor::addEditor(class ScintillaNext *) ModLoad: 00007ffc`49010000 00007ffc`490c0000 C:\WINDOWS\SYSTEM32\d3d9on12.dll ModLoad: 00007ffc`48fe0000 00007ffc`49009000 C:\WINDOWS\SYSTEM32\d3d12.dll ModLoad: 00007ffc`48dd0000 00007ffc`48fd3000 C:\WINDOWS\SYSTEM32\D3D12Core.dll ModLoad: 00007ffc`48da0000 00007ffc`48dc7000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igd12umd64.dll ModLoad: 00007ffc`47ea0000 00007ffc`48d9f000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igd12um64xel.dll ModLoad: 00007ffc`dd640000 00007ffc`dd7a6000 C:\WINDOWS\System32\CRYPT32.dll ModLoad: 00007ffc`97480000 00007ffc`974c5000 C:\WINDOWS\SYSTEM32\ControlLib.dll ModLoad: 00007ffc`b8580000 00007ffc`b860c000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\IntelControlLib.dll ModLoad: 00007ffc`da9b0000 00007ffc`da9c5000 C:\WINDOWS\SYSTEM32\resourcepolicyclient.dll ModLoad: 00007ffc`cf5a0000 00007ffc`cf5ea000 C:\WINDOWS\SYSTEM32\directxdatabasehelper.dll ModLoad: 00007ffc`ae810000 00007ffc`ae849000 C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvdlistx.dll ModLoad: 00007ffc`ae810000 00007ffc`ae849000 C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvdlistx.dll ModLoad: 00007ffc`b0ec0000 00007ffc`b12fb000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igdgmm64.dll ModLoad: 00007ffc`a2c80000 00007ffc`a6b4b000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igc64.dll ModLoad: 00007ffc`47ab0000 00007ffc`47adf000 C:\WINDOWS\SYSTEM32\D3DSCache.dll ModLoad: 00007ffc`477b0000 00007ffc`47aa9000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igd12dxva64.dll ModLoad: 00007ffc`465d0000 00007ffc`477aa000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igddxvacommon64.dll ModLoad: 00007ffc`448a0000 00007ffc`465c5000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\media_bin_64.dll ModLoad: 00007ffc`44870000 00007ffc`44897000 C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igdinfo64.dll ModLoad: 00007ffc`dcb30000 00007ffc`dcb4b000 C:\WINDOWS\SYSTEM32\CRYPTSP.dll ModLoad: 00007ffc`dc2f0000 00007ffc`dc325000 C:\WINDOWS\system32\rsaenh.dll ModLoad: 00007ffc`cc800000 00007ffc`cca85000 C:\WINDOWS\SYSTEM32\twinapi.appcore.dll ModLoad: 00007ffc`7bfa0000 00007ffc`7c11e000 C:\WINDOWS\SYSTEM32\dxilconv.dll ModLoad: 00007ffc`da9b0000 00007ffc`da9c5000 C:\WINDOWS\SYSTEM32\resourcepolicyclient.dll ModLoad: 00007ffc`95e60000 00007ffc`95f1d000 C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvldumdx.dll ModLoad: 00007ffc`dcf30000 00007ffc`dcf42000 C:\WINDOWS\SYSTEM32\msasn1.dll ModLoad: 00007ffc`cd760000 00007ffc`cd792000 C:\WINDOWS\SYSTEM32\cryptnet.dll ModLoad: 00007ffc`cd580000 00007ffc`cd6d8000 C:\WINDOWS\SYSTEM32\drvstore.dll ModLoad: 00007ffc`dca70000 00007ffc`dcabb000 C:\WINDOWS\SYSTEM32\wldp.dll ModLoad: 00007ffc`ddc30000 00007ffc`ddc9b000 C:\WINDOWS\System32\WINTRUST.DLL ModLoad: 00007ffc`df720000 00007ffc`df73f000 C:\WINDOWS\System32\imagehlp.dll ModLoad: 00007ffb`e8580000 00007ffb`eb474000 C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvd3dumx.dll ModLoad: 00007ffc`88b60000 00007ffc`88bbe000 C:\WINDOWS\system32\dataexchange.dll ModLoad: 00007ffc`cc800000 00007ffc`cca85000 C:\WINDOWS\system32\twinapi.appcore.dll [ 0.829] I: void __cdecl MainWindow::activateEditor(class ScintillaNext *) [ 0.829] I: bool __cdecl MainWindow::checkFileForModification(class ScintillaNext *) [ 0.829] I: void __cdecl MainWindow::updateGui(class ScintillaNext *) [ 0.829] I: void __cdecl MainWindow::updateFileStatusBasedUi(class ScintillaNext *) [ 0.829] I: void __cdecl MainWindow::updateSaveStatusBasedUi(class ScintillaNext *) [ 0.830] I: void __cdecl MainWindow::updateEOLBasedUi(class ScintillaNext *) [ 0.830] I: void __cdecl MainWindow::updateLanguageBasedUi(class ScintillaNext *) ModLoad: 00007ffc`c8460000 00007ffc`c85aa000 C:\WINDOWS\SYSTEM32\textinputframework.dll ModLoad: 00007ffc`aca50000 00007ffc`acaf5000 C:\Windows\System32\Windows.FileExplorer.Common.dll ModLoad: 00007ffc`d8bc0000 00007ffc`d8cc1000 C:\WINDOWS\system32\propsys.dll ModLoad: 00007ffc`8d1f0000 00007ffc`8d20a000 C:\WINDOWS\system32\NetworkExplorer.dll ModLoad: 00007ffc`88170000 00007ffc`88419000 C:\WINDOWS\system32\explorerframe.dll [ 1.035] I: class Scintilla::Internal::RegexSearchBase *__cdecl Scintilla::Internal::CreateRegexSearch(class Scintilla::Internal::CharClassify *) ModLoad: 00007ffc`d59d0000 00007ffc`d5b04000 C:\WINDOWS\SYSTEM32\CoreMessaging.dll ModLoad: 00007ffc`ce430000 00007ffc`ce79c000 C:\WINDOWS\SYSTEM32\CoreUIComponents.dll ModLoad: 00007ffc`b2920000 00007ffc`b2948000 C:\WINDOWS\SYSTEM32\edputil.dll [ 4.643] I: void __cdecl MainWindow::updateSaveStatusBasedUi(class ScintillaNext *) [ 6.007] I: __cdecl FindReplaceDialog::FindReplaceDialog(class ISearchResultsHandler *,class MainWindow *) [ 6.028] I: void __cdecl FindReplaceDialog::loadSettings(void) [ 6.041] I: void __cdecl FindReplaceDialog::showEvent(class QShowEvent *) [ 15.497] I: Last checked for updates at: Wed Jan 24 09:19:29 2024 [ 15.774] I: void __cdecl FindReplaceDialog::replaceAll(void) [ 15.775] I: void __cdecl FindReplaceDialog::prepareToPerformSearch(bool) [ 15.775] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.776] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.777] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.778] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.779] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.780] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.780] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.781] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.781] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.782] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.782] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.783] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.784] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) [ 15.784] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *) ..... ```
dail8859 commented 8 months ago

I would guess this is related to https://github.com/dail8859/NotepadNext/issues/192

It is probably restarting the search after the replace in the incorrect spot, thus an infinite loop.

The regex implementation is...quite rough to be honest, and I figured there were quite a few corner cases. My gut feeling is that the QRegularExpression class is not a long term solution.

Currently there are 3 options:

mintsoft commented 8 months ago

I think the problem here (and with #192) is actually not QRegularExpression itself; it handles this case fine. I think the issue is with the way it is used with NotepadNext as it seems to be relying on the .captured() returning a non-zero length string. If you use the QRegularExpression in a more direct manner to do the replacements it seems to work, Example:

#include <QtGlobal>
#include <QtCore>
#include <QDebug>
#include <QTextStream>
#include <QRegularExpression>
#include <iostream>

int main(int argc, char *argv[])
{
    QTextStream qout(stdout);

    auto options = QRegularExpression::MultilineOption | QRegularExpression::UseUnicodePropertiesOption;
    QRegularExpression re("^", options);

    QString replacement = "//";
    QString inputString = "1\n2\n3\n";
    QRegularExpressionMatch match = re.match(inputString, 0, QRegularExpression::NormalMatch, QRegularExpression::NoMatchOption);

    qDebug() << "input string: ";
    qDebug() << inputString;

    QString newString = inputString.replace(match.regularExpression(), replacement);

    qDebug() << " --------------------- ";
    qDebug() << newString;
    qDebug() << " --------------------- ";

}

Outputs the following:

input string: 
"1\n2\n3\n"
 --------------------- 
"//1\n//2\n//3\n"
 --------------------- 

Which (unless I'm missing something entirely) suggests that the library is fine for this case (tested under QT 5.15 on Debian 12)

dail8859 commented 8 months ago

@mintsoft Thanks for the info. You are correct that purely using QRegularExpression does work. However to integrate with the editing component you can't simply just call .replace() on a string.

Each search/replace needs a starting location and ending location. Since the substring is pulled out of the editor and turned into a QString, then ^ always matches the beginning of the string, no matter where the search is "started" in the document. Even if you do replace the first valid ^ match with // then when the search starts again after the previously replaced // it then finds another match...and gets stuck.

mintsoft commented 8 months ago

Each search/replace needs a starting location and ending location. Since the substring is pulled out of the editor and turned into a QString, then ^ always matches the beginning of the string, no matter where the search is "started" in the document. Even if you do replace the first valid ^ match with // then when the search starts again after the previously replaced // it then finds another match...and gets stuck.

Ahh I see. It seems like we'd need to change that behaviour no matter which Regex implementation we used to fix the problem.

I assume it's implemented like that so that we are able to dynamically update the document as it goes-along and avoid converting the entire document into QString and duplicating everything?

dail8859 commented 8 months ago

It seems like we'd need to change that behaviour no matter which Regex implementation we used to fix the problem.

That is partially correct. If I recall if you use something like re2 (which I believe QRegularExpression uses) then there are more flags you can pass it to tell it that the beginning of the string you are providing isn't actually the start. But that was a while ago...so maybe I'm misremembering. QRegularExpression doesn't provide that fine of control.

That's the whole reason why in the long run I suspect Notepad++'s implementation might be worth while since it uses boostregex and they've already worked out all the corner cases :)

I assume it's implemented like that so that we are able to dynamically update the document as it goes-along and avoid converting the entire document into QString and duplicating everything?

That is a big part of it. The other part is to let Scintilla properly know about each individual change so that it can properly manage the undo stack along with other internals.

dail8859 commented 8 months ago

On a side note, Scintilla by default supports the standard library regex std::regex but I recall looking into that at one time and found some limitations...not sure if things have improved since then.

mintsoft commented 8 months ago

On a side note, Scintilla by default supports the standard library regex std::regex but I recall looking into that at one time and found some limitations...not sure if things have improved since then.

Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc); I think that's actually the regex engine that Notepad++ used to have about 10 years ago, I remember it was very very limited for a while

dail8859 commented 8 months ago

Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc)

That is it's own very basic implementation of a regex engine. But it can also use std::regex which I'm not sure how well it matches up to something like boost.

mintsoft commented 8 months ago

Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc)

That is it's own very basic implementation of a regex engine. But it can also use std::regex which I'm not sure how well it matches up to something like boost.

It's probably on-par than Scintilla's one, however if you want PCRE (i.e. "proper regex") then the Boost library is the best bet certainly

mintsoft commented 8 months ago

Another reproduction of this bug is any find/replace that does not change the actual content or that the replaced string still matches the "find" i.e.:

Find (.*) Replace $1

This will also loop forever

SimLV commented 5 months ago

the Boost library is the best bet I am going to add original PCRE2 after finishing current two PRs

dail8859 commented 5 months ago

the Boost library is the best bet

Notepad++ has used an implementation of the boost regex library with the Scintilla editor and it has been proven to be quite robust over the years (not saying it doesn't have issues).

I've looked at the code a little bit before but it is not just a drop-in replacement since it was made for Windows and Win32 https://github.com/notepad-plus-plus/notepad-plus-plus/tree/master/boostregex