Open ProgerXP opened 4 years ago
Additional cases:
Search string: $
Replace with: \n\1\1\n
aa
bb
Search string: (.*)$
Replace with: \n\1\1\n
aaaa
bbbb
Fixed.
`const int isRegexEOL = (szFind2[iFind2Length - 1] == '$') && ((iFind2Length < 2) || (szFind2[iFind2Length - 2] != '\\'));`
We should never try to parse regexps manually. This is not a universal fix. Can't this be fixed properly?
I'm not aware of the way how this could be done without manual target range adjustment due to the following reason:
BoostRegexSearch::FindTextImpl()
).$
is not EOL char, but the last position on line.Thus, regex replace with a string containing \n will not automatically adjust the target range (will not skip the EOL from the current line) which caused specified hang.
Current fix allow regex to deal with multiline replace strings by manual next range calculation:
$
than we should skip additional EOL (the one mentioned in the previous paragraph).It's true that $
is "0 characters long". However, we can go from the other side: instead of parsing the regexp (which we can never do correctly), see if there are line breaks in the replacement string and add this number of bytes to the match position.
This might fail if line breaks are retrieved from a capture group ((\n)
-> $1
) but this is a much more rare case than various combinations of regexp containing $
($|foo
, (foo|$)
, etc. etc.).
instead of parsing the regexp (which we can never do correctly)
`const int isRegexEOL = (szFind2[iFind2Length - 1] == '$') && ((iFind2Length < 2) || (szFind2[iFind2Length - 2] != '\\'));`
This is not actual parsing. We only should detect if trailing character is unquoted $
:
$
\
This might fail if line breaks are retrieved from a capture group (
(\n)
->$1
) but this is a much more rare case than various combinations of regexp containing$
($|foo
,(foo|$)
, etc. etc.).
isRegexEOL = true
-logic.Please confirm if current solution does not fit.
Update: there is a missing case in current isRegexEOL
detection logic:
\\$
- regex to find line consisting of single \
will not be correctly processed since isRegexEOL=false
.
Additional condition should be added: number of serial \
before $
should be even.
Another bug that likely has same root cause:
a@b@c
a@b@c
Do Replace from ^[^@]+@
to empty string in regexp mode. Result:
c
c
Instead of:
b@c
b@c
In other words, regexp must never be applied more than once to the same region in the source file. Here it is applied to each line in a loop until there are no more matches in that line. This is incorrect, check tools like sed
that do it properly:
(echo a@b@c; echo a@b@c) | sed -r 's/^[^@]+@//g'
Additional test cases:
a@b@c
1
abc@b@c
2
Replace from: ^[^@]+@.*$
Replace to: \n
Result:
1
2
Replace from: ^([^@]+@.*)$
Replace to: \1X\n\1
Result:
a@b@cX
a@b@c
1
abc@b@cX
abc@b@c
2
Fixed.
Document:
aa bb
aa bb
Replace from .*
to z
in regexp mode. Result:
zz
z
Instead of (.*
is greedy, matches entire line):
z
z
There was a change in RESearchRange
class (came from Scintilla 3.11.2) which caused specified incorrect behavior when using BoostRegExSearch
-implementation. Fixed.
Any news on when “upcoming release”, which includes this fix, will be available as a binary?
@sergeevabc Any news on when “upcoming release”, which includes this fix, will be available as a binary?
These fixes are already in beta builds that you can download from here. However, this is really tricky, I am still running into multiple problems with current implementation and I'm considering rolling it back entirely and try addressing the original issue ("Replace on $
") in some other way.
What is your use case?
Document:
Run Replace from
$
to\n
in regexp mode - the program will loop forever. Desired result: