Open Alexey-T opened 1 year ago
Test-proj has it's own UnicodeRE define copy. but even if i disable it, tests fail! @user4martin
Well, haven't checked background yet, but
// 69
( // empty str
expression: '^ *$';
inputText: '';
substitutionText: '';
expectedResult: '';
matchStart: 1
fails for me => and that looks like a bug in the regex engine.
The others are down to the define.
Ideally the defines need to move into their own include files.
Then (and I can check that) the failing test may need to be disabled.
The [-] range of Russian chars seems not to be implemented for utf8 yet. Possible, but an issue of its own (and not necessary one that would have my time soon).
The #%85 line break => same thing. But maybe can be fixed easy for utf8.
IsAnyLineBreak
could be changed to take a pointer to ReChar.
Then it could return zero, or the length of any matched line break. That way it could handle utf-8 encoded line breaks of more than one byte.
The test case would then need to be changed to have #$C2#$85
in the string.
Then it could return zero, or the length of any matched line break.
do code need this really, if it works good already? only more complex logic.
Then it could return zero, or the length of any matched line break.
do code need this really, if it works good already? only more complex logic.
Well, is "not implemented" = works good?
At the moment, using the utf-8 version, Linebreaks like "'NEXT LINE (NEL)' (U+0085)" are simple not detected. utf-8 is unicode, so those codes do exist.
That is unless it is meant to be ASCII? Then a utf-8 version is really needed. (And afaik there is more to be fixed for proper utf8 support, but this would be a start)
so it is needed, okay.
but is it needed that in non-Unicode mode we must find pure Unicode linebreak? we can ignore chr(85) in non-Unicode mode, logical.
but is it needed that in non-Unicode mode we must find pure Unicode linebreak? we can ignore chr(85) in non-Unicode mode, logical.
IMHO: Wrong Question. Utf8 is also a Unicode mode.
The question is: Does the regex currently have an ASCII (non Unicode) or an Ut8 (Unicode) mode?
But, IMHO the answer does not matter. IMHO a Utf8 mode is what is needed.
So then the only question is: Utf8 mode: Add or Fix?
So then the only question is: Utf8 mode: Add or Fix?
Add.
Martin, can you adjust tests project to not fail with UnicodeRE off? You are really good in composing tests. I admit. @user4martin