Closed zufuliu closed 3 months ago
I think the fix can be delayed until there is a bug in real code; it also does not fix the DBCS heredoc delimiter case. Code that uses DBCS characters (instead of ASCII or UTF-8) as identifiers is non-portable.
# encoding: gbk
puts <<中
#{1+2}
中
Close this as won't fix. Moving styler.IsLeadByte() to the end of the for loop (before chPrev = ch;) seems to fix both problems, but it's hard to test because advance_char(), redo_char() and InterpolateVariable() change ch, chNext and chNext2.
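To illustrate why the placement of the lead-byte check matters for chPrev, here is a minimal sketch. It is not the LexRuby.cxx loop: IsLeadByteGbk() is a hypothetical stand-in for styler.IsLeadByte() that assumes a GBK-like lead-byte range of 0x81..0xFE, IsWordOrHigh() is an assumed approximation of isSafeWordcharOrHigh(), and PrevBeforeFinalAscii() is an invented helper that only models how chPrev is updated.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for styler.IsLeadByte(); assumes a GBK-like
// code page where lead bytes fall in 0x81..0xFE.
inline bool IsLeadByteGbk(unsigned char ch) {
    return ch >= 0x81 && ch <= 0xFE;
}

// Assumed approximation of isSafeWordcharOrHigh(): word character or
// any byte with the high bit set.
inline bool IsWordOrHigh(unsigned char ch) {
    return ch >= 0x80 || std::isalnum(ch) != 0 || ch == '_';
}

// Scans `text` (assumed to end in a single ASCII byte) and returns the
// value chPrev holds when that final byte is reached.
// resetToSpace == true models the "chPrev = ' '" pattern after a DBCS
// character; false models keeping the lead byte in chPrev.
unsigned char PrevBeforeFinalAscii(const std::string &text, bool resetToSpace) {
    unsigned char chPrev = ' ';
    std::size_t i = 0;
    while (i + 1 < text.size()) {
        const unsigned char ch = static_cast<unsigned char>(text[i]);
        if (IsLeadByteGbk(ch)) {
            // Skip the trail byte together with the lead byte.
            i += 2;
            chPrev = resetToSpace ? ' ' : ch;
            continue;
        }
        chPrev = ch;
        i += 1;
    }
    return chPrev;
}
```

With the input "\xD6\xD0" "x" (the GBK bytes of 中 followed by x), the first variant leaves chPrev as a space when x is reached, so a word-character test on chPrev fails; the second variant leaves the lead byte 0xD6, which passes as a high byte.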
The following is highlighted differently in UTF-8 and DBCS code pages:
https://docs.ruby-lang.org/en/master/syntax/methods_rdoc.html#label-Method+Names
Patch to fix the bug: runy-dbcs-0402.zip
chPrev = ch; is required for the later isSafeWordcharOrHigh(chPrev) test, which is also reasonable (it indicates that the previous character is non-ASCII instead of a space). It might not be worth the complexity to fix the bug (no one has reported bugs for this). https://docs.ruby-lang.org/en/2.7.0/Encoding.html#class-Encoding-label-Script+encoding says:
The latest doc at https://docs.ruby-lang.org/en/master/encodings_rdoc.html#label-Script+Encoding doesn't mention in which version the default changed to UTF-8.
A similar DBCS character handling pattern (chPrev = ' ') was copied (from LexHTML.cxx?) into other lexers, so they may have a similar bug. For example, since PHP also treats non-ASCII bytes as identifier characters, the use of chPrev and chPrev2 inside LexHTML.cxx may need an extra check. https://www.php.net/manual/en/language.variables.basics.php