ScintillaOrg / lexilla

A library of language lexers for use with Scintilla
https://www.scintilla.org/Lexilla.html

LexLua Unicode identifier issue #242

Open wzchenmo opened 2 months ago

wzchenmo commented 2 months ago

After updating to version 4.0.4, Unicode characters are no longer matched correctly against punctuation marks and are not colored properly. [Screenshot: 20240423122648]

nyamatongwe commented 2 months ago

Images can't be copied into an editor for checking so post text or files containing text.

wzchenmo commented 2 months ago

输出栏选择夹:加入子夹("提示"):加入子夹("输出"):加入子夹("提示&输出"):加入子夹("Log")
支持库管理器:置坐标(0,工具栏:取高度() + 资源栏选择夹:取高度()):置宽高(分隔条_X.x,分隔条_Y:取坐标()):显示()

選択グリップ:に参加(10,"グリップ"):に参加("グリップ")

클립선택:가입("레이블")

string:rep("ddf"):gsub("ddf","ffd")

nyamatongwe commented 2 months ago

Likely caused by bug 1952 and its fix.

https://sourceforge.net/p/scintilla/bugs/1952/
https://sourceforge.net/p/scintilla/code/ci/8fb85a29591f0aee4d6da22c95e1d03e93c8fa34/

nyamatongwe commented 2 months ago

A shorter failing case, where everything is styled as SCE_LUA_IDENTIFIER:

プ("ッ")

nyamatongwe commented 2 months ago

The lexer appears not to take into account the number of bytes in the character at the start of the identifier when moving past the identifier. I'm not fully sure about this patch, and there are likely better ways to do this.

diff --git a/lexers/LexLua.cxx b/lexers/LexLua.cxx
index dfb158ff..fe8eab0d 100644
--- a/lexers/LexLua.cxx
+++ b/lexers/LexLua.cxx
@@ -259,6 +259,7 @@ void LexerLua::Lex(Sci_PositionU startPos, Sci_Position length, int initStyle, I

    // results of identifier/keyword matching
    Sci_Position idenPos = 0;
+   Sci_Position idenStartCharWidth = 0;
    Sci_Position idenWordPos = 0;
    int idenStyle = SCE_LUA_IDENTIFIER;
    bool foundGoto = false;
@@ -359,7 +360,7 @@ void LexerLua::Lex(Sci_PositionU startPos, Sci_Position length, int initStyle, I
                    sc.SetState(SCE_LUA_DEFAULT);
            }
        } else if (sc.state == SCE_LUA_IDENTIFIER) {
-           idenPos--;          // commit already-scanned identifier/word parts
+           idenPos -= idenStartCharWidth;          // commit already-scanned identifier/word parts
            if (idenWordPos > 0) {
                idenWordPos--;
                sc.ChangeState(idenStyle);
@@ -449,6 +450,7 @@ void LexerLua::Lex(Sci_PositionU startPos, Sci_Position length, int initStyle, I
                // set to a word style. The non-matched part is in identifier style.
                std::string ident;
                idenPos = 0;
+               idenStartCharWidth = sc.width;
                idenWordPos = 0;
                idenStyle = SCE_LUA_IDENTIFIER;
                foundGoto = false;

[Image: LuaUnicodeIdentifiers]

idenStartCharWidth.patch.txt
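
To illustrate the byte-width mismatch described above, here is a minimal C++ sketch. It is not the LexLua code: `Utf8CharWidth`, `IsIdentByte`, `IdentifierBytes` and `EndAfterCommit` are hypothetical helpers, and the model assumes the identifier length is counted in bytes while the cursor has already stepped over the first character. Under those assumptions, subtracting 1 is only right when the first character is a single byte.

```cpp
#include <cstddef>
#include <string>

// Number of bytes in a UTF-8 sequence, judged from its lead byte.
// Invalid lead bytes are treated as width 1 so a scan always advances.
std::size_t Utf8CharWidth(unsigned char lead) {
    if (lead < 0x80) return 1;            // ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // continuation or invalid byte
}

// Hypothetical identifier rule for this sketch only: ASCII letters, digits,
// underscore, and any byte >= 0x80 (so all non-ASCII text counts).
bool IsIdentByte(unsigned char ch) {
    return ch >= 0x80 || ch == '_' ||
           (ch >= '0' && ch <= '9') ||
           (ch >= 'a' && ch <= 'z') ||
           (ch >= 'A' && ch <= 'Z');
}

// Byte length of the identifier that starts at byte offset `start`.
std::size_t IdentifierBytes(const std::string &text, std::size_t start) {
    std::size_t end = start;
    while (end < text.size() &&
           IsIdentByte(static_cast<unsigned char>(text[end])))
        ++end;
    return end - start;
}

// Simplified model of "moving past the identifier": the cursor already sits
// after the first character (firstWidth bytes in), and the remaining bytes
// are skipped after subtracting `subtract` from the byte count.
// Returns the byte offset where the identifier is considered to end.
std::size_t EndAfterCommit(std::size_t start, std::size_t idenBytes,
                           std::size_t firstWidth, std::size_t subtract) {
    const std::size_t cursor = start + firstWidth;  // after first character
    return cursor + (idenBytes - subtract);         // skip the remainder
}
```

For the short failing case `プ("ッ")`, the identifier `プ` is 3 bytes in UTF-8 (`E3 83 97`). In this model, `EndAfterCommit(0, 3, 3, 1)` gives offset 5, two bytes past the real end at offset 3, so the `(` and `"` would be swallowed into the identifier. `EndAfterCommit(0, 3, 3, 3)`, which subtracts the first character's width as the patch does, gives offset 3.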

wzchenmo commented 2 months ago

Thank you for your efforts. I'm not able to help with the fix myself, but I tested the patch you provided and everything is back to normal. No problems have been found so far. [Screenshot: 20240501105546]