ScintillaOrg / lexilla

A library of language lexers for use with Scintilla
https://www.scintilla.org/Lexilla.html
Other
186 stars 66 forks source link

[HTML] Unstable per-line lexing of substyled PHP keywords #288

Open rdipardo opened 2 days ago

rdipardo commented 2 days ago

Cross-posted from notepad-plus-plus/notepad-plus-plus#15801

Minimal reproduction

The substyle will extend beyond the keyword until the end of the document:

scite-553-php-121-substyled

Running TestLexers on the script also raises a "different per-line styles" error.

zufuliu commented 1 day ago

with following change in LexHTML.cxx:

diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 51cac259..a9a96973 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1190,12 +1190,13 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
    if (isPHPScript && (startPos == 0)) {
        initStyle = SCE_HPHP_DEFAULT;
    }
+   printf("%d startPos=%zu, initStyle=%d\n", __LINE__, startPos, initStyle);
    styler.StartAt(startPos);
    std::string lastTag;
    std::string prevWord;
    PhpNumberState phpNumber;
    std::string phpStringDelimiter;
-   int StateToPrint = initStyle;
+   int StateToPrint = initStyle & 0xff;
    int state = stateForPrintState(StateToPrint);
    std::string makoBlockType;
    int makoComment = 0;
@@ -1208,6 +1209,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
            startPos = backLineStart;
        }
        state = (startPos > 0) ? styler.StyleIndexAt(startPos - 1) : SCE_H_DEFAULT;
+       printf("%d InTagState startPos=%zu, state=%d\n", __LINE__, startPos, state);
    }
    // String can be heredoc, must find a delimiter first. Reread from beginning of line containing the string, to get the correct lineState
    if (isPHPStringState(state)) {
@@ -1218,6 +1220,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
        }
        if (startPos == 0)
            state = SCE_H_DEFAULT;
+       printf("%d isPHPStringState startPos=%zu, state=%d\n", __LINE__, startPos, state);
    }
    styler.StartAt(startPos);

SciTE output when adding a character, then deleted it.

1193 startPos=0, initStyle=0
1193 startPos=42, initStyle=119
1223 isPHPStringState startPos=3, state=198
1193 startPos=39, initStyle=-58

Negative initStyle previously discussed at https://sourceforge.net/p/scintilla/feature-requests/1431/, I still prefer styleStart = pdoc->StyleIndexAt(start - 1);, no lexer in lexilla will broke by the change.

zufuliu commented 1 day ago

PHP-288-11-15.patch Draft patch to fix this:

@@ -1195,7 +1195,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
    std::string prevWord;
    PhpNumberState phpNumber;
    std::string phpStringDelimiter;
-   int StateToPrint = initStyle;
+   int StateToPrint = initStyle & 0xff;
    int state = stateForPrintState(StateToPrint);
    std::string makoBlockType;
    int makoComment = 0;
@@ -1216,8 +1216,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
            length++;
            state = styler.StyleIndexAt(startPos);
        }
-       if (startPos == 0)
-           state = SCE_H_DEFAULT;
+       state = (startPos > 0) ? styler.StyleIndexAt(startPos - 1) : SCE_H_DEFAULT;
    }
    styler.StartAt(startPos);
zufuliu commented 1 day ago

PHP-288-1116.patch Updated patch:

  1. Remove redundant styler.StartAt(startPos);.
  2. Merge InTagState(state) and isPHPStringState(state) backtracking loops.
  3. Respect isPHPScript after backtracking.

After these changes, initStyle & 0xff; seems is not needed.

```diff diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx index 51cac259..b384368d 100644 --- a/lexers/LexHTML.cxx +++ b/lexers/LexHTML.cxx @@ -668,6 +668,10 @@ constexpr bool isPHPStringState(int state) noexcept { (state == SCE_HPHP_COMPLEX_VARIABLE); } +constexpr bool StyleNeedsBacktrack(int state) noexcept { + return InTagState(state) || isPHPStringState(state); +} + enum class AllowPHP : int { None, // No PHP PHP, // 0) && (InTagState(styler.StyleIndexAt(startPos - 1)))) { + // PHP string can be heredoc, must find a delimiter first. Reread from beginning of line containing the string, to get the correct lineState + if (StyleNeedsBacktrack(state)) { + while ((startPos > 0) && (StyleNeedsBacktrack(styler.StyleIndexAt(startPos - 1)))) { const Sci_Position backLineStart = styler.LineStart(styler.GetLine(startPos-1)); length += startPos - backLineStart; startPos = backLineStart; } - state = (startPos > 0) ? styler.StyleIndexAt(startPos - 1) : SCE_H_DEFAULT; - } - // String can be heredoc, must find a delimiter first. Reread from beginning of line containing the string, to get the correct lineState - if (isPHPStringState(state)) { - while (startPos > 0 && (isPHPStringState(state) || !isLineEnd(styler[startPos - 1]))) { - startPos--; - length++; - state = styler.StyleIndexAt(startPos); + if (startPos > 0) { + state = styler.StyleIndexAt(startPos - 1); + } else { + state = isPHPScript ? SCE_HPHP_DEFAULT : SCE_H_DEFAULT; } - if (startPos == 0) - state = SCE_H_DEFAULT; } styler.StartAt(startPos); ```