ScintillaOrg / lexilla

A library of language lexers for use with Scintilla
https://www.scintilla.org/Lexilla.html
Other
179 stars 64 forks source link

LexerVerilog tries to set line state beyond EOF #205

Closed rdipardo closed 1 year ago

rdipardo commented 1 year ago

After looking into #204, I found that LexerVerilog::Lex increments the line counter beyond the end of the document before passing it to LexAccessor::SetLineState:

https://github.com/ScintillaOrg/lexilla/blob/12348921861f02130d77beea9b1666531d29081d/lexers/LexVerilog.cxx#L475-L482

TestLexers will abort when the lexer tries to access the last line + 1, even if the input is a zero-length buffer:

Lexing verilog\0.vh
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)

This patch enables testing by adding a range check and modernizing one occurrence of StyleContext::atLineEnd to ::MatchLineEnd():

0001-LexerVerilog-Avoid-incrementing-the-current-line-past-the-end.patch.txt

These tests currently fail because of DOS EOL splitting (which global usage of ::MatchLineEnd() does not resolve), but they should be good enough to probe what's happening:

LexerVerilog-prelim-tests.zip

nyamatongwe commented 1 year ago

SetLineState in Scintilla allows setting lines past the end to make it simpler to implement lexers. TestLexers is a bit stricter to help uncover problems but its probably reasonable to allow writing just one past the maximum line. Possibly

@@ -196,6 +196,9 @@ int SCI_METHOD TestDocument::GetLineState(Sci_Position line) const {
 }

 int SCI_METHOD TestDocument::SetLineState(Sci_Position line, int state) {
+   if (line == static_cast<Sci_Position>(lineLevels.size())) {
+       return 0;
+   }
    return lineStates.at(line) = state;
 }

or

@@ -123,7 +123,7 @@ void TestDocument::Set(std::string_view sv) {
    if (lineStarts.back() != Length()) {
        lineStarts.push_back(Length());
    }
-   lineStates.resize(lineStarts.size());
+   lineStates.resize(lineStarts.size() + 1);
    lineLevels.resize(lineStarts.size(), 0x400);
 }
rdipardo commented 1 year ago

Amending the TestDocument module does the trick.

After applying the second option (since they're equivalent and single exit points are ideal), we're left with the following:

Lexing verilog/AllStyles.vh
~/lexilla/test/examples/verilog/AllStyles.vh:66: different styles between \r and \n at 870: 11, 0
~/lexilla/test/examples/verilog/AllStyles.vh:67: different styles between \r and \n at 879: 11, 0
~/lexilla/test/examples/verilog/AllStyles.vh:68: different styles between \r and \n at 888: 11, 0
~/lexilla/test/examples/verilog/AllStyles.vh:69: different styles between \r and \n at 896: 11, 0
~/lexilla/test/examples/verilog/AllStyles.vh:70: different styles between \r and \n at 905: 11, 0

~/lexilla/test/examples/verilog/AllStyles.vh:1: has different styles with \n versus \r\n line ends

Modernizing all cases of StyleContext::atLineEnd to ::MatchLineEnd() resolves most of these, leaving a single issue of unclean style switching in the middle of DOS EOLs in the case of SCE_V_STRINGEOL. Because StyleContext::ForwardSetState() will only advance the width of a single character, a conditional call to ::Forward() is needed to span the entire CRLF:

https://github.com/ScintillaOrg/lexilla/blob/12348921861f02130d77beea9b1666531d29081d/lexers/LexVerilog.cxx#L596-L598

The diagnostic is not very revealing:

Lexing verilog/AllStyles.vh
~/lexilla/test/examples/verilog/AllStyles.vh:73: different styles between \r and \n at 937: 12, 0

~/lexilla/test/examples/verilog/AllStyles.vh:1: has different styles with \n versus \r\n line ends

This was one case where visual inspection was more productive. In LF mode, SCE_V_STRINGEOL would fill the line; in CRLF mode, it was trimmed to the length of the string + 1 for the severed CR.

0002-LexerVerilog-Don-t-split-CRLFs-when-terminating-string-EOL-state.patch.txt