alecthomas / chroma

A general purpose syntax highlighter in pure Go
MIT License

Get `err` class in Chroma output for new lexer #165

Closed: Jos512 closed this issue 6 years ago

Jos512 commented 6 years ago

I've been working on adding a new lexer, but when testing I get `err` classes added by Chroma for each non-matched character:

<span class="c1"></span><span class="nf">study</span><span class="err">(</span><span class="err">t</span><span class="err">i</span><span class="err">t</span><span class="err">l</span><span class="err">e</span><span class="err">=</span><span class="s">&#34;Bollinger bands alerts [indicator]&#34;</span><span class="p">,</span> <span class="err">o</span><span class="err">v</span><span class="err">e</span><span class="err">r</span><span class="err">l</span><span class="err">a</span><span class="err">y</span><span class="err">=</span><span class="ow">true</span><span class="err">)</span>

I understand from the Pygments docs that "if no rule matches at the current position, the current char is emitted as an Error token that indicates a lexing error". However, I do have whitespace-matching rules that emit the Text token. As I understand Chroma and the lexers already implemented, those rules should let Chroma pass over such matches without flagging them.
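To make the quoted fallback behaviour concrete, here is a minimal sketch of a rule-based tokenizer in plain Go. It is not Chroma's implementation: the `rule`, `token`, `newRule` and `lex` names are hypothetical, and it uses the standard `regexp` package rather than the regex engine Chroma uses. It only illustrates the idea that a character no rule matches comes out as its own Error token.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// token pairs a token-type name with the text it covers.
type token struct {
	Type  string
	Value string
}

// rule is a hypothetical stand-in for a lexer rule: a pattern plus the
// token type emitted when the pattern matches at the current position.
type rule struct {
	re  *regexp.Regexp
	typ string
}

// newRule anchors the pattern so it can only match at the start of the
// remaining input.
func newRule(pattern, typ string) rule {
	return rule{regexp.MustCompile(`^(?:` + pattern + `)`), typ}
}

// lex tries each rule in order at the current position. If no rule
// matches, it emits a single character as an Error token and moves on.
func lex(rules []rule, input string) []token {
	var out []token
	for len(input) > 0 {
		matched := false
		for _, r := range rules {
			if m := r.re.FindString(input); m != "" {
				out = append(out, token{r.typ, m})
				input = input[len(m):]
				matched = true
				break
			}
		}
		if !matched {
			// Consume exactly one UTF-8 character as an error.
			_, size := utf8.DecodeRuneInString(input)
			out = append(out, token{"Error", input[:size]})
			input = input[size:]
		}
	}
	return out
}

func main() {
	// Assumed rule set: whitespace, one function name, strings, commas
	// and one keyword. There is no rule for parentheses, "=" or names.
	rules := []rule{
		newRule(`[^\S\n]+`, "Text"),
		newRule(`study\b`, "NameFunction"),
		newRule(`"[^"\n]*"`, "LiteralString"),
		newRule(`,`, "Punctuation"),
		newRule(`true\b`, "OperatorWord"),
	}
	for _, t := range lex(rules, `study(title="x", overlay=true)`) {
		fmt.Printf("%-14s %q\n", t.Type, t.Value)
	}
}
```

With this assumed rule set, `(`, `=`, `)` and every letter of `title` and `overlay` come out as separate Error tokens, while the whitespace rule only ever covers the space after the comma. A whitespace rule therefore cannot suppress errors for characters it does not match.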

Can someone give me a pointer or suggestion? I'm going in circles with my own attempts and ideas.

alecthomas commented 6 years ago

It looks like "(" is the erroring token, and looking at the code I don't see a pattern for that?

Jos512 commented 6 years ago

> It looks like "(" is the erroring token, and looking at the code I don't see a pattern for that?

Thanks for the quick reply and the good catch! That one was indeed still missing. Unfortunately there are error tokens still left.

If I input this:

study(title="Bollinger bands alerts [indicator]", overlay=true)

The lexer should turn that into:

| Code element | Token |
| --- | --- |
| `study` | NameFunction |
| `"Bollinger bands alerts [indicator]"` | LiteralString |
| `,` | Punctuation |
| `true` | OperatorWord |

But I get (formatted for readability):

<span class="nf">study</span>
<span class="err">(</span>
<span class="err">t</span>
<span class="err">i</span>
<span class="err">t</span>
<span class="err">l</span>
<span class="err">e</span>
<span class="err">=</span>
<span class="s">&#34;Bollinger bands alerts [indicator]&#34;</span>
<span class="p">,</span>
<span class="err">o</span>
<span class="err">v</span>
<span class="err">e</span>
<span class="err">r</span>
<span class="err">l</span>
<span class="err">a</span>
<span class="err">y</span>
<span class="err">=</span>
<span class="ow">true</span>
<span class="err">)</span>

If I update the pattern of my Text rule to include any word character (`\w`), changing it to `[^\S\n\(\)\w]+`, I get the same errors.
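One possible reason the change has no effect: inside a negated class (`[^...]`), adding `\w` excludes word characters instead of including them, and `\S` already excludes them anyway. The quick check below uses Go's standard `regexp` package as a stand-in, on the assumption that the engine behind Chroma treats this character class the same way. The `textRule` and `matchesAtStart` names are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// textRule is the pattern quoted in the thread, anchored to the start of
// the input so it behaves like a lexer rule tried at the current position.
var textRule = regexp.MustCompile(`^[^\S\n\(\)\w]+`)

// matchesAtStart reports whether the Text pattern matches at position 0.
func matchesAtStart(s string) bool {
	return textRule.MatchString(s)
}

func main() {
	for _, s := range []string{" ", "\t", "title", "(", "\n"} {
		fmt.Printf("%q -> %v\n", s, matchesAtStart(s))
	}
}
```

Under that assumption, only the space and the tab match. `title`, `(` and the newline do not, so those characters still need rules of their own.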


For what it's worth, the lexer test also fails. Unfortunately, on Windows I get odd characters in the output (←[42m, ←[1m, ←[0m), which I take to be colour codes that my console doesn't render.

   --- FAIL: TestLexers/TradingView (0.02s)
        Error Trace:    1:
        Error:          Not equal: []*chroma.Token←[1m...←[0m, "//@version=3←[42m\r\n←[0m"}, &Token{NameFunction, "study←[42m"}, &Token{Error, "(title←[0m"}, &Token{Keyw←[1m...←[0m, ","}, &Token{←[42mText, " "}, &Token{Error, "overlay"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m, ","}, &Token{←[42mText, " "}, &Token{Error, "precisio"}, &Token{NameVariable, "n"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m, "3"}, &Token{←[42mError, ")"}, &Token{←[0mText,
"\←[42mr"}, &Toke←[0mn←[42m{Error, "←[0m\n"}, &Token{←[42mText, "\r"}, &Token{Error, "\nemaLe"}, &Token{NameVariable, "n"}, &Token{Text, " "}, &Token{←[0mKeywordPseudo, "="}, &Token{←[42mText, " "}, &Token{←[0mNameFunction,
"input"}, &Token{←[42mError, "(title"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m, ","}, &Token{←[42mText, " "}, &Token{Error, "type"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m, ","}, &Token{←[42mText, " "}, &Token{Error, "defval"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m "10"}, &Token{←[42mError, ")"}, &Token{Text, "\r"}, &Token{Error, "\nemaVal"}, &Token{Text, " "}, &Token{←[0mKeywordPseudo, "="}, &Token{←[42mText, " "}, &Token{←[0mNameFunction, "ema←[42m"}, &Token{Error, "(←[0m"}, &Token{Name←[1m...←[0m &Token{Text, "←[42m "}, &Token{Error, "emaLe"}, &Token{NameVariable, "n"}, &Token{Error, ")"}, &Token{Text, "←[0m\←[42mr"}, &Toke←[0mn←[42m{Error, "\n"}, &Token{Text, "\r"}, &Token{Error, "←[0m\n"}, &Token{Na←[1m...←[0mplot"}, &Token{←[42mError, "(series"}, &Token{←[0mKeywordPseudo, "="}, &Token{←[42mError, "emaVal"}, &Token{←[0mPunctuation, ","}, &Token{←[42mText, " "}, &Token{Error, "style"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m, ","}, &Token{←[42mText, " "}, &Token{Error, "offset"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m, ","}, &Token{←[42mText, " "}, &Token{Error, "linewidth"}, &Token{←[0mKeywordPseudo, ←[1m...←[0m, "3"}, &Token{←[42mError, ")"}, &Token{←[0mText, "\←[42mr"}, &Toke←[0mn←[42m{Error, "\n"}, &Token{Text, "\r"}, &Token{Error, "←[0m\n"}, &Token{Co←[1m...←[0mlour background←[42m\r\n←[0m"}, &Token{Name←[1m...←[0molor"}, &Token{←[42mError, "(col"}, &Token{OperatorWord, "or"}, &Token{←[0mKeywordPseudo, ←[1m...←[0mNameVariable, "←[41m\n    ←[0mclose"}, &Token{←[42mText, " "}, &Token{←[0mPunctuation, ">"}, &Token{←[42mText, " "}, &Token{←[0mNameVariable, "open"}, &Token{←[42mText, " "}, &Token{←[0mPunctuation, "?"}, &Token{←[42mText, " "}, &Token{←[0mNameVariable, "orange"}, &Token{←[42mText, " "}, &Token{←[0mPunctuation, ":"}, &Token{←[42mText, "\r"}, &Token{Error, "\n"}, &Token{Text, "     "}, &Token{←[0mNameVariable, "close"}, &Token{←[41mOp←[0m←[42mText, " "}, &Tok←[0me←[41mr←[0m←[42mn{Punctu←[0mat←[42mion, "!"}, &Token{Keyw←[0mor←[42mdPseudo←[0m, 
"←[41m!←[0m=←[42m"}, &Token{Text, " ←[0m"}, &Token{Name←[1m...←[0m, "]"}, &Token{←[42mText, " "}, &Token{←[0mPunctuation, "?"}, &Token{←[42mText, " "}, &Token{←[0mNameVariable, "purple"}, &Token{←[42mText, " "}, &Token{←[0mPunctuation, ":"}, &Token{←[41mNam←[0m←[42mT←[0me←[41mVa←[0m←[42mxt, "\←[0mr←[41miabl←[0m←[42m"}, &Tok←[0me←[42mn{Error←[0m, "\n←[42m"},←[0m ←[42m&Token{Text, "←[0m   ←[42m  "}, &Token{Error, "←[0mna"}, &Token{Punctuation, ","}, &Token{←[42mText, " "},
&Token{Error, "transp"}, &Token{←[0mKeywordPseudo, ←[1m...←[0meralNumber, "80←[42m"}, &Token{Error, ")←[0m"}}

Jos512 commented 6 years ago

As a quick update, it seems I've found the problem: I was missing a rule for names (identifiers):

{`@?[_a-zA-Z]\w*`, Text, nil},

With this line the `err` classes seem to go away.
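As a rough illustration of what that pattern covers, the sketch below anchors it at the start of a few sample inputs, again using Go's standard `regexp` package rather than Chroma itself. `identRule` and `matchPrefix` are hypothetical names for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// identRule is the pattern from the thread, anchored to the start of the
// input so it behaves like a rule tried at the current lexer position.
var identRule = regexp.MustCompile(`^@?[_a-zA-Z]\w*`)

// matchPrefix returns the identifier-like prefix of s, or "" if the
// input does not start with one.
func matchPrefix(s string) string {
	return identRule.FindString(s)
}

func main() {
	samples := []string{`title="x"`, "overlay=true", "(title", "=true", "@version=3"}
	for _, s := range samples {
		fmt.Printf("%-14q -> %q\n", s, matchPrefix(s))
	}
}
```

The pattern picks up `title`, `overlay` and `@version`, but returns nothing for input starting with `(` or `=`, so punctuation and operators still depend on their own rules.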


(It might be too soon to close this issue for good, since I still need to do more testing and validating. But I wanted to leave a quick comment in case someone else was about to spend time on it; I wouldn't be comfortable knowing I'd 'wasted' someone's free time.)

Edit: The issue seems to be fixed. Now I can wrestle with the regex issues. :slightly_smiling_face: