ScintillaOrg / lexilla

A library of language lexers for use with Scintilla
https://www.scintilla.org/Lexilla.html

[HTML] Add separate keywords for attributes #251

Open zufuliu opened 1 month ago

zufuliu commented 1 month ago

Created for https://sourceforge.net/p/scintilla/bugs/514/, copying my last post:

I think a simple fix is just to split the attribute list from the tag list keywords (this is what Notepad4 has done since 2018):

```cpp
case 6:
    wordListN = &keywordsAttr;
    break;
```

Also, those keywords<n> fields need more descriptive names.
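
A sketch of what more descriptive names could look like, assuming a hypothetical `HtmlWordList` enum and `Describe()` helper (neither exists in Lexilla). The slot order follows keywords..keywords6 as used in the test SciTE.properties (tags, JavaScript, Basic, Python, PHP, SGML), with slot 6 for the proposed attribute list; the description strings are illustrative only.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical descriptive names for the HTML lexer's word-list slots.
// Slots 0-5 follow keywords..keywords6; slot 6 is the proposed attribute list.
enum class HtmlWordList : std::size_t {
	Tags = 0,        // keywords: HTML elements (currently also attributes)
	JavaScript = 1,  // keywords2
	VBScript = 2,    // keywords3 ("Basic")
	Python = 3,      // keywords4
	PHP = 4,         // keywords5
	SGML = 5,        // keywords6: SGML (DTD) keywords
	Attributes = 6,  // keywords7: proposed attribute list
};

// Illustrative descriptions indexed by HtmlWordList.
constexpr std::array<std::string_view, 7> htmlWordListDescriptions = {
	"HTML elements",
	"JavaScript keywords",
	"VBScript keywords",
	"Python keywords",
	"PHP keywords",
	"SGML and DTD keywords",
	"HTML attributes",
};

// Looks up the description for a slot; usable at compile time.
constexpr std::string_view Describe(HtmlWordList list) {
	return htmlWordListDescriptions[static_cast<std::size_t>(list)];
}
```

Named slots like these would let code say `HtmlWordList::Attributes` instead of a bare `6`.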

Applications that want better support (handling attributes per tag) can opt in to the new substyle classifiers added in Lexilla 5.3.2, and use keywordsAttr for global attributes (see https://html.spec.whatwg.org/multipage/dom.html#global-attributes).

nyamatongwe commented 1 month ago

> split attribute list from tag list

That change would not be completely compatible, leading to unhighlighted attributes.

zufuliu commented 1 month ago

I think it's a compatible change: if the app sets keywordsAttr (word list 6) to a non-empty value, we know it wants to use separate tag and attribute lists, and can then use that list instead of word list 0 (the tag list) in classifyAttribHTML(). A more compatible change would let the app choose which word list is the attribute list instead of hard-coding 6. A lightweight app can put all attributes and event handlers into that word list, as Notepad4 does. For example, SciTE would need to update:

```
keywordclass.hypertext=\
$(hypertext.elements) $(hypertext.attributes) $(html5.elements) $(html5.attributes) public !doctype

# attributes
keywords7.$(file.patterns.xxx)=
```

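
A minimal sketch of the proposed fallback, modelled outside Lexilla: `SimpleWordList` is a hypothetical stand-in for `WordList`, and `AttributeClassifier`/`IsKnownAttribute` are hypothetical names. Here "non-empty" is taken to mean the list contains at least one word.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for Lexilla's WordList: a set of whitespace-separated words.
class SimpleWordList {
	std::set<std::string> words;
public:
	void Set(const std::string &text) {
		words.clear();
		std::istringstream in(text);
		std::string word;
		while (in >> word) {
			words.insert(word);
		}
	}
	bool InList(const std::string &word) const { return words.count(word) != 0; }
	bool Empty() const noexcept { return words.empty(); }
};

// Hypothetical model of the proposed fallback: attribute names are looked up
// in word list 6 once the app fills it, otherwise in word list 0 as before.
class AttributeClassifier {
	SimpleWordList tags;        // word list 0: tags (and attributes, for old apps)
	SimpleWordList attributes;  // word list 6: proposed attribute list
	const SimpleWordList *attributeList = &tags;
public:
	AttributeClassifier() = default;
	// attributeList points at a member of *this, so copies would point at the source object.
	AttributeClassifier(const AttributeClassifier &) = delete;
	AttributeClassifier &operator=(const AttributeClassifier &) = delete;

	void WordListSet(int n, const std::string &text) {
		if (n == 0) {
			tags.Set(text);
		} else if (n == 6) {
			attributes.Set(text);
			// Switch only when the app supplies at least one attribute word.
			attributeList = attributes.Empty() ? &tags : &attributes;
		}
	}
	bool IsKnownAttribute(const std::string &name) const {
		return attributeList->InList(name);
	}
};
```

An app that never sets word list 6 keeps today's behaviour; once it does, tag names such as `body` stop being treated as known attributes.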
zufuliu commented 1 month ago

Here are two patches to fix this:

html-0604-1.patch

```diff
diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0bff582d..6191e99e 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1006,6 +1006,8 @@ class LexerHTML : public DefaultLexer {
 	WordList keywords4;
 	WordList keywords5;
 	WordList keywords6; // SGML (DTD) keywords
+	WordList keywords7;
+	const WordList *attributeList;
 	OptionsHTML options;
 	OptionSetHTML osHTML;
 	std::set<std::string> nonFoldingTags;
@@ -1019,6 +1021,7 @@ public:
 		isXml_ ? std::size(lexicalClassesXML) : std::size(lexicalClassesHTML)),
 		isXml(isXml_),
 		isPHPScript(isPHPScript_),
+		attributeList(&keywords),
 		osHTML(isPHPScript_),
 		nonFoldingTags(std::begin(tagsThatDoNotFold), std::end(tagsThatDoNotFold)) {
 	}
@@ -1115,6 +1118,10 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	case 5:
 		wordListN = &keywords6;
 		break;
+	case 6:
+		wordListN = &keywords7;
+		attributeList = wl[0] ? wordListN : &keywords;
+		break;
 	default:
 		break;
 	}
@@ -2022,7 +2029,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 			break;
 		case SCE_H_ATTRIBUTE:
 			if (!setAttributeContinue.Contains(ch)) {
-				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, keywords, classifierAttributes, styler, lastTag);
+				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, *attributeList, classifierAttributes, styler, lastTag);
 				if (ch == '>') {
 					styler.ColourTo(i, SCE_H_TAG);
 					if (inScriptType == eNonHtmlScript) {
diff --git a/test/examples/hypertext/SciTE.properties b/test/examples/hypertext/SciTE.properties
index e3aa3c28..eb53d607 100644
--- a/test/examples/hypertext/SciTE.properties
+++ b/test/examples/hypertext/SciTE.properties
@@ -1,7 +1,6 @@
 lexer.*=hypertext
-# Tags and attributes
-keywords.*=b br body content encoding head href html img language li link meta \
-name p rel runat script src strong title type ul version xml xmlns
+# Tags
+keywords.*=b br body head html img li link meta p script strong title ul xml
 # JavaScript
 keywords2.*=function var
 # Basic
@@ -12,6 +11,8 @@ keywords4.*=import pass
 keywords5.*=echo __file__ __line__
 # SGML
 keywords6.*=ELEMENT
+# Attributes
+keywords7.*=content encoding href language name rel runat src type version xmlns
 
 # Tag
 substyles.hypertext.1=1
```

html-0604-2.patch

```diff
diff --git a/lexers/LexHTML.cxx b/lexers/LexHTML.cxx
index 0bff582d..bbb5fbbc 100644
--- a/lexers/LexHTML.cxx
+++ b/lexers/LexHTML.cxx
@@ -1006,6 +1006,8 @@ class LexerHTML : public DefaultLexer {
 	WordList keywords4;
 	WordList keywords5;
 	WordList keywords6; // SGML (DTD) keywords
+	WordList keywords7;
+	const WordList *attributeList;
 	OptionsHTML options;
 	OptionSetHTML osHTML;
 	std::set<std::string> nonFoldingTags;
@@ -1019,6 +1021,7 @@ public:
 		isXml_ ? std::size(lexicalClassesXML) : std::size(lexicalClassesHTML)),
 		isXml(isXml_),
 		isPHPScript(isPHPScript_),
+		attributeList(&keywords),
 		osHTML(isPHPScript_),
 		nonFoldingTags(std::begin(tagsThatDoNotFold), std::end(tagsThatDoNotFold)) {
 	}
@@ -1115,6 +1118,9 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	case 5:
 		wordListN = &keywords6;
 		break;
+	case 6:
+		wordListN = &keywords7;
+		break;
 	default:
 		break;
 	}
@@ -1122,6 +1128,9 @@ Sci_Position SCI_METHOD LexerHTML::WordListSet(int n, const char *wl) {
 	if (wordListN) {
 		if (wordListN->Set(wl)) {
 			firstModification = 0;
+			if (n == 6) {
+				attributeList = wordListN->Length() ? wordListN : &keywords;
+			}
 		}
 	}
 	return firstModification;
@@ -2022,7 +2031,7 @@ void SCI_METHOD LexerHTML::Lex(Sci_PositionU startPos, Sci_Position length, int
 			break;
 		case SCE_H_ATTRIBUTE:
 			if (!setAttributeContinue.Contains(ch)) {
-				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, keywords, classifierAttributes, styler, lastTag);
+				isLanguageType = classifyAttribHTML(inScriptType, styler.GetStartSegment(), i - 1, *attributeList, classifierAttributes, styler, lastTag);
 				if (ch == '>') {
 					styler.ColourTo(i, SCE_H_TAG);
 					if (inScriptType == eNonHtmlScript) {
diff --git a/test/examples/hypertext/SciTE.properties b/test/examples/hypertext/SciTE.properties
index e3aa3c28..eb53d607 100644
--- a/test/examples/hypertext/SciTE.properties
+++ b/test/examples/hypertext/SciTE.properties
@@ -1,7 +1,6 @@
 lexer.*=hypertext
-# Tags and attributes
-keywords.*=b br body content encoding head href html img language li link meta \
-name p rel runat script src strong title type ul version xml xmlns
+# Tags
+keywords.*=b br body head html img li link meta p script strong title ul xml
 # JavaScript
 keywords2.*=function var
 # Basic
@@ -12,6 +11,8 @@ keywords4.*=import pass
 keywords5.*=echo __file__ __line__
 # SGML
 keywords6.*=ELEMENT
+# Attributes
+keywords7.*=content encoding href language name rel runat src type version xmlns
 
 # Tag
 substyles.hypertext.1=1
```

html-0604-1.patch switches to the attribute list whenever the parameter is not an empty string (this lets an app decouple attributes from the tag list without setting any attribute words):

```cpp
attributeList = wl[0] ? wordListN : &keywords;
```

html-0604-2.patch switches to the attribute list only when the parameter contains at least one word:

```cpp
if (n == 6) {
    attributeList = wordListN->Length() ? wordListN : &keywords;
}
```

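
To make the difference between the two conditions concrete, here is a small sketch that models only the switch test, with hypothetical function names. `CountWords` approximates `WordList::Length()` by splitting on whitespace, which may not match Lexilla's exact separator rules.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// html-0604-1.patch condition: any non-empty string selects word list 6,
// even one containing only whitespace.
bool SelectsAttributeListPatch1(const char *wl) {
	return wl[0] != '\0';
}

// Counts whitespace-separated words; an approximation of WordList::Length().
std::size_t CountWords(const char *wl) {
	std::istringstream in(wl);
	std::string word;
	std::size_t count = 0;
	while (in >> word) {
		++count;
	}
	return count;
}

// html-0604-2.patch condition: word list 6 is used only if it holds a word.
bool SelectsAttributeListPatch2(const char *wl) {
	return CountWords(wl) != 0;
}
```

For a whitespace-only string such as `" "`, patch 1 switches to the (empty) attribute list, so no attribute would be recognized, while patch 2 keeps classifying attributes with word list 0.
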
nyamatongwe commented 1 month ago

This appears to duplicate behaviour available from the substyles added in 1c6e3e5, so it isn't necessary.

zufuliu commented 1 month ago

I don't think it's a duplicate. Even with substyles, `body` is still classified as SCE_H_ATTRIBUTE when it appears as an attribute name, because it is in word list 0: https://github.com/ScintillaOrg/lexilla/blob/85d1d679d442dc4faaf4c73b5594f805499f8313/lexers/LexHTML.cxx#L259-L269

So there needs to be a way to classify attributes without using keywords (word list 0). The patch provides a method that is easier for apps to migrate to than implementing substyles.
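
The lookup-order argument can be sketched as follows, assuming (as the linked lines suggest) that the word list is consulted before the substyle classifier. `ClassifyAttribute` and `AttrClass` are hypothetical names, and plain `std::set`s stand in for `WordList` and the substyle classifier.

```cpp
#include <set>
#include <string>

// Illustrative outcomes: Known and Unknown correspond roughly to
// SCE_H_ATTRIBUTE and SCE_H_ATTRIBUTEUNKNOWN; Substyle to an allocated substyle.
enum class AttrClass { Known, Substyle, Unknown };

// Hypothetical classifier: the attribute word list is checked first, so a word
// present there never reaches the substyle lookup.
AttrClass ClassifyAttribute(const std::string &name,
                            const std::set<std::string> &attributeWords,
                            const std::set<std::string> &substyleWords) {
	if (attributeWords.count(name) != 0) {
		return AttrClass::Known;
	}
	if (substyleWords.count(name) != 0) {
		return AttrClass::Substyle;
	}
	return AttrClass::Unknown;
}
```

With a merged tag-and-attribute list, `body` is Known before substyles are consulted; with a separate attribute list it falls through to Unknown, while attributes configured only through substyles still reach their substyle.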
