ScintillaOrg / lexilla

A library of language lexers for use with Scintilla
https://www.scintilla.org/Lexilla.html
Other
176 stars 65 forks source link

Simplify `WordListSet()` method #103

Open zufuliu opened 1 year ago

zufuliu commented 1 year ago

Following code is used in many lexers (can be found with ripgrep rg -C 2 wlNew.Set or grep -nr -C 2 wlNew.Set on lexers folder), https://github.com/ScintillaOrg/lexilla/blob/3dbbff6fe2fa53b8d7a3907440e6f8294f8b1eb4/lexers/LexCPP.cxx#L734-L740

Can be simplified to

    if (wordListN) {
        if (wordListN->Set(wl)) {
            firstModification = 0;
            if (n == 4) {

to avoid call the expensive WordList::Set() method twice (e.g. when setup a lexer).

zufuliu commented 1 year ago

I'm sorry, this is same as https://sourceforge.net/p/scintilla/feature-requests/1305/, so can be marked as duplicate.

rdipardo commented 1 year ago

this is same as https://sourceforge.net/p/scintilla/feature-requests/1305/

It looks rather like this comment was recast as an independent issue.

It may be worth considering since there are now more lexers using the alleged anti-pattern than existed when the original feature request was resolved. Geany, for example, will be largely affected as it consumes both LexGDScript and LexJulia, with plans to include LexRaku.

nyamatongwe commented 1 year ago

There are no automated checks to ensure that ILexer5::WordListSet is implemented correctly by lexers.

rdipardo commented 1 year ago

There are no automated checks to ensure that ILexer5::WordListSet is implemented correctly by lexers.

I would say the assessment of any performance impact falls on the downstream application. Most causes of inefficiency are the fault of the application anyway. They simply can't all be as finely tuned as Notepad2.

There was one other comment on the original thread suggesting that mass adoption of DefaultLexer in favour of "old stateless ColouriseDoc/FoldDoc lexers" had resulted in object bloating. I wanted to see if the simplified implementation of ::WordListSet translated into fewer instructions. The results were too mixed to recommend it.

The modules that grew in size might perform faster, if we assume that instruction count and speed are generally in inverse proportion. The gain would be immeasurably small in any case.

$ cd src && make
$ git grep -l wlNew.Set -- ../lexers |\
   awk -F'/' '/lexers/ {print $3}' | sed 's/\.cxx//g' |\
   xargs -I% ls -lG %.o | awk '{print $4 "\t" $8}'

# [before]
45624   LexAsm.o
59496   LexBaan.o
65328   LexBash.o
50888   LexBasic.o
35232   LexCIL.o
150104  LexCPP.o
47832   LexD.o
61304   LexGDScript.o
112392  LexHTML.o
50560   LexHaskell.o
37488   LexHollywood.o
43848   LexJSON.o
51688   LexJulia.o
43096   LexNim.o
66824   LexPerl.o
41016   LexProgress.o
70160   LexPython.o
54928   LexRaku.o
48808   LexRust.o
50552   LexSQL.o
95552   LexVerilog.o
40032   LexVisualProlog.o

# [after]
46352   LexAsm.o            +1.60%
59464   LexBaan.o           -0.05%
65184   LexBash.o           -0.22%
51880   LexBasic.o          +1.95%
35264   LexCIL.o            +0.09%
149864  LexCPP.o            -0.16%
48616   LexD.o              +1.64%
61936   LexGDScript.o       +1.03%
112040  LexHTML.o           -0.31%
50848   LexHaskell.o        +0.57%
37512   LexHollywood.o      +0.06%
43352   LexJSON.o           -1.13% greatest decrease
51992   LexJulia.o          +0.59%
43136   LexNim.o            +0.09%
66472   LexPerl.o           -0.53%
41024   LexProgress.o       +0.02%
69808   LexPython.o         -0.50%
54544   LexRaku.o           -0.70%
49800   LexRust.o           +2.03% greatest increase
50704   LexSQL.o            +0.30%
95224   LexVerilog.o        -0.34%
40256   LexVisualProlog.o   +0.56%

$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110
# CFLAGS: -DNDEBUG --std=c++17 -fPIC -Os -Wpedantic -Wall -Wextra
zufuliu commented 1 year ago

There are no automated checks to ensure that ILexer5::WordListSet is implemented correctly by lexers.

I think the check can be added by mark WordList::Set() as [[nodiscard]], then watch compiler warnings: MSVC: warning C4834: discarding return value of function with 'nodiscard' attribute.

GCC: warning: ignoring return value of 'bool Lexilla::WordList::Set(const char*)', declared with attribute 'nodiscard' [-Wunused-result].

Clang: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]

attached the patch to add [[nodiscard]] attribute and suppress the warning with (void) cast in testWordList.cxx. WordList_Set_nodiscard.zip

zufuliu commented 5 months ago

[[nodiscard]] attribute is used in more places, above patch could be applied now?

nyamatongwe commented 5 months ago

There are no tests for the baan, blitzbasic, purebasic, freebasic, cil, haskell, literatehaskell, or hollywood lexers which use WordList::Set(). Without tests, these lexers should not be modified. Therefore WordList::Set() can not be [[nodiscard]] as this would produce warnings from these lexers.

nyamatongwe commented 5 months ago

Made this change to 7 lexers that have tests and are in common use.