Open hackerb9 opened 1 year ago
Possibly of use: I see that the QuickBasic manual breaks its statements into the following groups:
I do not know if these categories overlap or if they have any relationship to what is actually used in syntax highlighting systems. I will say that I think it is a very good feature that basic-mode makes control-flow distinct.
Update: after reading through that chapter, I think it actually contains some very useful divisions for syntax highlighting and it is not too different from how basic-mode currently works. In particular, Control-Flow (GOTO, IF...THEN, FOR..NEXT) is fundamental to BASIC programmers and is already prominently shown.
Less common, but perhaps even more important to shine a spotlight on, are the Trapping Statements (ON ERR GOTO...), which can change a program's flow of control asynchronously. (For example, `ON COM GOSUB` is used on the Tandy 200 to run a BASIC routine when data comes in on the serial port.) Currently traps are shown the same as control-flow in basic-mode, but I suggest that they ought to be even more vivid.
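As a rough sketch of what "more vivid" could mean in Emacs terms, the snippet below defines a separate face for trapping statements that inherits from `font-lock-warning-face`, plus a regexp for a few `ON ... GOTO`/`ON ... GOSUB` forms. The names `basic-trap-face` and `basic-trap-regexp` are hypothetical and not part of basic-mode.el, and the event list is only a sample that would need adjusting per dialect.

```elisp
;; Hypothetical face; not part of basic-mode.el.
(defface basic-trap-face
  '((t :inherit font-lock-warning-face))
  "Face for BASIC trapping statements such as ON COM GOSUB."
  :group 'font-lock-faces)

;; ON, then an event name, then GOTO or GOSUB.
;; The event list is illustrative, not exhaustive.
(defconst basic-trap-regexp
  (concat "\\_<ON[ \t]+"
          (regexp-opt '("ERROR" "COM" "KEY" "TIMER") t)
          "\\(?:([0-9]+)\\)?"   ; optional argument, as in KEY(1)
          "[ \t]+"
          (regexp-opt '("GOTO" "GOSUB") t)
          "\\_>")
  "Regexp matching a few BASIC trap statements (illustrative only).")

;; One way to try it in a buffer that is already in basic-mode:
;; (font-lock-add-keywords nil `((,basic-trap-regexp 0 'basic-trap-face t)))
```

Inheriting from an existing face, rather than hard-coding colors, lets the user's theme decide what "vivid" looks like.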
Statements Used in Procedures (SUB, FUNCTION, DEF FN) are also currently shown in the same face as Control-Flow, but if there are any spare font-lock categories left, it may look better if they were subtly different.
I do not have a problem with mushing together Standard I/O, File I/O, and Graphics Statements, but I'm not opposed to keeping them separate, as the QuickBasic manual has them, either. String-Processing Functions do make some sense to show with a slight variation from normal statements, but I do not know if it is worth going out of the way to do.
Samples of other syntax highlighting for BASIC:
GitHub's Markdown attempts to syntax highlight fenced code blocks. For example:
```BASIC
0 DEFINT X=32768, Y=RND(-1) ' SALT
10 PRINT "HELLO, WORLD! "; TIME$;
20 Y=(31*Y+PEEK(X)) MOD 257: REM PRIME HASH
30 X=X+1: GOSUB 10
```
Becomes,
```BASIC
0 DEFINT X=32768, Y=RND(-1) ' SALT
10 PRINT "HELLO, WORLD! "; TIME$;
20 Y=(31*Y+PEEK(X)) MOD 257: REM PRIME HASH
30 X=X+1: GOSUB 10
```
> We use Linguist to perform language detection and to select third-party grammars for syntax highlighting. You can find out which keywords are valid in the languages YAML file.
GitHub actually relies on telnet23's language-basic syntax highlighting. Like basic-mode.el, it properly marks control-flow as being important. Unfortunately, the way GitHub has used it, control-flow and operators are both shown in red, so in practice control-flow is not distinct.
One feature it has that basic-mode.el should ~~steal~~ acquire is highlighting line numbers, when they are referenced by statements like `GOTO 10`, in the line number color.
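A minimal sketch of how such a rule might look as font-lock keywords follows. Everything here is an assumption for illustration: the names `basic-jump-target-regexp` and `basic-jump-target-keywords` are hypothetical, the keyword list covers only `GOTO`, `GO TO`, `GOSUB`, and `THEN`, and `font-lock-constant-face` stands in for whichever face basic-mode actually uses for line numbers.

```elisp
;; Hypothetical rule; not taken from basic-mode.el or language-basic.
;; Group 1 is the jump keyword, group 2 is the first target line number.
(defconst basic-jump-target-regexp
  "\\_<\\(GO[ \t]*TO\\|GOSUB\\|THEN\\)[ \t]+\\([0-9]+\\)"
  "Regexp matching a jump keyword followed by a line number (illustrative).")

(defconst basic-jump-target-keywords
  `((,basic-jump-target-regexp
     (1 font-lock-keyword-face)
     ;; Assumption: substitute the face the mode really uses for line numbers.
     (2 font-lock-constant-face)))
  "Font-lock keywords that color referenced line numbers (illustrative).")

;; To experiment in a BASIC buffer:
;; (font-lock-add-keywords nil basic-jump-target-keywords)
```

This only colors the first target after the keyword. Lists such as `ON X GOTO 10,20,30` would need something more, for example an anchored font-lock matcher.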
The main categories appear to be BASIC's typical functions, statements, and operators, but interestingly, it separates string functions from other functions. I suppose the reasoning is that it shows the type the function is returning. Is that an idea that would be helpful to programmers?
The regexps are succinctly defined in .CSON format:
Just for reference, here are the categories into which Bill Crider's “BASIC Programming Conversion” puts the reserved words from several dialects of BASIC. Actually, "tags" would be a better description, as each word can fit in multiple boxes.
Crider says that he is covering BASIC for "Apple, IBM PC, IBM PCjr, Commodore 64, TRS-80 Model III, and TRS-80 Color Computer". However, I noticed that his list of all 551 reserved words (which one should avoid using in identifiers, as they are defined in one BASIC or another) actually includes more keywords than were listed in the categories above. They appear to be from Atari BASIC (e.g., `PTRIG`) and Tektronix 4051 Graphic System BASIC Language (e.g., `BAPPEN`).
> Statements Used in Procedures (SUB, FUNCTION, DEF FN) are also currently shown in the same face as Control-Flow, but if there are any spare font-lock categories left, it may look better if they were subtly different.
It is also possible to create new font-lock categories. I don't know if that is considered good style though. One could do something like this:

    (defface font-lock-operator-face
      '((((class grayscale) (background light)) :foreground "Gray90" :weight bold)
        (((class grayscale) (background dark)) :foreground "DimGray" :weight bold)
        (((class color) (min-colors 88) (background light)) :foreground "ForestGreen")
        (((class color) (min-colors 88) (background dark)) :foreground "PaleGreen")
        (((class color) (min-colors 16) (background light)) :foreground "ForestGreen")
        (((class color) (min-colors 16) (background dark)) :foreground "PaleGreen")
        (((class color) (min-colors 8)) :foreground "green")
        (t :weight bold :underline t))
      "Font Lock mode face used to highlight operators."
      :group 'font-lock-faces)

    (defvar font-lock-operator-face 'font-lock-operator-face
      "Face name to use for operators.")

> One feature it has that basic-mode.el should ~~steal~~ acquire is highlighting line numbers, when they are referenced by statements like `GOTO 10`, in the line number color.
This is a good idea. But maybe it should be a separate issue, because it has nothing to do with highlighting categories?
> It is also possible to create new font-lock categories. I don't know if that is considered good style though.
I like that idea. I also don't know about the style guidelines, but I think it should be fine since it would only affect BASIC mode.
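To make the idea concrete, here is a hedged sketch of mode-local faces that alias the standard font-lock faces through `:inherit`. All three face names are hypothetical and are not defined by basic-mode.el; the categories are borrowed from the QuickBasic groupings discussed earlier in this thread.

```elisp
;; Hypothetical face names; basic-mode.el may define none of these.
(defface basic-control-flow-face
  '((t :inherit font-lock-keyword-face))
  "Face for control-flow statements such as GOTO and FOR...NEXT."
  :group 'font-lock-faces)

;; Same base face as control flow, made subtly different with a slant.
(defface basic-procedure-face
  '((t :inherit font-lock-keyword-face :slant italic))
  "Face for procedure statements such as SUB, FUNCTION and DEF FN."
  :group 'font-lock-faces)

(defface basic-string-function-face
  '((t :inherit font-lock-builtin-face))
  "Face for string-processing functions."
  :group 'font-lock-faces)
```

Because each face only inherits, existing themes keep working unchanged, and a user who wants procedures or string functions to stand out can customize just that one face.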
Creating new categories and aliasing the existing ones (functions/keywords/builtins) would allow categories that make sense for BASIC. I think the QBASIC manual is a good starting point, but I'd suggest merging some of the similar categories:
Of course, the categories from that particular appendix are not exhaustive as they do not cover the more day-to-day categories:
What are the conventions for syntax highlighting of BASIC code? While working on issue #20 (derived modes), I have run into the problem that I do not know the meaning of groupings like `basic-builtin-regexp` and `basic-keyword-regexp`.
If there is a "typical" convention, it would be helpful to have it in the comments in basic-mode.el. If there is not, and I suspect there isn't yet, it may be good to see what others have done and look at what makes sense.
Note that not all of the categories are confusing. Comments, constants, strings, and so on are self-explanatory. The ones that I'd like to get nailed down are:
- … `SIN()`) which return a value. The definition of it makes sense, but there is some question about whether things like `PEEK()` and `TIME$` belong. And is this distinction even helpful to a programmer?
- … `AS` and `RANDOMIZE`. And wouldn't it make more sense for data type declarations like `DEFINT` to be highlighted as a type?
- … `PRINT`, `PEEK`, `POKE`, etc. But there are several counterexamples to that idea. `DATA` and `LET` seem to be more structural. `AND`, `MOD`, `NOT`, `OR`, and `XOR` are operators and I would have expected them to be highlighted differently.
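For the operator case, a separate regexp could be sketched as follows. The name `basic-operator-regexp` is hypothetical, and the word list is just the five operators mentioned here, not a complete list for any dialect.

```elisp
;; Hypothetical name; the word list comes from the discussion above.
;; The `symbols' argument wraps the alternatives in \_< ... \_>, so the
;; "OR" inside "FOR" is not matched.
(defconst basic-operator-regexp
  (regexp-opt '("AND" "MOD" "NOT" "OR" "XOR") 'symbols)
  "Regexp matching BASIC word operators (illustrative).")

;; To experiment in a BASIC buffer, pair it with an operator face, such as
;; the `font-lock-operator-face' defined elsewhere in this thread:
;; (font-lock-add-keywords
;;  nil `((,basic-operator-regexp 1 'font-lock-operator-face)))
```

Keeping operators in their own regexp means they can later be pointed at any face without touching the keyword or builtin lists.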