Open hoonweiting opened 3 years ago
I'm looking at systemd right now, and I'm noticing some inconsistencies between systemd, INI, and editorconfig.
These three file types generally have key = value
. INI and editorconfig considers the =
to be punctuation
, while systemd considers it an operator
. I'm not sure what =
should be for all three languages, what do you think?
These three file types also have [SectionName]
, but the tokens are a little different:
Officially, editorconfig calls them Section Headers, INI calls them stanza names, and systemd calls them sections. Regardless of the main token name, they do not have consistent aliases. I'm not very sure what to do here!
This is what I'm thinking:
[
and ]
).selector
(what INI and systemd use) or keyword
(what editorconfig uses), though retaining the existing patterns (so this alias applies only to the SectionName
part for INI and systemd, while it applies to the whole [SectionName]
for editorconfig).I'm not sure what = should be for all three languages
Hard to say. Syntactically speaking, =
is just a separator token, so punctuation. However, highlighting it as an operator would probably look better in some themes, so I can understand why someone would use operator
.
Honestly, I think both operator
and punctuation
are acceptable here. Just go with which you like best. ¯\_(ツ)_/¯
Officially, editorconfig calls them Section Headers, INI calls them stanza names, and systemd calls them sections.
That depends on who you ask. To my knowledge, there is no official spec for INI, so I don't think we have to worry about names here.
Give all of them their official names (including the [ and ]).
Hmmm. Since the headers and sections all have the same function, I would favor giving them the same name. I would call of all of them "section".
These languages aren't so complex or popular that we have to offer very granular token names, so I think it's ok to go for more generic names.
Alias them to either selector (what INI and systemd use) or keyword (what editorconfig uses), though retaining the existing patterns (so this alias applies only to the SectionName part for INI and systemd, while it applies to the whole [SectionName] for editorconfig).
Good idea.
I'm looking at AutoIt right now (docs), and I'm not very sure about this one:
This will highlight stuff like #include
and #NoTrayIcon
, but it will miss out #include-once
(which is the only directive with a hyphen in its name). Since this is about the regex rather than the token itself, I'm not very sure how to address it.
The rest of it looks fine to me though!
Huh, that's just a good old bug that needs some fixing. Do you want to make a PR, or shall I?
However, regex bugs don't have anything to do with token types, so you can mark AutoIt as done.
Hmm, I'd like to give it a shot, is this right?
pattern: /(^[\t ]*)#[\w-]+/m
Yup, that works. Thank you for taking this on.
Hmm, I'm not super sure about this one:
Ada simply calls it an attribute, and it is defined in the 2012 Language Reference Manual (page 98 on the version without change bars):
Is attr-name
suitable? Or should attr-name
(or some other standard token) be used as an alias instead?
You're right. Some name like attribute
+ attr-name
as its alias would probably be better.
@RunDevelopment I see! I'll make a PR for it.
Also, I just quickly looked through the other languages, I don't think there's any more config-type languages that don't use key
and value
, except maybe .ignore?
Though it's a slightly different question: is string
the most suitable alias?
Well, ignore (think .gitignore
) isn't really a config language like INI or .properties. Ignore just contains file (glob) patterns (plus some comments), so the key
-value
semantics of other languages don't apply to it.
is
string
the most suitable alias?
I'm not sure about "most" suitable, but it's a pretty good IMO. Entries in ignore file will be file path and file path patterns, so string
seems like a decent choice to me. Since we are talking about patterns, regex
would also be fitting.
Apart from string
and regex
, I can't think of any standard tokens that are semantically close to the entries in ignore files. Got any suggestions?
I guess selector
makes some sense in that these file paths and file path patterns are being chosen to be ignored (or not), but I agree that string
works well too! I don't think regex
is a good idea, but only because the token already allows for regex
to be contained inside.
Overall, I think keeping it as it is is alright!
I'm having some doubts about AutoHotkey 😅
First of all, with respect to the token types guarantee, I'm not sure about the following tokens and how they are classified (constant
, symbol
, and important
make sense in terms of classification though). I'll admit, I've never used AutoHotkey myself, so I might not be informed enough here, perhaps there are nuances that I am not seeing:
Secondly, based on this Alphabetical Command and Function index, it seems that there are bunch that are missing, such as ComObject()
and IsByRef()
. I don't mind going through the list and adding them, but I am not sure if they were intentionally left out or something! Do you have any idea?
I'm not sure about the following tokens and how they are classified
I've never used AutoHotkey myself
Same, but I highly doubt that AutoTrim
is a selector... It's probably safe to assume that the tokens you mentioned don't follow the rules for standard tokens.
it seems that there are bunch that are missing
They were probably added to the language after our AutoIt grammar was last updated by someone. This is the reason we typically try to avoid long list of function/type/command names.
Regarding directives (currently important
), I think we could just use /#[a-z]+\b/i
and save ourselves the work of maintaining another list.
I highly doubt that
AutoTrim
is a selector...
Same! I really have no idea which token we can use for it. I wanted to call it a function
, but it's already used. Then I thought of builtin
, but it doesn't quite fit with the existing names in builtin
. Maybe they can be combined with keyword
? (Maybe I should read the docs more to see if there's a distinction between these two groups though.)
This is the reason we typically try to avoid long list of function/type/command names.
I see your point, do you think it's possible to write another regex that can replace (some of) the lists? I'm not familiar enough with regex to answer this 😅
Regarding directives (currently
important
), I think we could just use/#[a-z]+\b/i
and save ourselves the work of maintaining another list.
I appreciate your wisdom! Would you like to make the PR for it? It would be nice to use directive
as the token name instead too, though I'm not sure what standard token it can be aliased to...
Sorry to bring up another topic! I'm looking at Erlang, and it seems pretty alright, but I have a question about the operator
token:
I'm not sure whether the keywords there should be classified as operator
or keyword
. I remember this was mentioned early on in #2849, but I'm not sure what the conclusion is. I see that there's no change to SQL (the original example that was brought up), so I suppose I don't have to update Erlang here either? What do you think?
Regarding Autohotkey:
selector
seems to contain commands, so let's call them that and use selector
as an alias (it looks good).constant
seems to contain static variables and other predefined stuff. constant
is probably an ok fit here.builtin
contains built-in functions, so calling it builtin-function
+ function
alias would probably be better than builtin
. symbol
seems to contain virtual keys (I think). So symbol
is actually be a good fit.important
: As I suggested above, directive
+ important
alias.keyword
: I think those are actually keywords. Not sure.I think I'll make a PR with that. Not sure if all of those changes are that great, but they're probably a little better than what we have now. Let's continue the discussion about this topic there.
Would you like to make the PR for it?
Done.
Sorry to bring up another topic!
Don't worry. You're a great help, so don't hesitate to ask.
I'm not sure whether the keywords there should be classified as
operator
orkeyword
.
Honestly, I'm not sure what to do about them myself... They fit into both operator
and keyword
but using both classes together doesn't work with standard tokens. I think it's okay to just leave Erlang operators as they are right now until we find a good solution.
I think it's okay to just leave Erlang operators as they are right now until we find a good solution.
Sounds great, thank you 😄
I'm currently looking at MEL (Maya Embedded Language), it seems mostly fine, but I have some questions:
MEL has flags, and they are currently aliased to operator
in Prism:
I'm not really sure if operator
is the best token for aliasing, I feel like attr-name
or property
or builtin
would be a better fit in terms of meaning. What do you think?
Next, there's code
, which highlights code that's contained within backticks. The purpose is defined here (essentially evaluates the code within it and returns a value).
I don't think italics
is a meaningful alias, although I can see how having the code italicized helps a lot in readability. As such, using italics
as an alias would allow the code to be italicized without requiring the user to have to have to modify their Prism CSS file at all. But since we're on the topic of using standard tokens appropriately, I wonder if we can give this a pass?
Finally there are built-in commands (very long list! Probably even longer than AutoHotKey!).
Prism currently calls them function
s but I think builtin
is a better fit as it is possible for users to define their own functions (or procedures, as MEL calls it). And speaking of user-defined functions, I wonder if it's possible to capture them (at least when they're being defined) with regex?
I'm looking at Flow, and I'm not sure if tag
is a good alias for type
.
I'm thinking maybe class-name
would be more suitable?
Flow also has function-variable
like JavaScript:
In JS, it's aliased to function
. I suppose we can do the same for Flow?
I'm looking at Excel formula:
I wonder if function
or builtin
would be a better fit than function-name
?
keyword
also makes some sense though.
I also see why range
and cell
have the same alias, property
:
But perhaps selector
or variable
would be a better fit?
Also, as a note, I haven't really been paying attention to the colours, so there's a chance that the tokens will end up having the same colours in some themes 😅
I'm not really sure if
operator
is the best token for aliasing, I feel likeattr-name
orproperty
orbuiltin
would be a better fit in terms of meaning. What do you think?
I agree, attr-name
or property
would be better.
Next, there's
code
Yeah, italics
is weird. I mean, it does highlight code sections well, but I wonder if we could achieve the same result with more convential aliases.
Finally there are built-in commands (very long list! Probably even longer than AutoHotKey!).
Oof. 16kb of built-in commands... Yes, builtin
is a better name.
And speaking of user-defined functions, I wonder if it's possible to capture them (at least when they're being defined) with regex?
Procedure calls will be very difficult to detect because the parenthesis are optional. Procedure definition might be doable, depends on the return type. Assuming that the (optional) return type is only a single word, { pattern: /(\bproc\s+(?:\w+\s+)?)[a-zA-Z_]\w*(?=\s*\()/, lookbehind: true }
will work.
I'm looking at Flow, and I'm not sure if
tag
is a good alias fortype
.
Definitely class-name
, you're right.
Flow also has
function-variable
like JavaScript:
Flow extends JS and changes the regex. So it also inherits the function
alias from JS.
I'm looking at Excel formula: I wonder if
function
orbuiltin
would be a better fit thanfunction-name
?
I think I did this because Excel only has built-in functions and keyword
as still available. You're right, builtin
is better.
But perhaps
selector
orvariable
would be a better fit?
They are essentially just the names/indices of cells in a table. So I don't think variable
fits. I do like selector
though.
Also, as a note, I haven't really been paying attention to the colours
Don't worry about it. You win some, you lose some. Having a consistent standard is more important that some languages not looking their best in some themes.
I wonder if we could achieve the same result with more convential aliases.
I see! I'm not sure what's a good one though, I think italics
works because it's the token that is closest to guaranteeing font-style: italic
. Maybe we'll think of something soon!
16kb of built-in commands...
I was close to offering to update the list, but I'm not sure now :P I was wondering if we could use a pattern instead, but looking at the snippets earlier, I highly doubt that it's possible.
Procedure calls will be very difficult to detect...
I see! I think the function definition would be good enough, consistent with other languages. I presume you'll make a PR for the new regex?
Flow extends JS and changes the regex. So it also inherits the
function
alias from JS.
Oh right, thanks for catching that! :)
I'll make the PR for Excel and Flow shortly!
I was wondering if we could use a pattern instead, but looking at the snippets earlier, I highly doubt that it's possible.
Hmmm, maybe. I'll look into it. Since built-in commands/functions and user-defined procs use the same calling syntax, it will detect both, though.
Edit: Yup, worth it. Mel got 90% lighter. #3393
Wow, that's amazing!!! I am once again impressed by your regex skills :D
This is a tracking issue for the second half of #2849!
Long long list of languages ahead: