Remillard / VHDL-Mode

A package for Sublime Text that aids coding in the VHDL language.
MIT License
40 stars 10 forks source link

VHDL syntax validation #104

Closed lenigoor closed 4 years ago

lenigoor commented 6 years ago

First I would like to express my thanks for making this plugin. While writing some VHDL code I found that the following snippet is incorrectly highlighted. Maybe this can be fixed?

image

Sv3n commented 6 years ago

The match rule on https://github.com/Remillard/VHDL-Mode/blob/master/Syntax/VHDL.sublime-syntax#L1944 is too broad: it eats the opening bracket as the non-word character before the tick char, preventing the basic-paren-group from matching properly:

  character-literal:
    # Any of the printable characters in the standard character set can be
    # used.  Used the non-word character element to protect against
    # single quote used for attributes (e.g. a'range)
    - match: '[^a-zA-Z0-9]('')'

A possible fix:

  character-literal:
    # Any of the printable characters in the standard character set can be
    # used.  Used the non-word character element to protect against
    # single quote used for attributes (e.g. a'range)
    - match: '[^a-zA-Z0-9\(\)]('')'

(I was staring at a similar issue coincidentally)

lenigoor commented 6 years ago

That works for me, thanks!

Remillard commented 6 years ago

Thanks! I'll get that staged.

Remillard commented 6 years ago

In the process of making changes, I started experimenting a bit. Looks like it's wholly a problem with the single quoted literal. Note on adder2 the first parenthetical group works fine. If you add/subtract spaces on adder1 you can make it pass/fail. Indeed the character literal is catching the opening paren there though. I think it's the fact that I have ('') in there but trying to remember for the life of me why I put that in to begin with as I don't think a character literal can start that way. Still investigating.

Remillard commented 6 years ago

Oh wait, I'm trying to capture the opening ' mark. However I think the parens are confusing things. I think it's just a regex issue.

Remillard commented 6 years ago

Aw hell, it's been too long since I've stared religiously at regex expressions. I realize what I was doing originally and realize what Sven's solution does. I'm trying to separate expressions that also have single-quotes in it from character literals like a'range from a character literal '0'. The fact that a paren grouping could appear before a literal would get consumed by the "not". The syntax engine does not allow look-behinds.

However the proposed solution does not actually solve anything, as can be seen in the difference in spacing between adder1 and adder2. I'll put in a little screenshot.

image

Note that with the color scheme, the character literals in adder1 are purple, the characters in adder2 are ... coral? I don't know, I'm not good with color names. In any event it's a signal that the scoping is incorrect, and indeed it's not actually capturing the opening single-quote correctly.

In adder1 the second literal is scoped as: vhdl-mode: source.vhdl meta.block.architecture.body.vhdl meta.statement.assignment.signal.vhdl meta.group.parens.vhdl string.quoted.single.vhdl punctuation.definition.string.begin.vhdl In adder2 the second literal is scopes as: vhdl-mode: source.vhdl meta.block.architecture.body.vhdl meta.statement.assignment.signal.vhdl meta.group.parens.vhdl note that it didn't pick up the string.quoted.single.vhdl.

Basically I need a non-consuming non-letter check. My solution consumes a character which is why the parenthesis doesn't work properly. Or possibly parenthesis needs to be higher. If the parens gets consumed first then maybe the string literal cannot consume it. Currently strings is listed in the prototype section of the syntax definition. Prototype is sort of an include for every context so that you don't have to explicitly list them. However prototype also comes FIRST, and I might needs a post-prototype.

I will see if there's a solution for this. I really do not want to add the strings context to EVERY other context in the file at the end, but that might have to be the solution. Or I need a non-consuming non-alphanumeric check.

Remillard commented 6 years ago

Okay, after discussing this with Thom (kind of a syntax engine guru) on the Discord channel, I think I've pieced together what probably needs to be done. It's actually kind of subtle so I'll explain.

So, identifiers in VHDL can be almost anything and they tend to match everything, so I have them at a very low priority in matching contexts. Additionally, "attribute modified identifiers" are not captured atomically. Basically a word falls through and gets marked variable.other.vhdl and then the attribute falls through and gets marked keyword.other.vhdl. The attribute punctuation doesn't get scoped.

However, this means that there's a free punctuation that has escaped! The attribute punctuation is sort of a free radical and then can match character literal. So to fix that problem I put in the negation group which then unfortunately captured lenigoor's and Sven's opening parenthesis as well. So my fix to the problem of attributes caused this problem.

So... I think (unfortunately) the solution is to fix attribute modified identifiers properly. I cannot just let them fall through to identifier and keyword, but I have to identify them as an atomic whole.

Long story short, this is going to be a tricky fix. I think I know what needs to be done (create a special context group for this) however there's a lot of testing to do to make sure it doesn't break anything else.

Remillard commented 6 years ago

And single quote is used by type casting too. This is a really tricky problem.

Remillard commented 5 years ago

This actually ended up being a relatively simple fix actually. Looking backwards cannot be done well with the syntax engine for speed. It is actually possible but Sublime switched to a much slower regular expression engine when you have look-behinds. However I'm trying to find single character literals which is a pretty simple construct. Instead of using look-behinds, a look-forward for a single character and quote mark seems to do the trick. This does not consume the ' for either attributes or for typecasting and does not rely on the preceding character not being a alphanumeric.

Remillard commented 5 years ago

image

Since some screenshots had been shared before, thought I'd throw this in there. I'm copying @lenigoor 's original code. The parentheses are captured, and should be noted that the single character capture is space agnostic. Additionally, the attribute tick is honored, as is the string cast.

Remillard commented 5 years ago

There exists a case with a typecast bit that fools this. std_logic'('0')