alemuller / tree-sitter-vhdl

VHDL grammar for tree-sitter
MIT License
29 stars 9 forks source link

Queries #1

Open dstadelm opened 3 years ago

dstadelm commented 3 years ago

I'm reluctant to call this an issue but I was unable to reach out in an other way. I wanted to ask if there are plans to write any queries locals.scm/highlight.scm/indents.scm

alemuller commented 3 years ago

The queries are not applied consistently in different tools. For instance, queries written for nvim-tree-sitter behave differently on cli-tree-sitter. On cli-tree-sitter only first matching capture is highlighted. On the other hand, nvim-tree-sitter applies highlights group to all captures, with the last having highest priority. I'm not sure how useful it would be having a single query file.

I sent a PR for nvim-trees-sitter to add vhdl highlight query https://github.com/nvim-treesitter/nvim-treesitter/pull/820 but it wasn't conformant with their current spec. I working on a rewrite.

Feature request on issues are ok. I'm usually at https://nvim-treesitter.zulipchat.com if you want to have a chat.

dstadelm commented 3 years ago

Very nice! Neovim is what I care for. I see that you are really deep into the topic. I was thinking of writing queries for neovim but had the feeling that I know too little on the topic.

SethGower commented 1 year ago

I realize that this is a stale issue, but I felt it's relevant. Since the highlighting scm seems to be lacking IMO (since it only highlights errors), I am working on adding more highlight groups. You can see in my fork here

SethGower commented 1 year ago

so @alemuller I am now seeing the issues you were talking about, now that I have been poking around the code more and testing more.

What is the rationale behind the error checking for the highlighting? I feel like that would be more of a job for a linting engine (personally I use VHDL_LS.

I'm going to keep on poking at it to try and get it to work with having both the actual highlighting and the error highlighting work together.

I would be interested in contributing more, is there anyway we could discuss things?

alemuller commented 1 year ago

I tried to reduce the number of states on the parser reusing some similar rules. Back when I wrote the parser, tree-sitter had a limitation on the number of rules.

The syntax of the headers (clauses and map aspects) varies slightly depending on the declaration. Map aspects on package header shall have a trailing semi colon, but map aspects in subprogram header shall not.

But I got carried away and went too far :P. This should be done with the linter.

This is was the first parser I wrote, I would do a lot of things different nowadays.

alemuller commented 1 year ago

@SethGower, you can find me at #neovim at matrix.

jpt13653903 commented 8 months ago

@SethGower, I love your starting point... I've expanded it a bit since, and will continue to do so.

I think it's a bit premature for a pull-request, but you can follow what I'm doing here, with the version in my Neovim setup repo here. The Neovim setup repo will generally be more up to date, because I don't always commit updates directly to my fork of this one.

SethGower commented 8 months ago

@SethGower, I love your starting point... I've expanded it a bit since, and will continue to do so.

I think it's a bit premature for a pull-request, but you can follow what I'm doing here, with the version in my Neovim setup repo here. The Neovim setup repo will generally be more up to date, because I don't always commit updates directly to my fork of this one.

Awesome! Thanks for extending it. I haven't made much progress recently. I will look at yours, and start using it so I can see if I can add more to it.

Chris44442 commented 2 months ago

@SethGower @jpt13653903

Could you guys help me out here? I am a bit puzzled. It seems like tree-sitter-vhdl with the highlights.scm has almost no syntax highlighting whatsoever? I have been trying out the Helix editor recently and they use the official tree-sitter which in turn use tree-sitter-vhdl. But as it turns out, even a vanilla Neovim has better VHDL syntax highlighting? For example types such as std_logic, or strings or constants don't even get highlighted by tree-sitter-vhdl with the highlights.scm.

jpt13653903 commented 2 months ago

I have to be honest -- I have no idea what you mean... Might be something to do with your settings? It works perfectly well on my Neovim (I don't know about other editors).

I'm running my fork, which I added quite a number of additions to. I don't want to issue a pull-request and merge it into this one yet though -- both are fairly experimental. I have no idea why Helix would consider this "official". I installed it into my Neovim by means of a side-channel (in my config I'm referencing the URL directly, not in the "predefined" languages list): last time I checked there is no VHDL support out-of-the-box.

I copied a couple examples from Wikipedia into my Neovim:

image

image

image

It even highlights syntax errors:

image

Chris44442 commented 2 months ago

Yes, I have tried your fork and now it works as it should. Thanks. I had to build the grammar from source and also copy the highlights.scm file. Of course this is not all in the spirit of Helix which should have a solid out-of-the-box experience, not requiring to meddle with the tree-sitter.

Anyway this is not really Helix' fault. It is tree-sitter. Why would they merge the alemuller/tree-sitter-vhdl with this highlights.scm? It can't even do types, or constants, or strings.

Is there any fork that we can offer to them as a replacement for alemuller/tree-sitter-vhdl ?

The thing is, while I appreciate the effort, I don't think error highlighting should be done by tree-sitter. We have excellent LSPs for that.

SethGower commented 2 months ago

@Chris44442 I agree with you on the error handling being handled by tree-sitter, which is exactly the topic we have discussed in this thread, and why @jpt13653903 and myself have forked it and modified it. I agree that this could be a better experience with it working out of the box, but that isn't the case right now. If you would like to push it over the finish line, with getting things merged into the tree-sitter repo, by all means, you can open a PR there. Personally, this workflow works fairly well for me, and I don't have the time myself to do that (I am guessing @jpt13653903 is in the same boat there).

Also @jpt13653903, thanks for linking your Neovim config again, I was originally pointing nvim-treesitter to a locally cloned version and symlinking the queries into the plugin config, which wasn't super easy to work with. I didn't realize that you could have a queries/vhdl directory in the neovim rtp and it would work, so thanks for publishing that, I stole it.

jpt13653903 commented 2 months ago

Pleasure -- steal away :-)

In Neovim it's a fairly seamless experience: you plonk the URL in the config and run "TSUpdate", no extra compiling required. It is unfortunate that the experience is not as seamless in other editors. And I agree -- it makes no sense for tree-sitter to use alemuller/tree-sitter-vhdl as the default. It specifically does only the grammar (i.e. it has no queries, highlighting or otherwise), and the author has no intention of adding one (ref earlier in this thread). I can only assume that it was the best they could find at the time and they just rolled with it, and they don't have a large enough user-base to push them otherwise.

And yes -- I also agree with you regarding the error handling that shouldn't happen in tree-sitter, I just didn't feel like spending the time to remove it, and it's not doing any harm in my use-case. In fact, I didn't change or recompile the grammar at all (I didn't know how at the time I was modifying Seth's highlighting queries).

The main reason I'm not promoting my fork as anything official is because it's hacked to work with my coding style (i.e. lower-case keywords only, etc.). It does not conform to the standard in full. And I have no plan to make it anything more than what I personally need -- same as Seth.

Best of luck -- you'll be doing the Helix community a huge favour if you get tree-sitter to use a better VHDL parser with proper queries.

Chris44442 commented 2 months ago

Thanks for the info guys, it helped me understand what is going on. I probably should've dived a bit deeper before being so critical. I think the root of my problem was that the guys over at Helix have tried using the highlights.scm file from this repo, and it errored (I guess that is what this one PR is for) and/or didn't have what they wanted. So some guy made their own, just like the 3 of us lol, and it got merged. But his highlights are very basic, there is just too much missing.

Right now I am trying to build a minimalistic sane default highlights.scm, hoping that it will get merged into Helix eventually.

SethGower commented 2 months ago

If you come up with one, especially one that "conforms to spec" for VHDL, share it here. I've had gripes with mine (and jpt's), but haven't had the time to fix them.

Chris44442 commented 2 months ago

Sure. But how do I know if it conforms to spec?

I also have a couple of issues:

That's it for now. Not exactly easy to make these kind of decisions that are supposed to make everyone happy.

Edit: Ok, I decided I will try to mimic the Verilog highlights.scm the best I can. If they did make the merge, so should we, right?

Chris44442 commented 2 months ago

@SethGower @jpt13653903

Ok, this is what I came up with just now. Maybe you guys have time to take a look?

Upside is this scm file works with the main branch

url = "https://github.com/alemuller/tree-sitter-vhdl.git"

in Neovim without any problems. Also works in Helix with much better performance than before. It is also more or less a mimic of the Verilog highlights.scm. Hopefully this will satisfy any "in spec" concerns.

Now the downsides:

My goal is to get this merged on nvim-treesitter and Helix. Any help would be nice :)

Btw why exactly was alemuller/tree-sitter-vhdl rejected from nvim-treesitter in the past?

SethGower commented 2 months ago

I'll take a look at this tomorrow at work! Thanks for doing it.

The "ambiguous_name" - "function_call" problem as mentioned above is a complete mess. My list of function.builtin is still incomplete, because it takes forever and a day to look them all up.

IIRC the last time I tried improving this, the function parameters were an issue I had as well (if that's what this is referring to)

Btw why exactly was alemuller/tree-sitter-vhdl rejected from nvim-treesitter in the past?

I think it was partially just because it is a bit too complex. I think they had issues with the parser itself being a bit slow/overly complex. However I forget exactly.

It probably also had to do with the issues discussed in this thread (relating to highlights)

jpt13653903 commented 2 months ago

@Chris44442 You've hit the nail on the head with your second point. The fundamental problem is the parser - the overly verbose highlights queries is a simptom.

The way to fix this properly would be to download the spec (I'll need to go hunt down the link for you) and implement it directly. The original author of this one made a complicated parser because:

  1. It was one of the first ones he ever did
  2. He wanted to have the parser output tokens that he can use to highlight errors (I infer this from comments earlier in this thread)

The VHDL spec is... intense. Good luck to anyone trying to implement it in full, especially the fact that it is case insensitive. You might need to write the parser in C code directly, and very carefully choose a subset to implement.

This said - I'll have a look at what you've done and see if I can contribute anything over the weekend...

Side note: I use the classes in the recommended list in the nvim-treesitter help. I find it odd that TokyoNight doesn't have "conditional". I'll have to double check (I'm using catppuccin myself), but I think the Neovim version of TokyoNight does have conditional.

jpt13653903 commented 2 months ago

I found the VHDL spec link: here

I see it's pay-walled, and quite expensive at $274.00 for IEEE members. I'm not willing to pay that just to get the TreeSitter highlighting fixed properly...

I also notice that @richjyoung is working on the grammar (not much yet, but maybe he has interesting ambitions), so I'm pulling him into the discussion...

richjyoung commented 2 months ago

The "ambiguous_name" - "function_call" problem as mentioned above is a complete mess. My list of function.builtin is still incomplete, because it takes forever and a day to look them all up.

The issue is the formal grammar is horribly ambiguous here, and this is the best starting point I have found to unravel the mess. Unfortunately my reasons for forking it and making some changes would likely go in the opposite direction to what you are looking to achieve here, I need minimal correct parser that isn't actually concerned with names at all.

I also maintain the Modern VHDL VSCode syntax highlighting extension, I suspect not but if there are any useful changes to be made there please let me know, as I believe the underlying language files are fairly portable for editors other than VSCode.

SethGower commented 2 months ago

I found the VHDL spec link: here

I see it's pay-walled, and quite expensive at $274.00 for IEEE members. I'm not willing to pay that just to get the TreeSitter highlighting fixed properly...

I also notice that @richjyoung is working on the grammar (not much yet, but maybe he has interesting ambitions), so I'm pulling him into the discussion...

Here's the non-paywalled version https://edg.uchicago.edu/~tang/VHDLref.pdf (this is a pretty old version, 2000, vs the 2019 version you linked, however it should work for these purposes)

jpt13653903 commented 2 months ago

Awesome - thank you. I think 2000 is plenty good enough. With a few small tweaks we should be able to do a 2008 version without much issue. I don't know what updated between 2008 and 2019 - not much FPGA compiler support for 2019 yet...

Chris44442 commented 2 months ago

Thanks again for the clarification. And the spec link too, very interesting. But it seems to me this is entirely the parsers job to be in spec, not the highlighting. Adjusting the parser would be way too time intense for me right now. And I think it is in a good enough state anyway.

Btw the yaml-file for the grammar saved me a lot of work right now. Very nice.

I am pretty much ready to open an issue and a PR. Problem is: Who will be maintainer for the actual parser? No one right?

Side note: I use the classes in the recommended list in the nvim-treesitter help.

@jpt13653903 but as it says there, it is "keyword.conditional", not "conditional" :)

Edit: I have opened an issue on nvim-treesitter and a PR for Helix. I have tried the highlights.scm with like 10 different .vhd files from all kinds of sources. Looks good so far.

jpt13653903 commented 2 months ago

At first glance I think a simple conversion from the Modern VHDL grammar YAML to a treesitter grammar should solve a large part of our problems.

I'll see if I can write a quick-and-nasty Python-based translator and then see where it leads...

It appears to be case sensitive though, but I'll see if I can autogen in the case insensitivity. Pity you can't simply add \c and call it a day - you have to do something like [eE][nN][tT][iI][tT][yY], unless I'm missing something?

jpt13653903 commented 2 months ago

Thanks again for the clarification. And the spec link too, very interesting. But it seems to me this is entirely the parsers job to be in spec, not the highlighting. Adjusting the parser would be way too time intense for me right now. And I think it is in a good enough state anyway.

Exactly. It is also the parser's job to match keywords to a single handle that the highlights queries can use, i.e. we shouldn't need to have long lists of names in the highlights file - that just slows it down.

I'll play around with it and see what I come up with. In the interim - thank you for fixing most of it, I'll definitely go and have a look.

I am pretty much ready to open an issue and a PR. Problem is: Who will be maintainer for the actual parser? No one right?

Unless you're volunteering...? ;-)

@jpt13653903 but as it says there, it is "keyword.conditional", not "conditional" :)

Good catch - I'll fix it. Shouldn't break anything else.

Edit: I have opened an issue on nvim-treesitter and a PR for Helix. I have tried the highlights.scm with like 10 different .vhd files from all kinds of sources. Looks good so far.

Awesome!

jpt13653903 commented 2 months ago

I see they don't want to merge your highlights unless someone takes over the parser...

I've volunteered.

Give me a few weeks to play with it and see if I can come up with a sensible parser...

richjyoung commented 2 months ago

Before I wrote the yaml rules for Modern VHDL I spent a long time trying to implement a spec-compliant parser in various different parser-generator engines. The problem is more fundamental, in the context of a single file there is insufficient information to resolve ambiguity in certain areas of the spec.

something(0) could be almost anything, function call, slice name, type conversion, indexed name, etc. It depends on the declaration of something, and the formal grammar denotes this by essentially tagging the name. Take the function_call rule:

function_call ::= function_name [ ( actual_parameter_part ) ]

In order to correctly parse a function call, you have to already know the name is a function. Throughout the spec there is no separate definition for function_name or any of these italicised variations, and this is not the only instance either.

Tree-sitter does have support for external scanners, I suspect it is possible to create a feedback loop into the scanner that allows it to recognise and tag known names, but that's only viable in a scenario where both the library (defined out-of-band anyway) and compile order are known; highlight rules in an editor do not want to have to parse the entire codebase, it would be too slow.

VHDL-LS gets around this by being a proper language server, I don't know how they resolve compile order but in my day-to-day usage it's been pretty solid.

I know it's not a solution but I wanted to explain my findings on the topic before anyone else spends a lot of time potentially ending up at the same conclusion!

jpt13653903 commented 2 months ago

@richjyoung

I found exactly the same thing. The language is extremely ambiguous.

My general strategy would be to highlight built-in / library functions as functions (i.e. treat them almost like keywords) and just live with the fact that all other stuff are simply "identifiers".

I like what you've done, and think it a brilliant starting point, if not a direct source that I can use to auto-gen from. I'll play with it and see where it leads.

By the way...

I think we should move the discussion to nvim-treesitter. Thoughts?

richjyoung commented 2 months ago

Ah cool, sounds like a good plan to me. Give me a shout if you find anything weird in the YAML, I know there's an issue with loops I've yet to track down!

jpt13653903 commented 1 month ago

Here's the non-paywalled version https://edg.uchicago.edu/~tang/VHDLref.pdf (this is a pretty old version, 2000, vs the 2019 version you linked, however it should work for these purposes)

I found a version of 2008 here, and I've convinced the company I work for to buy me the 2019 version :-)

So - soon we'll have a 2019 compliant syntax highlighter.

jpt13653903 commented 1 month ago

Ah cool, sounds like a good plan to me. Give me a shout if you find anything weird in the YAML, I know there's an issue with loops I've yet to track down!

Unfortunately I can't use it directly. The Tree-sitter auto-gen does not support the case-insensitive flag (i.e. (?i) or \c). Instead, I'm thinking of using an external scanner to handle the case insensitivity - the proof-of-concept scanner I wrote seems promising...