UserNobody14 / tree-sitter-dart

Attempt to make a tree-sitter grammar for Dart
MIT License

Where is this project at? #1

Closed: TimWhiting closed this issue 3 years ago

TimWhiting commented 4 years ago

Hi, I'm interested in a tree-sitter Dart parser and was wondering about the status of this. Does it support the full Dart grammar? Have you tested it much? If you need some help, I'd be willing to fork this repository and help work on it.

UserNobody14 commented 4 years ago

Sorry I didn't see this; I've been quite busy at work and at home recently. To answer your questions: yes, it supports the full Dart grammar as detailed in the Dart language specification (5th edition draft). As far as I'm aware, every grammatical feature in that book is supported. I also added a few features detailed in the informal draft spec (I haven't gone through the draft spec in detail; I mainly wanted to implement the list-comprehension features, and a few others, I think).

I've tested it fairly well, and, obviously, the tests I created run without errors. Somewhat more thorough testing is probably in order, particularly on existing codebases, but for all the grammatical elements covered in the test folder, it does just fine.

I'd be happy for any help I can get, I mainly started working on this as a side project to get a better understanding of dart, and to help with another side-project of mine. If I can be of any help to you please feel free to ask. I'm a little new to open-source contribution but I am eager to help if I can!

As for the status, it's in pretty good shape as far as I know. I hope to get back to it at some point, but I'm not aware of any major bugs or missing features, so I think the main 'issue' is just expanding the scope of the tests and fixing things here and there.

TimWhiting commented 4 years ago

Cool, I'd love to help out. I'll take a look at the tests and see if I can add any more tests for edge cases, or find anything that is missing.

Once I get familiar with it, I'd love to help make this compatible with the latest features of Dart, such as collection-for and collection-if, and the upcoming changes for non-nullable types.

What is the best way to start understanding the grammar? It looks like there is a lot commented out in there. Was it copied and altered from some other language? I'm familiar with parsers like this, but just wanted to understand the approach you were taking.

TimWhiting commented 4 years ago

As some background, I'm mostly interested in this because there is software for coding by voice (https://serenade.ai/) that I want to use, but they need a tree-sitter parser to support Dart in their system. From what they are saying, it would be easiest to add to their software if the parser were similar to the JavaScript implementation. Since Dart and JavaScript are pretty similar, I wonder if we can make the implementations line up a bit; I haven't looked too closely, though. Of course, I also want to stay as close to the Dart language specification as possible, to make it easier to maintain and to add new Dart features as the language evolves.

UserNobody14 commented 4 years ago

Hmm. I actually borrowed a fair amount from the Java parser, and a few things from the TypeScript parser, but I'm not sure exactly how to bring it closer to JavaScript. It might be possible to rename some things to be closer to their JavaScript equivalents, but whenever things get rewritten (i.e. changes to the organization or semantics), it tends to cause a lot of headaches. I'll try to look at serenade.ai when I get the chance; that looks neat. In the meantime, I merged your pull request.

UserNobody14 commented 4 years ago

Yeah, I essentially started with the Java grammar and then hacked Dart in. As far as I know, collection-for and collection-if are already implemented, as for_element and if_element. There's even a test for them in dart.txt.

The first two sources I used in building the grammar were the Dart specification and the tree-sitter guide (http://tree-sitter.github.io/tree-sitter/creating-parsers). I follow the dart_grammar pretty closely; I even use most of the same names and structures as the original (collection-for, mentioned above, is one example).

Unfortunately, working on it intermittently has made the results harder to understand. Cleaning up the comments, organizing the grammar elements into recognizable groupings (class and method elements, expression elements, literals, statements), and then adding a big header to each group would improve readability at least a little bit. Something like:

//////////////////////////////////////////////////////////////////////////
/////////////////////////// EXPRESSION ELEMENTS //////////////////////////
//////////////////////////////////////////////////////////////////////////
//// These are the Expression elements from section 16 (Page 75) of the Dart specification
//// (as a side note, that actually is where the expression elements are in the draft spec)
//////////////////////////////////////////////////////////////////////////

I'll see if I can get to that at some point tomorrow.

TimWhiting commented 4 years ago

@UserNobody14 I've got it to the point where, running tree-sitter parse over the entire Flutter codebase, only 0.48% of the lines in the output contain "ERROR". I've added tests for a few of the constructs that are failing. Those failures seem to be a matter of precedence, because the grammar already contains the relevant pieces. I've also added some things related to null safety, but there are probably things I'm missing, which likely account for most of the remaining errors in the Flutter codebase. When you have a chance, could you take a look at the failing test cases? I have a basic understanding of precedence and can fix simple things, but these issues have eluded my attempts to fix them. In the meantime, I've started cleaning up some of the comments and removing legacy code, and I've been adding big header comments to delineate the sections of code.
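A minimal sketch of how an error-line percentage like this can be computed, assuming the output of tree-sitter parse over the Dart files has been saved to a text file and that, as described above, any line containing "ERROR" counts as an error line. The function name error_line_rate and the command-line usage are hypothetical; this is not a script from the repository.

```python
import sys
from pathlib import Path


def error_line_rate(parse_output: str) -> tuple[int, int, float]:
    """Count lines of tree-sitter parse output that mention ERROR.

    Returns (error_lines, total_lines, percent). Every output line counts
    toward the total, so the percentage is relative to parse-output lines,
    not to Dart source lines.
    """
    lines = parse_output.splitlines()
    errors = sum(1 for line in lines if "ERROR" in line)
    percent = 100.0 * errors / len(lines) if lines else 0.0
    return errors, len(lines), percent


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python error_rate.py parse_output.txt
    text = Path(sys.argv[1]).read_text(encoding="utf-8")
    errors, total, percent = error_line_rate(text)
    print(f"{errors} of {total} lines contain ERROR ({percent:.6f}%)")
```

Lines are matched on the substring alone, so any other text that happens to contain "ERROR" would be counted as well.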

UserNobody14 commented 4 years ago

I fixed all but one of the issues. The only items remaining are the 2 type-casting issues (caused by the same problem, I believe), which have deeply frustrated my attempts to fix them. I'd go into more detail, but I have a headache from fixing stuff all day. Suffice it to say, I think the grammar is either incorrect or wasn't implemented exactly as written in the actual Dart compiler. Either way, we may have to deviate from the grammar a bit on this one. It works fine if you parenthesize the type cast, but if you leave out the parens, it completely screws up. I'll try to fix this when I get the chance.
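Precedence problems like this are easier to reason about with a toy example. The sketch below is a small precedence-climbing parser in Python, not tree-sitter's own algorithm (tree-sitter generates an LR-style parser and resolves conflicts through prec annotations in the grammar), and its operator table is a simplified, hypothetical subset loosely following Dart's published ordering, in which the cast operator as binds looser than + and tighter than ==. It does not reproduce the actual bug discussed here; it only shows how the relative precedence given to as decides where an unparenthesized cast attaches, while a parenthesized cast does not depend on that choice.

```python
import re

# Simplified, hypothetical operator table (higher number = binds tighter),
# loosely following Dart's ordering:
# multiplicative > additive > relational/type test ("as") > equality.
PRECEDENCE = {"*": 4, "+": 3, "as": 2, "==": 1}


class Parser:
    def __init__(self, src: str) -> None:
        # Tokens: "==", identifiers/keywords, and single-char + * ( )
        self.tokens = re.findall(r"==|\w+|[+*()]", src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def primary(self) -> str:
        tok = self.advance()
        if tok == "(":
            inner = self.expression(1)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok is None or tok == ")" or tok in PRECEDENCE:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def expression(self, min_prec: int) -> str:
        left = self.primary()
        # Keep consuming operators that bind at least as tightly as min_prec.
        while (op := self.peek()) in PRECEDENCE and PRECEDENCE[op] >= min_prec:
            self.advance()
            if op == "as":
                # The right-hand side of a cast is a type name, not an expression.
                right = self.advance()
                if right is None:
                    raise SyntaxError("expected a type after 'as'")
            else:
                # Left-associative: the right operand may only contain
                # operators that bind strictly tighter than op.
                right = self.expression(PRECEDENCE[op] + 1)
            left = f"({left} {op} {right})"
        return left


def parse(src: str) -> str:
    """Return a fully parenthesized rendering of src."""
    parser = Parser(src)
    tree = parser.expression(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("a + b as int"))    # ((a + b) as int)
print(parse("x == y as T"))     # (x == (y as T))
print(parse("(a + b) as int"))  # ((a + b) as int)
```

If as were given a higher number than * in this table, the first input would come out as (a + (b as int)) instead, while the parenthesized input would still parse the same way; that is how a wrong relative precedence changes the tree only when the parentheses are left out.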

TimWhiting commented 4 years ago

Cool! I ran it against the Flutter codebase again, and after a few iterations and fixes today, only 0.00615% of lines in the tree-sitter parse output now have errors (i.e. 421 lines, most of which look like the type-casting issue).

shivahnshankar commented 4 years ago

Hi, would it be possible to include this in Neovim's tree-sitter functionality? I have opened https://github.com/nvim-treesitter/nvim-treesitter/issues/213 for this, FYI. It would hopefully help generate more end-user feedback.

TimWhiting commented 4 years ago

@shivahnshankar Looks like there's been quite a bit of progress on the linked issue; glad it could be integrated so quickly. There still needs to be some work on the highlighting side (we have focused more on parsing the syntax correctly up till now). Feel free to open a pull request if you want to figure out the highlighting. Working out the highlighting might also help us find and fix some other issues with how the grammar is organized. Depending on my schedule, I might not be able to work on highlighting much until later.

UserNobody14 commented 4 years ago

I fixed the type-casting issue! It's getting late, so I didn't test it on the Flutter codebase. I might in the morning.

TimWhiting commented 4 years ago

Down to 279 error lines: 0.0040855457%. Thanks! I'll have to take a look at what you did. I'm guessing get/set used as method names account for quite a few of the remaining errors.
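As a quick consistency check on the figures in this thread (an inference from the reported numbers, not something the posters stated), dividing each error count by its percentage gives the total number of lines measured. The helper name implied_total_lines is hypothetical.

```python
def implied_total_lines(error_lines: int, percent: float) -> float:
    """Total line count implied by an error-line count and its percentage."""
    return error_lines / (percent / 100.0)


before_cast_fix = implied_total_lines(421, 0.00615)       # about 6.85 million
after_cast_fix = implied_total_lines(279, 0.0040855457)   # about 6.83 million
print(f"{before_cast_fix:,.0f} vs {after_cast_fix:,.0f}")
```

Both come to roughly 6.8 million lines, so the two percentages appear to share essentially the same denominator, presumably the lines of tree-sitter parse output rather than Dart source lines. On that basis, the type-cast fix removed 142 of the 421 error lines, about a third.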

TimWhiting commented 4 years ago

Just found a good reference for the latest Dart grammar (an ANTLR spec that seems to be kept up to date): https://github.com/dart-lang/sdk/blob/master/tools/spec_parser/Dart.g