bakpakin / Fennel

Lua Lisp Language
https://fennel-lang.org
MIT License

tree-sitter-fennel and friends #237

Closed sogaiu closed 4 years ago

sogaiu commented 4 years ago

Hi,

As part of work on tree-sitter-ish things, I've put together an initial attempt at a tree-sitter grammar for fennel.

In order to get it into better shape, I've also created language-fennel for Atom and a demo for use with recent versions of Neovim's master branch.

Support for tree-sitter in Emacs is still in discussion / development / flux IIUC, so I haven't worked on anything there yet.

It looks like VSCode isn't going to support tree-sitter out-of-the-box, but once the grammar is in better shape, I might give that an attempt as I've already had some experience getting tree-sitter-clojure to work at some level.

If anyone is feeling adventurous or interested, there are some links below:

https://github.com/sogaiu/tree-sitter-fennel
https://github.com/sogaiu/language-fennel
https://github.com/sogaiu/a-tsfnl-nvim-fennel-plugin

jaawerth commented 4 years ago

Awesome work! This has been on my to-do list, but I hadn't quite gotten around to familiarizing myself with creating theme grammars yet. Are you planning on submitting this to linguist? I believe it uses the same cson grammar as used by Atom.

technomancy commented 4 years ago

Hadn't heard of this before but it looks cool! Is there anything needed from Fennel itself to support this or is this more of an FYI?

Definitely feel free to add it to the wiki section on tools: https://github.com/bakpakin/Fennel/wiki#tools

sogaiu commented 4 years ago

@jaawerth Regarding linguist, I looked into it a bit.

It looks like they want there to be more than a certain number of repositories on github before considering an addition:

We try only to add languages once they have some usage on GitHub. In most cases we prefer that each new file extension be in use in hundreds of repositories before supporting them in Linguist.

via: https://github.com/github/linguist/blob/master/CONTRIBUTING.md#adding-a-language

I performed the following search and went through the results:

https://github.com/search?q=extension%3Afnl+filename%3A*fnl&type=Code

I counted a bit over 70 repositories (mostly leaving out dot-file-oriented ones) -- here is a list if interested: fennel.txt

Additionally, it looks like they use TextMate grammars, not tree-sitter ones:

Syntax highlighting in GitHub is performed using TextMate-compatible grammars. These are the same grammars used by TextMate, Sublime Text, and Atom. Every language in languages.yml is mapped to its corresponding TextMate scopeName.

via: https://github.com/github/linguist/blob/master/CONTRIBUTING.md#fixing-syntax-highlighting

IIUC, Atom used to exclusively use TextMate grammars, but has been transitioning to tree-sitter.

sogaiu commented 4 years ago

@technomancy Sorry for being unclear about the purpose of posting this issue.

It was mostly FYI, but as a relative new-comer to fennel, I am not so familiar with it and would not be surprised if parts of the grammar were off. Specifically there were a few points I wanted to ask about:

1) It looks like there was support for single quote delimited strings at one point, but this was removed.

2) It looks like @ was at one point used to unquote things and now , is used instead.

If you have any insight into whether these (or other things that changed) were used a fair bit, I'd appreciate knowing.

Also, is: https://github.com/bakpakin/Fennel/blob/master/changelog.md the best place to look for these types of things?

I wanted to get the grammar into a relatively good state before subjecting too many people to it :)

technomancy commented 4 years ago

Yeah, that's right; we had undocumented support for single-quote strings, but that's been removed, and comma is now the unquote character; early on it was whitespace.

The changes to documented features will always be found in the changelog, but sometimes weird undocumented things don't get a mention.

sogaiu commented 4 years ago

Thanks!

jaawerth commented 4 years ago

@sogaiu

No worries - if you have no objection, though, I'm inclined to mess around with the work you've already done to put together a linguist grammar repository. After making sure it works well and is easy to update, I'd see about submitting it to linguist anyway, since some people have been asking about GitHub support. We'll see.

I wouldn't want it rejected, so I may poke around other supported niche languages to see what's been accepted in the past. It's just that some folks have asked about this and it would be a really nice feature to have ;-)

sogaiu commented 4 years ago

I think I misunderstood your response and composed a lengthy reply that seems possibly not-so-relevant in retrospect, but I've left it below in case it turns out to be useful at some point.

In any case, good luck with the linguist grammar attempt!


I have no objections, sounds great!

To add a bit of detail on the current state of the grammar, it's intentionally minimal in not having explicit support for things like fn. This was the second grammar I worked on (the first being one for Clojure), and I found that:

1) Since Lisps with macros can have their syntax extended, there is a question of when one is finished with the grammar. There is no way to dynamically extend the grammar in tree-sitter, though there is a way to create grammars that inherit from existing grammars.

2) I found it difficult if not impossible to discriminate between things like an actual function definition and data that happens to look like a function definition...at least, not perfectly. I think knowledge of context helps to make this distinction, but that's not something I think tree-sitter has the ability to do. In practice, settling for something that works most of the time might be an option, and it may be that it won't be as much of a problem for Fennel. I try to exercise the grammars I've made against collected real-world code and this can help with getting a sense of how well things turn out "in practice".

3) The end use of the grammar can influence what to put in (and what to leave out). One of my earlier attempts at a Clojure grammar was very minimal (didn't handle metadata, discard forms, and other things fully) but it was fine for implementing rainbow parens. However, to handle highlighting appropriately for something like discard forms, it was insufficient. Eventually I figured out a way to handle a more complete set of things and now it's better suited to handle some highlighting things it couldn't easily before. I've also implemented an outline view for VSCode that tries to show function definitions -- this isn't perfect, but it manages without the grammar having to know anything about function definitions (just look for lists that have certain symbols in them with certain characteristics).

4) The more one adds to a tree-sitter grammar the harder it seems to be to keep it behaving. There were two earlier attempts at a tree-sitter grammar for Clojure and I've talked with one of the authors who had a similar opinion (he in turn had talked to the other author who felt similarly).
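To make points 2) and 3) a bit more concrete, here is a rough Python sketch of the "look for lists that have certain symbols in them" idea. It is not the outline view's actual code and it does not use tree-sitter at all: the tiny reader, the `DEFINERS` set, and the function names are all made up for illustration. It treats any parenthesized list whose head is a known symbol and whose second element is a plain symbol as a definition.

```python
import re

# Tokens for a tiny Lisp-like subset: brackets, double-quoted strings, atoms.
# This is NOT the Fennel reader; comments and much else are ignored.
TOKEN = re.compile(r'[()\[\]{}]|"(?:\\.|[^"\\])*"|[^\s()\[\]{}"]+')

CLOSERS = {"(": ")", "[": "]", "{": "}"}

# Hypothetical set of head symbols that we treat as "defining" forms.
DEFINERS = {"fn", "lambda", "macro"}


def parse(source):
    """Parse balanced source into nested lists; each list starts with its opener."""
    tokens = TOKEN.findall(source)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in CLOSERS:
            node = [tok]
            while tokens[pos] != CLOSERS[tok]:
                node.append(read())
            pos += 1  # consume the closing bracket
            return node
        return tok

    forms = []
    while pos < len(tokens):
        forms.append(read())
    return forms


def find_definitions(forms):
    """Collect names from lists shaped like (DEFINER name ...), at any depth."""
    found = []

    def walk(node):
        if not isinstance(node, list):
            return
        looks_like_definition = (
            node[0] == "("
            and len(node) >= 3
            and isinstance(node[1], str)
            and node[1] in DEFINERS
            and isinstance(node[2], str)
            and not node[2].startswith('"')
        )
        if looks_like_definition:
            found.append(node[2])
        for child in node[1:]:
            walk(child)

    for form in forms:
        walk(form)
    return found


source = '(fn add [a b] (+ a b)) (quote (fn fake [x] x)) (fn [x] x)'
names = find_definitions(parse(source))
```

With this input, `names` contains both `add` and `fake`: the quoted list is only data, but it has the same shape as a definition, so a purely structural check cannot tell them apart. The anonymous `(fn [x] x)` is skipped because its second element is not a symbol.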

You didn't ask for this, but these are some of the points I think may be relevant for anyone working on a tree-sitter grammar for a lisp. Maybe you'll find them of some use. If any of these points turn out to be mistaken, incomplete, etc. I'd be interested in hearing about what you find / think.

On a side note, since filing this issue, I've found someone else who has worked on a grammar for Carp and recently did another for Janet. He said the latter one is better and the former one may be revisited at some point:

https://github.com/GrayJack/tree-sitter-carp/
https://github.com/GrayJack/tree-sitter-janet/

At any rate, good luck, and please feel free to get in touch. I'm interested in how things go -- linguist support would be cool!

sogaiu commented 4 years ago

@jaawerth I don't know if you've made any progress on things, but FWIW thought you might be interested to see this: https://github.com/github/linguist/pull/4674

jaawerth commented 4 years ago

I hadn't gotten around to it yet, but good to know! Guess I'll hold off for a bit ;-)

That said, I'm not sure exactly how many unique repositories there are. I tried playing around with the search used in that linguist link but I couldn't see where it gave unique results; I'll have to play with that at some point. I was planning on doing that first anyway but now I'm doubly curious.

jaawerth commented 4 years ago

I just ran the Harvester script, by the same person who commented on that Janet PR, to tabulate a query of repos with .fnl in the code, and ran it through the helper scripts documented in the wiki, getting:

Unique repos: 141
Unique users: 108

There are a handful of false positives in there, due to a data format for the defunct Windows Live Mesh also being .fnl, but most of them look relevant. Still, not quite hundreds plural yet, but pretty good!
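For anyone curious what that kind of tabulation boils down to, here is a minimal Python sketch. It is not the Harvester script or its helper scripts; it just assumes you already have search hits as `owner/repo` strings (the sample data below is made up) and counts distinct repositories and owners, ignoring case and skipping malformed entries.

```python
def tabulate(hits):
    """Return (unique_repos, unique_users) for an iterable of 'owner/repo' strings."""
    repos = set()
    users = set()
    for hit in hits:
        owner, _, name = hit.strip().partition("/")
        if not owner or not name:
            continue  # skip entries that are not in owner/repo form
        # GitHub owner and repo names are case-insensitive, so normalize.
        repos.add((owner.lower(), name.lower()))
        users.add(owner.lower())
    return len(repos), len(users)


# Hypothetical sample: one hit per matching file, so repos repeat.
sample_hits = [
    "alice/game",
    "alice/game",
    "alice/dotfiles",
    "Bob/config",
    "bob/config",
    "malformed",
]
unique_repos, unique_users = tabulate(sample_hits)
```

False positives (like the Windows Live Mesh files) would still need to be filtered out separately, either by hand or with a content check.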

It's getting something of a boost from TIC, love2d, neovim, and spacehammer, which makes sense. Half the benefit is that it can run on the many things that run lua, after all.

sogaiu commented 4 years ago

When I made the repository list linked above, I went through a fair number by hand for verification purposes and left out false positives that I noticed. I'm pretty sure there are at least that number that are fennel. It's likely there are somewhat more as I think I left out some things that looked mostly like configuration.

It seems good to have an automated way to do this, so that subsequent counts don't feel like work. It might also make one more likely to check on a regular basis (say monthly or every few months?) to get an idea of growth / change.

Figuring out break-down might be harder to automate I guess.

jaawerth commented 4 years ago

@sogaiu that's a great idea! It should be pretty easy to set up something in node that runs the harvester script via jsdom or puppeteer, I'll look into it. If I get motivated, it would be fun to generate a little visualization showing the changes over time, too :grin:

technomancy commented 4 years ago

It seems like there's nothing more to be done here in the Fennel repository, so I'm going to close this out; feel free to reopen if there is more to do.

sogaiu commented 4 years ago

@technomancy @jaawerth Maybe you folks know already, but FWIW, I noticed there seems to be another tree-sitter-fennel: https://github.com/travonted/tree-sitter-fennel

This looks like a more full-fledged undertaking.

On a side note, I didn't know you folks had this: https://fennel-lang.org/reference

Very nice!

jaawerth commented 4 years ago

@sogaiu Thanks for the heads up! I've been meaning to use this as a base to start messing with nvim-treesitter + fennel, but it looks like the nvim-treesitter folks are ahead of me :grin: