BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/
578 stars 163 forks source link

Preliminary implementation of a tree-sitter backend #471

Closed chaserhkj closed 5 months ago

chaserhkj commented 7 months ago

Generates grammar.js file that could be used with tree-sitter to generate parsers with many different languages. Most of the functionalities of BNFC should be already implemented and working.

There are two remaining issues/features that are not implemented yet:

  1. tree-sitter does not distinguish between a lexer and a parser, this created issues in their keyword recognition (documented here). They use a keyword extraction mechanism to list token rules that could accept a keyword explicitly. The only way to do this cleanly from the BNFC side is to run all user-defined token patterns against all keywords in the input grammar to determine which patterns should be listed to do keyword extraction. Unfortunately, there are no regex facilities in BNFC to do this currently and an external regex engine is needed. This branch currently just puts all user-defined token patterns into the keyword extraction list for safety. The performance impact of such practice is unknown. How should BNFC approach this problem should be discussed further.
  2. Like many other lexers, the regular expression used by tree-sitter does not support general set differences. However, tree-sitter supports a mechanism known as external scanners that could be used to implement this. This is not implemented yet.

These are documented in the source code using TODO comments as well.

Should help #193 and #394

chaserhkj commented 7 months ago

Rev: Removed usage of BlockArguments to be compatible with older GHC

wenkokke commented 7 months ago

I think the primary thing that appears to be missing is support for layouting, for which we'd have to generate custom scanners.

chaserhkj commented 7 months ago

Yes, layouting features would require a custom scanner as well. I think that would be an easier starting point for introducing custom scanners for this backend. I would look into it in my spare time.

That being said, the current state of the branch already satisfies my current needs thus it would probably be a lower priority item on my list.

chaserhkj commented 7 months ago

I just added the parameters for testing tree-sitter backend to the testing parameter matrix.

Currently, the backend is only generating grammar.js and no test code is generated together with the grammar. So in testing I just called tree-sitter parse to use the grammar file to parse input directly.

Unfortunately, there are a number of test failures. I will use next week to try to fix these failures. Before that, I will mark this PR as draft.

chaserhkj commented 5 months ago

I am closing this PR since the main commits of it has already been merged (Ref: f1c510157dda55c0f51ca60d04102d3e4688be2a)

A few issues are emerging and related larger changes are made to the tree-sitter backend when I am working on the tests. I will include them in a separate PR.

andreasabel commented 1 month ago

I am closing this PR since the main commits of it has already been merged (Ref: f1c5101)

I have not recollection of merging the branch but maybe I did it by accident when I checkout out your PR. :$