ProjectUnifree / unifree

MIT License
1.43k stars 75 forks

[Proposal] Look at using Tree Sitter to parse C# #23

Open wiltaylor opened 11 months ago

wiltaylor commented 11 months ago

Another possibility: instead of using AI to parse the code, you could use something like Tree-sitter to parse the C# code into an abstract syntax tree (AST) and then use that tree to translate into other languages/engines.

This might be a more maintainable approach moving forward.

Tree-sitter has grammars for most widely used languages and is used in many popular projects, such as Neovim.

Some links:

bshikin commented 11 months ago

Tree-sitter is already used as the base for parsing C# and for separating class definitions from methods.

How would you perform the actual translation with tree-sitter?

orbikm commented 11 months ago

Typically this is how this type of tooling is built. Using AI and LLMs may be passable with a lot of subject-matter expertise in that area, but my impression is that it would be easy to get to 90% correct and then very difficult to get the remaining 10% done.

Using a C# parser such as Tree-sitter (or an alternative) to generate an AST is one step of the process. The next step is to use the AST to build some kind of model, which contains similar information to the AST but can also embed useful context in its nodes. Finally, you perform a projection step, where the model is used to generate output code in the target language. This is typically done with some kind of templating language, e.g. Jinja (if using Python).
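To make the three steps concrete, here is a minimal sketch rather than a real implementation. It assumes a hand-written dictionary standing in for the parser's AST (the node and field names are hypothetical, not Tree-sitter's actual node types), a GDScript-like target chosen only for illustration, and the standard library's `string.Template` in place of Jinja so the example runs without dependencies.

```python
from dataclasses import dataclass, field
from string import Template

# Step 1 (stand-in): a simplified AST a parser might produce for
#   public class Player { public int Health; public void Jump() {} }
# All node and key names below are hypothetical.
AST = {
    "type": "class_declaration",
    "name": "Player",
    "members": [
        {"type": "field_declaration", "name": "Health", "cs_type": "int"},
        {"type": "method_declaration", "name": "Jump", "params": []},
    ],
}

# Step 2: the model. It mirrors the AST but carries target-side context
# (renamed identifiers, mapped types) that the raw tree does not have.
@dataclass
class FieldModel:
    name: str
    target_type: str

@dataclass
class MethodModel:
    name: str
    params: list

@dataclass
class ClassModel:
    name: str
    fields: list = field(default_factory=list)
    methods: list = field(default_factory=list)

# Assumed C#-to-target type mapping, deliberately tiny.
TYPE_MAP = {"int": "int", "float": "float", "string": "String", "bool": "bool"}

def to_snake(name):
    """Convert PascalCase/camelCase to snake_case."""
    out = []
    for i, ch in enumerate(name):
        if ch.isupper() and i > 0:
            out.append("_")
        out.append(ch.lower())
    return "".join(out)

def build_model(ast):
    """Walk the AST and build the enriched model."""
    model = ClassModel(name=ast["name"])
    for member in ast["members"]:
        if member["type"] == "field_declaration":
            model.fields.append(
                FieldModel(to_snake(member["name"]), TYPE_MAP[member["cs_type"]])
            )
        elif member["type"] == "method_declaration":
            params = [to_snake(p) for p in member["params"]]
            model.methods.append(MethodModel(to_snake(member["name"]), params))
    return model

# Step 3: projection. Templates turn the model into target-language text.
FIELD_T = Template("var $name: $type")
METHOD_T = Template("func $name($params):\n\tpass")
CLASS_T = Template("class_name $name\n\n$body\n")

def project(model):
    """Render the model through the templates."""
    parts = [FIELD_T.substitute(name=f.name, type=f.target_type) for f in model.fields]
    parts += [
        METHOD_T.substitute(name=m.name, params=", ".join(m.params))
        for m in model.methods
    ]
    return CLASS_T.substitute(name=model.name, body="\n\n".join(parts))

if __name__ == "__main__":
    print(project(build_model(AST)))
```

The point of the intermediate model is that naming and type decisions are made once, in `build_model`, so the templates stay simple and a second target language would only need a new set of templates.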

I have written and maintained numerous systems like this for exposing API projections in different languages for over a decade now, and I think this is a scalable and effective solution compared with relying on an LLM.

If this is an avenue we want to go down, I would suggest looking into some of the core Reflection libraries included in .NET as a means of building the AST / model. It may not be necessary to pull in a dependency for that.