github / semantic

Parsing, analyzing, and comparing source code across many languages

R support #382

Open jimhester opened 4 years ago

jimhester commented 4 years ago

R is a widely used and growing language, especially popular in data science and statistics.

While it does not have a published formal specification, there is a draft specification that describes lexing and parsing the language.

In the most widely used implementation, parsing is done with a Bison parser defined in gram.y.

The lexing rules for R are somewhat complex, but the parsing is relatively straightforward, as generally everything is an expression.
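To illustrate the "everything is an expression" point, here is a minimal Python sketch. The node classes and the `evaluate` function are hypothetical and heavily simplified; they are not taken from gram.y. They only show that assignment, `if`, and `{ ... }` blocks can all be modelled as nodes that produce a value, roughly like the R code `{ x <- 0; y <- if (x) 1 else 2 }`.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified node types for illustration only.
@dataclass
class Num:
    value: float


@dataclass
class Var:
    name: str


@dataclass
class Assign:  # x <- value; evaluates to the assigned value
    name: str
    value: "Expr"


@dataclass
class If:  # if (cond) then else otherwise; evaluates to the chosen branch
    cond: "Expr"
    then: "Expr"
    otherwise: "Expr"


@dataclass
class Block:  # { e1; e2; ... }; evaluates to its last expression
    body: list


Expr = Union[Num, Var, Assign, If, Block]


def evaluate(expr, env):
    """Evaluate any node to a value; there is no separate statement kind."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Assign):
        env[expr.name] = evaluate(expr.value, env)
        return env[expr.name]
    if isinstance(expr, If):
        branch = expr.then if evaluate(expr.cond, env) else expr.otherwise
        return evaluate(branch, env)
    if isinstance(expr, Block):
        result = None
        for inner in expr.body:
            result = evaluate(inner, env)
        return result
    raise TypeError(f"unknown node: {expr!r}")


# Roughly: { x <- 0; y <- if (x) 1 else 2 }
program = Block([
    Assign("x", Num(0)),
    Assign("y", If(Var("x"), Num(1), Num(2))),
])
```

Calling `evaluate(program, {})` on this tree yields the value of the last assignment, because the `if` node is itself used as the right-hand side of `y <- ...`.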

It would be very beneficial to the R community to have support for R in semantic!

XVilka commented 4 years ago

A first step would be creating a tree-sitter grammar for R. See more in their documentation on creating new parsers. After that, the hardest part begins - implementing the semantic part: https://github.com/github/semantic/blob/master/docs/adding-new-languages.md

tclem commented 4 years ago

Definitely start with a tree-sitter parser.

After that, the hardest part begins - implementing the semantic part

This is about to get much, much easier as we are almost entirely automating the semantic part. My advice would be to get the tree-sitter grammar in good shape and then check in again with the team to see if we can use the new path for supporting languages in semantic.

2yz commented 4 years ago

Is the new path for supporting languages available soon?

tclem commented 4 years ago

Is the new path for supporting languages available soon?

I don't have a specific timeline to give you, but you can see an example for Java and Python of what's required to generate code from the new node-types.json. Obviously, part of the work here is to better surface our documentation.
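For readers unfamiliar with node-types.json, here is a rough Python sketch of the kind of information the file carries. It is not semantic's actual codegen. The JSON excerpt follows the shape tree-sitter documents for node-types.json (`type`, `named`, `fields`, and per-field `multiple`, `required`, `types`), but the node and field names are made up for illustration, and `field_shape` and `describe` are hypothetical helpers.

```python
import json

# A tiny, made-up excerpt in the shape of tree-sitter's node-types.json.
# Node and field names are hypothetical, not from a real R grammar.
NODE_TYPES_JSON = """
[
  {
    "type": "binary",
    "named": true,
    "fields": {
      "left":  {"multiple": false, "required": true,
                "types": [{"type": "expression", "named": true}]},
      "right": {"multiple": false, "required": true,
                "types": [{"type": "expression", "named": true}]}
    }
  },
  {
    "type": "call",
    "named": true,
    "fields": {
      "function":  {"multiple": false, "required": true,
                    "types": [{"type": "expression", "named": true}]},
      "arguments": {"multiple": true, "required": false,
                    "types": [{"type": "argument", "named": true}]}
    }
  },
  {"type": "+", "named": false}
]
"""


def field_shape(info):
    """Describe how many children a field may hold."""
    if info["multiple"]:
        return "one or more" if info["required"] else "zero or more"
    return "exactly one" if info["required"] else "optional"


def describe(node_types):
    """Return one summary line per named node type and per field."""
    lines = []
    for node in node_types:
        if not node["named"]:
            continue  # skip anonymous tokens such as "+"
        lines.append(node["type"])
        for name, info in sorted(node.get("fields", {}).items()):
            kinds = " | ".join(t["type"] for t in info["types"])
            lines.append(f"  {name}: {field_shape(info)} of {kinds}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(json.loads(NODE_TYPES_JSON))))
```

Because each named node type lists its fields and whether they are required or repeated, a generator can in principle derive one typed record per node type from this file alone, which is what makes automating the per-language work plausible.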

XVilka commented 4 years ago

Link to the documentation has moved to https://github.com/github/semantic/blob/master/docs/codegen.md

jimhester commented 4 years ago

Just a small update: I have begun work on a tree-sitter parser for R (https://github.com/jimhester/tree-sitter-r).

I have only spent a few days on it, but it is already fairly functional, so I could start looking into semantic support in the near future.

jimhester commented 3 years ago

The tree-sitter parser is now in pretty good shape. I have moved it to https://github.com/r-lib/tree-sitter-r and opened a PR at https://github.com/tree-sitter/haskell-tree-sitter/pull/295. Once that is merged, I guess it needs to be pushed to Hackage so it can be used in semantic.

Now that https://github.com/github/semantic/pull/577 has been merged, I am a little unclear what the next steps within semantic need to be. If someone could clarify that for me, I would be happy to work on it!