github / semantic

Parsing, analyzing, and comparing source code across many languages

Clojure(Script) support #217

Open emlyn opened 5 years ago

emlyn commented 5 years ago

Any possibility of adding Clojure/ClojureScript support?

robrix commented 5 years ago

Clojure & ClojureScript aren’t at the top of our list right now, but we’re working on making it easier for third parties to add support for new languages to semantic. A good place to start would be by defining a tree-sitter parser for Clojure/ClojureScript; see for example the docs on adding support for new languages and the grammar development guide.

emlyn commented 5 years ago

Thanks for the response. I've had a look around and found a couple of implementations here and here. I suppose the next step would be to add one of these to haskell-tree-sitter?

robrix commented 5 years ago

Yep!

cgay commented 5 years ago

Since Clojure has macros, how will this work? It seems to me that for any language whose macros can create new syntactic constructs (any Lisp dialect, Julia, perhaps Rust), it would be far easier to use a client/server architecture (as I believe Kythe does), so that the AST generators already written for those languages can be leveraged, rather than writing an incomplete version in Haskell.

Am I wrong here? I'm kind of curious because as a Dylan enthusiast I think it would be awesome to have this feature for our GitHub-hosted code. I decided to piggyback on this bug rather than ask separately for Dylan.

patrickt commented 5 years ago

@cgay It is as yet an open question how we’ll deal with languages that support truly arbitrary rewritings of their syntax (e.g. C with the preprocessor). Whatever we do, we’ll end up aiming for the happy path: we may not be able to understand every possible macro invocation, but we should at the very least be able to handle most real-world code, especially given that tree-sitter parsers continue parsing even in the presence of syntax errors. Note that many languages with macros don’t require special knowledge in their parser, e.g. tree-sitter-rust.

Our roadmap going forward is not to write Haskell code to parse language grammars, but to generate per-language Haskell AST declarations from tree-sitter grammars. A client-server model à la Kythe is not desirable for architectural and legacy reasons: an approach with tree-sitter as the lingua franca is a lower maintenance burden than trying to corral N different language servers into a common vocabulary. See here and here for more discussion on why we’ve chosen a monolithic architecture.
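
To make "generate AST declarations from tree-sitter grammars" a little more concrete, here is a rough Python sketch, not semantic's actual generator. It assumes input in the shape of the `node-types.json` file that tree-sitter emits for a grammar. The node names in the sample, the `:+:` sum operator, and the `text`/`children` field names are all illustrative assumptions.

```python
import json

# Illustrative fragment in the shape of tree-sitter's node-types.json.
# The node names are made up for this sketch, not taken from a real grammar.
NODE_TYPES = json.loads("""
[
  {"type": "list", "named": true, "fields": {},
   "children": {"multiple": true, "required": false,
                "types": [{"type": "symbol", "named": true},
                          {"type": "list", "named": true}]}},
  {"type": "symbol", "named": true},
  {"type": "(", "named": false}
]
""")


def type_name(node_type):
    """Turn a grammar node name such as 'quoted_form' into 'QuotedForm'."""
    return "".join(part.capitalize() for part in node_type.split("_"))


def slot_type(spec):
    """Render a Haskell-style type for one field or children slot."""
    alternatives = [type_name(t["type"]) for t in spec["types"] if t["named"]]
    if len(alternatives) == 1:
        base = alternatives[0]
    else:
        # ':+:' stands in for some sum-of-types operator (an assumption).
        base = "(" + " :+: ".join(alternatives) + ")"
    if spec["multiple"]:
        return "[" + base + "]"
    return base if spec["required"] else "Maybe " + base


def declaration(node):
    """Render one record declaration for a named node type."""
    name = type_name(node["type"])
    fields = [(f, slot_type(s)) for f, s in sorted(node.get("fields", {}).items())]
    if "children" in node:
        fields.append(("children", slot_type(node["children"])))
    if not fields:
        fields.append(("text", "Text"))  # leaf node: keep its source text
    body = ", ".join(f"{f} :: {t}" for f, t in fields)
    return f"data {name} = {name} {{ {body} }}"


def generate(node_types):
    """Emit declarations for named nodes only; anonymous tokens are skipped."""
    return [declaration(n) for n in node_types if n["named"]]


for line in generate(NODE_TYPES):
    print(line)
```

The point of the sketch is that the declarations are derived mechanically from grammar metadata, so adding a language means supplying a grammar rather than hand-writing a parser.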

cgay commented 5 years ago

For Common Lisp I believe it's literally impossible to do a perfect job finding cross references unless you are the Common Lisp compiler, since arbitrary code can be run to generate new code at macro-expansion time.

Dylan (and I believe Scheme) is a little more amenable since its macro system is just a pattern matcher, but I think for Dylan you'd need to basically re-implement the macro expander [edit: and the module system] in Haskell to figure out what definitions are being referenced in the generated code.
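
A toy illustration of the pattern-matcher point, in Python for brevity. This is a deliberately simplified sketch and not Dylan's or Scheme's real macro system: there is no hygiene, no ellipsis patterns, and no module system. The `define-constant` macro and the `?`-prefixed pattern variables are hypothetical. It shows that a definition can stay invisible to a cross-referencer until the macro is expanded.

```python
def match(pattern, form, bindings):
    """Match form against pattern; names starting with '?' are pattern variables."""
    if isinstance(pattern, str):
        if pattern.startswith("?"):
            bindings[pattern] = form
            return True
        return pattern == form
    if not isinstance(form, list) or len(pattern) != len(form):
        return False
    return all(match(p, f, bindings) for p, f in zip(pattern, form))


def substitute(template, bindings):
    """Replace pattern variables in the template with what they matched."""
    if isinstance(template, str):
        return bindings.get(template, template)
    return [substitute(t, bindings) for t in template]


def expand(form, macros):
    """Expand macros outermost-first until no rule applies."""
    if not isinstance(form, list):
        return form
    for pattern, template in macros:
        bindings = {}
        if match(pattern, form, bindings):
            return expand(substitute(template, bindings), macros)
    return [expand(sub, macros) for sub in form]


def defined_names(form):
    """Collect names introduced by (define name ...) forms."""
    if not isinstance(form, list):
        return set()
    names = set()
    if len(form) >= 2 and form[0] == "define" and isinstance(form[1], str):
        names.add(form[1])
    for sub in form:
        names |= defined_names(sub)
    return names


# Hypothetical macro: (define-constant name value) => (define name (lambda () value))
MACROS = [(["define-constant", "?name", "?value"],
           ["define", "?name", ["lambda", [], "?value"]])]

PROGRAM = ["begin",
           ["define-constant", "pi", "3.14"],
           ["define", "area", ["lambda", ["r"], ["*", ["pi"], "r", "r"]]]]

print(sorted(defined_names(PROGRAM)))                  # before expansion
print(sorted(defined_names(expand(PROGRAM, MACROS))))  # after expansion
```

Before expansion only `area` is found; after expansion `pi` shows up as well, which is why the analyzer would need its own expander to resolve references into generated code.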

dijonkitchen commented 4 years ago

Can https://github.com/Engelberg/instaparse help?

robrix commented 4 years ago

> For Common Lisp I believe it's literally impossible to do a perfect job finding cross references unless you are the Common Lisp compiler, since arbitrary code can be run to generate new code at macro-expansion time.

Yes, this is (an example of) why semantic does program analysis.

cgay commented 4 years ago

> Yes, this is (an example of) why semantic does program analysis.

?

[edit] To clarify, I don't see how that addresses the problem I'm describing.

robrix commented 4 years ago

We can do the same things as the compiler, including evaluating macros (subject to certain approximations).

cgay commented 4 years ago

And when the macro uses data that's only available at run-time in order to generate the code, or generates different code for different Common Lisp implementations? :-)

Anyway, hopefully your approach will work for more popular languages like Rust, Clojure, and Julia, which also have robust syntax extension mechanisms.

robrix commented 4 years ago

> And when the macro uses data that's only available at run-time in order to generate the code, or generates different code for different Common Lisp implementations? :-)

That’s when the approximations become relevant: we’ll generate a set of results.
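
One way to picture "a set of results" is the small Python sketch below. It is an illustration under stated assumptions, not how semantic is implemented. The `if-feature` form and the `UNKNOWN` marker are hypothetical; they stand in for any expansion that depends on something the analyzer cannot know statically, such as which Lisp implementation is running.

```python
UNKNOWN = object()  # hypothetical marker: the analyzer cannot determine this value


def possible_expansions(form, env):
    """Return every expansion a form could produce, given partial knowledge in env."""
    if not isinstance(form, list) or not form:
        return [form]
    if form[0] == "if-feature":
        _, feature, then_form, else_form = form
        known = env.get(feature, UNKNOWN)
        if known is UNKNOWN:
            # Condition not known statically: keep both branches.
            return (possible_expansions(then_form, env)
                    + possible_expansions(else_form, env))
        return possible_expansions(then_form if known else else_form, env)
    return [form]


def candidate_definitions(form, env):
    """Names that might be defined, over all possible expansions."""
    return {e[1] for e in possible_expansions(form, env)
            if isinstance(e, list) and len(e) >= 2 and e[0] == "define"}


FORM = ["if-feature", "sbcl",
        ["define", "fast-hash", ["lambda", ["x"], "x"]],
        ["define", "portable-hash", ["lambda", ["x"], "x"]]]

print(sorted(candidate_definitions(FORM, {})))              # nothing known
print(sorted(candidate_definitions(FORM, {"sbcl": True})))  # feature known
```

With nothing known about `sbcl`, both `fast-hash` and `portable-hash` are reported as candidates; once the feature is known, the set narrows to one. The result is an over-approximation: it may list definitions that never exist in a particular run, but it does not miss one that could.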

borkdude commented 3 years ago

Note that a static analyzer for Clojure has since appeared on the scene:

https://github.com/clj-kondo/clj-kondo/blob/master/analysis/README.md

This analyzer is used by tools like clojure-lsp to provide navigation and refactoring. This tool is available as a standalone command line utility as well (or can be compiled into a native library if that is useful from Haskell, but you might as well just shell out to it). It's also available as a JVM library.
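
A minimal sketch of the "just shell out to it" option, written in Python rather than Haskell for brevity. Treat the details as assumptions to check against the README linked above: the exact `--config` map, the `analysis` / `var-definitions` keys, and the per-entry `ns`, `name`, `filename`, `row`, `col` fields may differ between clj-kondo versions. The helper names are hypothetical.

```python
import json
import subprocess


def kondo_command(path):
    """Build the clj-kondo invocation. The config map is an assumption."""
    config = "{:output {:analysis true :format :json}}"
    return ["clj-kondo", "--lint", path, "--config", config]


def run_kondo(path):
    """Run clj-kondo and return its parsed JSON report (needs clj-kondo on PATH)."""
    # clj-kondo exits non-zero when it reports findings, so check=True is not used.
    result = subprocess.run(kondo_command(path), capture_output=True, text=True)
    return json.loads(result.stdout)


def definitions_by_name(report):
    """Index var definitions as {(ns, name): (filename, row, col)}."""
    index = {}
    for d in report.get("analysis", {}).get("var-definitions", []):
        index[(d["ns"], d["name"])] = (d["filename"], d["row"], d["col"])
    return index
```

Something like `definitions_by_name(run_kondo("src"))` would then give a lookup table for go-to-definition; the same report could be consumed from Haskell by spawning the process and decoding the JSON.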