emlyn opened this issue 5 years ago
Clojure & ClojureScript aren’t at the top of our list right now, but we’re working on making it easier for third parties to add support for new languages to semantic. A good place to start would be to define a tree-sitter parser for Clojure/ClojureScript; see, for example, the docs on adding support for new languages and the grammar development guide.
Yep!
Since Clojure has macros, how will this work? It seems to me that for any language whose macros can create new syntactic constructs (any Lisp dialect, Julia, maybe Rust?), it would be far easier to use a client/server architecture (as I believe Kythe has), so that the AST generators already written for those languages can be leveraged, rather than trying to write an incomplete version in Haskell.
Am I wrong here? I'm kind of curious because as a Dylan enthusiast I think it would be awesome to have this feature for our GitHub-hosted code. I decided to piggyback on this bug rather than ask separately for Dylan.
@cgay It is as yet an open question how we’ll deal with languages that support truly arbitrary rewritings of their syntax (e.g. C with the preprocessor). Whatever we do, we’ll end up aiming for the happy path: we may not be able to understand every possible macro invocation, but we should at the very least be able to handle most real-world code, especially given that tree-sitter parsers continue parsing even in the presence of syntax errors. Note that many languages with macros don’t require special knowledge of those macros in their parser, e.g. tree-sitter-rust.
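A toy illustration of both points, assuming a Lisp-like syntax: the hypothetical `read_forms` reader below parses a call to a macro such as `unless` exactly like any other list, because macros are expanded after reading, and it records a stray `)` instead of abandoning the rest of the input. This is only a Python sketch, not tree-sitter's error-recovery algorithm, which is considerably more sophisticated.

```python
import re

# A token is a delimiter or a run of characters that are neither
# whitespace nor delimiters.
TOKEN_RE = re.compile(r"[()\[\]]|[^\s()\[\]]+")
OPENERS = ("(", "[")
MATCHING_OPENER = {")": "(", "]": "["}


def read_forms(source):
    """Read s-expressions, continuing past syntax errors.

    Returns (forms, errors): forms are nested lists of atom strings,
    errors are (kind, delimiter, offset) tuples. A closer that matches
    no open form is reported and skipped; forms still open at the end
    of the input are closed implicitly and reported.
    """
    stack = [[]]   # stack[-1] is the form currently being filled
    opened = []    # delimiters that opened stack[1:], innermost last
    errors = []
    for match in TOKEN_RE.finditer(source):
        tok = match.group()
        if tok in OPENERS:
            stack.append([])
            opened.append(tok)
        elif tok in MATCHING_OPENER:
            if opened and opened[-1] == MATCHING_OPENER[tok]:
                opened.pop()
                finished = stack.pop()
                stack[-1].append(finished)
            else:
                errors.append(("unexpected", tok, match.start()))
        else:
            # Atoms, including macro names: the reader doesn't care.
            stack[-1].append(tok)
    while opened:
        errors.append(("unclosed", opened.pop(), len(source)))
        finished = stack.pop()
        stack[-1].append(finished)
    return stack[0], errors


if __name__ == "__main__":
    forms, errors = read_forms("(foo (bar))) (unless x (y))")
    print(forms)   # [['foo', ['bar']], ['unless', 'x', ['y']]]
    print(errors)  # [('unexpected', ')', 11)]
```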
Our roadmap going forward is not to write Haskell code to parse language grammars, but to generate per-language Haskell AST declarations from tree-sitter grammars. A client-server model a la Kythe is not desirable for architectural and legacy reasons: an approach with tree-sitter as the lingua franca is a lower maintenance burden than trying to corral N different language servers into a common vocabulary. See here and here for more discussion on why we’ve chosen a monolithic architecture.
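A rough Python sketch of the idea of generating AST declarations from a grammar: it reads entries shaped like tree-sitter's `node-types.json` (`type`, `named`, `fields`, and per-field `multiple`/`required`/`types`) and prints Haskell `data` declarations. The sample input and all function names are hypothetical, and this is not semantic's actual generator.

```python
import json

# Hypothetical excerpt in the shape of a tree-sitter node-types.json file.
SAMPLE = """
[
  {"type": "call_expression", "named": true,
   "fields": {
     "function": {"multiple": false, "required": true,
                  "types": [{"type": "identifier", "named": true}]},
     "arguments": {"multiple": true, "required": false,
                   "types": [{"type": "identifier", "named": true},
                             {"type": "number", "named": true}]}}},
  {"type": "identifier", "named": true},
  {"type": "(", "named": false}
]
"""


def pascal(node_type):
    """snake_case node type -> PascalCase Haskell type name."""
    return "".join(part.capitalize() for part in node_type.split("_"))


def field_type(spec):
    """Haskell type expression for one field spec."""
    names = [pascal(t["type"]) for t in spec["types"] if t["named"]]
    if not names:
        ty = "Text"  # only anonymous tokens: keep the source text
    else:
        # Several alternatives become a right-nested Either.
        ty = names[-1]
        for name in reversed(names[:-1]):
            ty = f"Either {name} ({ty})" if " " in ty else f"Either {name} {ty}"
    if spec["multiple"]:
        return f"[{ty}]"
    if not spec["required"]:
        return f"Maybe ({ty})" if " " in ty else f"Maybe {ty}"
    return ty


def declare(node):
    """Haskell data declaration for one named node type."""
    name = pascal(node["type"])
    fields = node.get("fields", {})
    if not fields:
        # Nodes without fields just wrap their source text.
        return f"data {name} = {name} Text"
    lines = [f"data {name} = {name}"]
    for i, (field, spec) in enumerate(sorted(fields.items())):
        sep = "{" if i == 0 else ","
        lines.append(f"  {sep} {field} :: {field_type(spec)}")
    lines.append("  }")
    return "\n".join(lines)


def generate(node_types_json):
    """Declarations for every named node type; anonymous tokens skipped."""
    nodes = json.loads(node_types_json)
    return "\n\n".join(declare(n) for n in nodes if n["named"])


if __name__ == "__main__":
    print(generate(SAMPLE))
```

It ignores supertypes, `children`, record-field name clashes and Haskell keywords, all of which a real generator has to handle.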
For Common Lisp I believe it's literally impossible to do a perfect job finding cross references unless you are the Common Lisp compiler, since arbitrary code can be run to generate new code at macro-expansion time.
Dylan (and I believe Scheme) is a little more amenable since its macro system is just a pattern matcher, but I think for Dylan you'd need to basically re-implement the macro expander [edit: and the module system] in Haskell to figure out what definitions are being referenced in the generated code.
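A toy contrast between the two kinds of macro described above, in a made-up Lisp-like notation (the `defaccessors` and `defconst` macros and all helper names are hypothetical). For the procedural macro, an analyzer only learns which names get defined by running arbitrary code; for the pattern macro, expansion is plain matching and substitution that an analyzer can reimplement. This is a simplified Python sketch, far from real Common Lisp, Scheme or Dylan macro semantics (no hygiene, no repetition patterns, no module system).

```python
# Forms are nested Python lists of symbol strings, e.g. ["def", "x", "1"].

def expand(form, macros):
    """Macroexpand `form` fully, then expand its subforms."""
    if isinstance(form, list) and form and form[0] in macros:
        return expand(macros[form[0]](form), macros)
    if isinstance(form, list):
        return [expand(sub, macros) for sub in form]
    return form


def definitions(form):
    """Names bound by (def NAME ...) anywhere in an expanded form."""
    if not isinstance(form, list):
        return []
    found = [form[1]] if len(form) > 1 and form[0] == "def" else []
    for sub in form:
        found += definitions(sub)
    return found


# Procedural macro (Common Lisp style): arbitrary code runs at expansion
# time. The names it defines, e.g. "point-x", never appear in the source,
# so an analyzer only learns them by executing this function.
def defaccessors(form):
    _, struct, *fields = form
    return ["do"] + [["def", f"{struct}-{f}", ["field", f]] for f in fields]


# Pattern macro (Dylan / Scheme syntax-rules style, much simplified):
# expansion is just matching and substitution.
def match(pattern, form, env):
    """Bind ?variables in `pattern` to parts of `form`; False on mismatch."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        env[pattern] = form
        return True
    if isinstance(pattern, list):
        return (isinstance(form, list) and len(pattern) == len(form)
                and all(match(p, f, env) for p, f in zip(pattern, form)))
    return pattern == form


def substitute(template, env):
    """Replace bound ?variables in `template` with the matched forms."""
    if isinstance(template, str):
        return env.get(template, template)
    return [substitute(t, env) for t in template]


def pattern_macro(pattern, template):
    """Build an expander from a single pattern/template rule."""
    def expander(form):
        env = {}
        if not match(pattern, form, env):
            raise SyntaxError(f"no rule matches {form!r}")
        return substitute(template, env)
    return expander


MACROS = {
    "defaccessors": defaccessors,
    "defconst": pattern_macro(["defconst", "?name", "?value"],
                              ["def", "?name", ["const", "?value"]]),
}

if __name__ == "__main__":
    program = ["do", ["defaccessors", "point", "x", "y"],
               ["defconst", "origin", "0"]]
    print(definitions(expand(program, MACROS)))
    # ['point-x', 'point-y', 'origin']
```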
Can https://github.com/Engelberg/instaparse help?
Yes, this is (an example of) why semantic does program analysis.
?
[edit] To clarify, I don't see how that addresses the problem I'm describing.
We can do the same things as the compiler, including evaluating macros (subject to certain approximations).
And when the macro uses data that's only available at run-time in order to generate the code, or generates different code for different Common Lisp implementations? :-)
Anyway, hopefully your approach will work for more popular languages like Rust, Clojure, and Julia, which also have robust syntax extension mechanisms.
That’s when the approximations become relevant: we’ll generate a set of results.
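One naive way to "generate a set of results", assuming a made-up `if-feature` form loosely modelled on Common Lisp's `#+`/`#-` feature expressions: keep both branches of every choice the analyzer can't resolve statically, and report one set of definitions per possible expansion. This Python sketch is not how semantic represents its approximations; the thread doesn't go into that, and all names here are hypothetical.

```python
from itertools import product


def possible_expansions(form):
    """Every concrete form that `form` might turn into.

    ["if-feature", FEATURE, THEN, ELSE] stands for a choice that can't be
    resolved statically (which Lisp implementation runs the code, or data
    only known at run time), so both branches are kept.
    """
    if not isinstance(form, list):
        return [form]
    if form and form[0] == "if-feature":
        _, _feature, then_form, else_form = form
        return possible_expansions(then_form) + possible_expansions(else_form)
    # Each subform may have several alternatives; take every combination.
    options = [possible_expansions(sub) for sub in form]
    return [list(combo) for combo in product(*options)]


def definitions(form):
    """Names bound by (def NAME ...) anywhere in a form."""
    if not isinstance(form, list):
        return set()
    is_def = len(form) > 1 and form[0] == "def" and isinstance(form[1], str)
    found = {form[1]} if is_def else set()
    for sub in form:
        found |= definitions(sub)
    return found


def possible_definitions(form):
    """The set of alternative definition sets, one per possible expansion."""
    return {frozenset(definitions(e)) for e in possible_expansions(form)}


if __name__ == "__main__":
    program = ["do",
               ["if-feature", "sbcl",
                ["def", "fast-path", "1"],
                ["def", "slow-path", "2"]],
               ["def", "main", "3"]]
    for alternative in possible_definitions(program):
        print(sorted(alternative))
```

Each unresolved choice multiplies the number of alternatives, so a real analyzer would have to cap or merge them (for example by reporting only their union).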
Note that in the meantime a static analyzer for Clojure has appeared:
https://github.com/clj-kondo/clj-kondo/blob/master/analysis/README.md
This analyzer is used by tools such as clojure-lsp to provide navigation and refactoring. It is available as a standalone command-line utility (it can also be compiled into a native library, if that is useful from Haskell, though you might as well just shell out to it) and as a JVM library.
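Shelling out as suggested might look roughly like this Python sketch. The `--lint`/`--config` flags, the `{:analysis true :output {:format :json}}` config, and the `analysis`, `var-definitions`, `var-usages` and `to` key names are our reading of the analysis README linked above and should be checked against the installed clj-kondo version; `run_clj_kondo`, `cross_references` and the sample output are hypothetical.

```python
import json
import subprocess

# Config enabling analysis data in JSON output. Option names follow our
# reading of the clj-kondo analysis README; verify them for your version.
KONDO_CONFIG = "{:analysis true :output {:format :json}}"


def run_clj_kondo(path):
    """Shell out to clj-kondo on `path` and parse its JSON output."""
    proc = subprocess.run(
        ["clj-kondo", "--lint", path, "--config", KONDO_CONFIG],
        capture_output=True, text=True, check=False,
    )
    # clj-kondo exits non-zero when it reports lint findings, so the exit
    # code alone doesn't mean the run failed; parse stdout regardless.
    return json.loads(proc.stdout)


def cross_references(output):
    """Map each (namespace, var) definition to the places that use it."""
    analysis = output.get("analysis", {})
    refs = {}
    for d in analysis.get("var-definitions", []):
        refs.setdefault((d["ns"], d["name"]), [])
    for u in analysis.get("var-usages", []):
        key = (u["to"], u["name"])  # "to": namespace of the used var
        if key in refs:
            refs[key].append((u["filename"], u["row"], u["col"]))
    return refs


# Hand-written, hypothetical sample of the analysis part of the output.
SAMPLE_OUTPUT = {
    "analysis": {
        "var-definitions": [
            {"filename": "src/app/core.clj", "row": 3, "col": 1,
             "ns": "app.core", "name": "greet"},
        ],
        "var-usages": [
            {"filename": "src/app/main.clj", "row": 7, "col": 3,
             "from": "app.main", "to": "app.core", "name": "greet"},
            {"filename": "src/app/main.clj", "row": 8, "col": 3,
             "from": "app.main", "to": "clojure.core", "name": "println"},
        ],
    }
}

if __name__ == "__main__":
    print(cross_references(SAMPLE_OUTPUT))
    # {('app.core', 'greet'): [('src/app/main.clj', 7, 3)]}
```

Against a real project you would call `cross_references(run_clj_kondo("src"))`; usages of vars defined outside the linted paths (here `println`) are simply dropped.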
Any possibility of adding Clojure/ClojureScript support?