clojure-emacs / clojure-ts-mode

The next generation Clojure major mode for Emacs, powered by TreeSitter
GNU General Public License v3.0

Organize repo directory structure with queries folder (?) #10

Open jasonjckn opened 1 year ago

jasonjckn commented 1 year ago

I noticed a lot of the tree-sitter repos have a dedicated queries folder,

e.g. https://github.com/helix-editor/helix/tree/master/runtime/queries/julia

Just a thought... makes it a bit more modular, and also is a possible avenue for extensions maybe, if you load them from a variety of locations found on the path.

sogaiu commented 1 year ago

I think being able to work with more than one query is a good idea, whether it's extending or overriding. I experimented with something similar in elisp-tree-sitter with some success and felt the benefit first-hand. I also know some folks who do this sort of thing with nvim-treesitter.

However, is there a move on the Emacs 29 side to do this kind of thing with their bundled *-ts-mode.el files? The last time I looked over some of the files I didn't notice evidence along those lines.

I think as it's still "early stage" for *-ts-mode.el files, it might be preferable not to stray too far from what the Emacs 29 folks are doing. I suspect they'll end up wanting to do something similar eventually, so maybe waiting to see what they do isn't such a bad idea?


On the subject of externalizing a mode's query...

ATM, I think most / all of the queries in *-ts-mode.el files are "generated" via a function. For example in ruby-ts-mode.el, there is this:

(defun ruby-ts--font-lock-settings (language)
  "Tree-sitter font-lock settings for Ruby."
  (treesit-font-lock-rules
   :language language
   :feature 'comment
   '((comment) @ruby-ts--comment-font-lock)

   :language language
   :feature 'builtin-variable
   `(((global_variable) @var (:match ,ruby-ts--predefined-variables @var)) @font-lock-builtin-face)

   ;; a lot of stuff elided

Note the ,ruby-ts--predefined-variables which is a reference to an Emacs Lisp construct. So this isn't really a "static" thing like in Helix's or nvim-treesitter's queries that can just be pulled out into a .scm file without additional changes IIUC.

I think to "externalize", some other arrangement would be necessary -- e.g. the file could be a .el file and if ruby-ts--predefined-variables is used by more than one file, presumably that would need to be accounted for.

I think the reference to Emacs Lisp constructs in a *-ts-mode.el's "query" is pretty typical. For example, in clojure-ts-mode.el, there is this: https://github.com/clojure-emacs/clojure-ts-mode/blob/359521e61ffb3c3b01bf9a19bccbf0ccd52c5968/clojure-ts-mode.el#L248

Note the clojure-ts--builtin-symbol-regexp there -- similar to the kind of thing in ruby-ts-mode.el.

dannyfreeman commented 1 year ago

I'd prefer to keep everything in the same clojure-ts-mode.el file right now, as that is the convention in the Emacs community for these tree-sitter major modes.

However, I think the underlying question about how users can extend the mode is a good thing to begin thinking about.

As Sogaiu points out, there are some regular expressions that are pulled into the call to treesit-font-lock-rules. Those could be modified so that the vars are just plain lists that the user could add to.

For example, instead of

(defconst clojure-ts--builtin-symbol-regexp
  (eval-and-compile
    (concat "^"
            (regexp-opt
             '("do"
               "if"
               "let*"
               "var"
               ; ...
               )))))

;;;...
(defvar clojure-ts--font-lock-settings
  (treesit-font-lock-rules
   ;; ...

   :feature 'builtin
   :language 'clojure
   `(((list_lit :anchor (sym_lit (sym_name) @font-lock-keyword-face))
      (:match ,clojure-ts--builtin-symbol-regexp @font-lock-keyword-face)))

   ;; ...
   ))

We could instead have something like this

(defvar clojure-ts--builtin-symbols
  '("do"
    "if"
    "let*"
    "var"
    ; ...
    ))

;;;...

(defvar clojure-ts--font-lock-settings
  (treesit-font-lock-rules
   ;; ...

   :feature 'builtin
   :language 'clojure
   `(((list_lit :anchor (sym_lit (sym_name) @font-lock-keyword-face))
      (:match ,(concat "^" (regexp-opt clojure-ts--builtin-symbols) "$") @font-lock-keyword-face)))

   ;; ...
   ))

Which would allow users to extend the list of builtin symbols with a call to (add-to-list ...) in their configuration. Of course, I would prefer any real builtin symbols missing be fixed in a PR, but I'm sure there are other use cases here.
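To make the idea concrete, here is a minimal sketch of the list-based approach that runs without tree-sitter. The names my-clojure-builtin-symbols and my-clojure-builtin-symbol-regexp are hypothetical and not part of clojure-ts-mode. One thing it assumes: the regexp has to be rebuilt after the list changes, because a value spliced into a defvar with a backquote is computed once at load time, so an add-to-list that runs after the mode is loaded would not be picked up by an already-built settings variable.

```elisp
;; Hypothetical names for illustration; not part of clojure-ts-mode.
(defvar my-clojure-builtin-symbols
  '("do" "if" "let*" "var")
  "User-extensible list of symbol names to highlight as builtins.")

(defun my-clojure-builtin-symbol-regexp ()
  "Return an anchored regexp matching any entry of `my-clojure-builtin-symbols'.
Built on demand so that later additions to the list are included."
  (concat "^" (regexp-opt my-clojure-builtin-symbols) "$"))

;; In a user's configuration: register an extra symbol.
(add-to-list 'my-clojure-builtin-symbols "my-special-form")

;; The rebuilt regexp matches the new entry, and only whole names.
(string-match-p (my-clojure-builtin-symbol-regexp) "my-special-form") ; non-nil
(string-match-p (my-clojure-builtin-symbol-regexp) "let")             ; nil
```

Building the regexp in a function rather than a constant is just one way to keep the list and the pattern in sync; a mode could equally recompute its settings in its setup function.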

That change would allow for some extension of raw matching. There are other ways things could be extended as well. New syntax highlighting rules could also be added if we were to define the arguments to treesit-font-lock-rules as data instead of writing them inline in a macro/function call or whatever it is. That would allow all the font-lock rules to be set in a var that users could add to or remove from.
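A rough sketch of the "rules as data" idea follows. It relies on treesit-font-lock-rules being an ordinary function in Emacs 29's treesit.el, so a list of arguments can be passed to it with apply. The names my-clojure-font-lock-rule-specs and my-clojure-font-lock-settings are hypothetical, the (comment) node in the user's addition is an assumption about the Clojure grammar, and compiling the rules needs an Emacs built with tree-sitter support.

```elisp
;; Sketch only; hypothetical names, not part of clojure-ts-mode.
(require 'treesit)

(defvar my-clojure-font-lock-rule-specs
  '(:language clojure
    :feature builtin
    (((list_lit :anchor (sym_lit (sym_name) @font-lock-keyword-face))
      (:match "^\\(?:do\\|if\\)$" @font-lock-keyword-face))))
  "Flat list of arguments for `treesit-font-lock-rules', kept as plain data.
Values are literal here, so symbols are written without a quote.")

(defun my-clojure-font-lock-settings ()
  "Compile `my-clojure-font-lock-rule-specs' into font-lock settings."
  (apply #'treesit-font-lock-rules my-clojure-font-lock-rule-specs))

;; In a user's configuration: append one more rule before the mode starts.
;; The `comment' node name is an assumption about the grammar.
(setq my-clojure-font-lock-rule-specs
      (append my-clojure-font-lock-rule-specs
              '(:language clojure
                :feature comment
                ((comment) @font-lock-comment-face))))
```

A mode using this would set treesit-font-lock-settings from (my-clojure-font-lock-settings) in its setup code. A newly added feature name would also have to appear in treesit-font-lock-feature-list before it takes effect.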

Another customization can be done through treesit-font-lock-level, which selects how many of the levels listed in treesit-font-lock-feature-list are enabled. The new Mastering Emacs article about tree-sitter explains this better than I can. I need to revisit how other tree-sitter modes have configured this variable.
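As a configuration sketch, a user on Emacs 29 or later could adjust the level globally or toggle one feature for a single mode. The builtin feature name comes from the rules quoted earlier in this comment; the hook name assumes the mode is defined in the usual way so that clojure-ts-mode-hook exists.

```elisp
;; Configuration sketch; requires Emacs 29+ built with tree-sitter support.
(require 'treesit)

;; Raise the decoration level globally (the default is 3).
(customize-set-variable 'treesit-font-lock-level 4)

;; Or enable a single feature for one mode, independent of the level.
(add-hook 'clojure-ts-mode-hook
          (lambda ()
            ;; First argument: features to enable; second: features to disable.
            (treesit-font-lock-recompute-features '(builtin) nil)))
```

Each sublist of treesit-font-lock-feature-list corresponds to one level, so which features level 4 adds depends on how the mode fills in that variable.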

Finally, customization methods should be documented. This is a good thread so I would like to leave it open. The more easily extensible this package is the better. I will work on some of this in the near future.