github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

Using tree-sitter for syntax highlighting for R #7029

Closed. DavisVaughan closed this issue 3 weeks ago.

DavisVaughan commented 3 weeks ago

CC @look, who helped me with the Code Search side of this.

As of a few months ago, GitHub uses tree-sitter-r for R Code Search: https://github.com/orgs/community/discussions/120397

Pretty much everything works great, except that clicking a function reference doesn't open the Symbols pane. Apparently this is caused by a CSS issue related to syntax highlighting. The first link below has videos of the problem, and the second has some discussion of it:
https://github.com/orgs/community/discussions/120397#discussioncomment-9991367
https://github.com/orgs/community/discussions/120397#discussioncomment-10014693

I noticed that you use r.tmbundle for R: https://github.com/github-linguist/linguist/blob/f0aebbe90d3b9b9bae5b7258f99370def1ee489f/grammars.yml#L937-L939 https://github.com/textmate/r.tmbundle

I might be crazy, but from a quick look at r.tmbundle it doesn't seem to recognize function calls. At least, I don't see anything in the grammar that mentions function references. Maybe that's related to my issue?

That might be why function references show as "plain text" for R, while for Python they are nicely highlighted in purple:

[Screenshots taken 2024-09-04 comparing function-reference highlighting for R and Python]
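To make the missing piece concrete, here is a minimal Python sketch of the kind of match rule a TextMate grammar could use to tag R function calls: an identifier followed by `(`, skipping keywords such as `if` and `for`. It is an illustration under stated assumptions, not code from r.tmbundle or GitHub's highlighter. A real TextMate grammar would express this as a `match` regex plus a scope `name` in the grammar file; the regex here is simplified (it ignores backtick-quoted names, for example), and the scope name `meta.function-call.r` is a hypothetical choice following TextMate naming conventions.

```python
from __future__ import annotations

import re

# Simplified R identifier (starts with a letter or dot) that is immediately
# followed, optionally after whitespace, by "(". The lookbehind stops matches
# from starting in the middle of a longer identifier.
FUNCTION_CALL = re.compile(r"(?<![\w.])([A-Za-z.][A-Za-z0-9._]*)\s*(?=\()")

# R reserved words that can precede "(" but are not function calls.
R_KEYWORDS = {"if", "for", "while", "function"}

# Hypothetical scope name, following TextMate conventions; not taken from r.tmbundle.
CALL_SCOPE = "meta.function-call.r"


def function_call_spans(line: str) -> list[tuple[str, str]]:
    """Return (name, scope) pairs for each apparent function call in `line`."""
    spans = []
    for match in FUNCTION_CALL.finditer(line):
        name = match.group(1)
        if name in R_KEYWORDS:
            continue  # e.g. `if (x > 0)` is control flow, not a call
        spans.append((name, CALL_SCOPE))
    return spans


if __name__ == "__main__":
    print(function_call_spans("x <- mean(c(1, 2)) + dplyr::n()"))
```

With a rule like this, `mean`, `c`, and `n` in the example line would each get a scope that a theme could color, which is roughly what the Python screenshot shows for Python function references.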

Anyway, I was wondering whether R could get improved syntax highlighting if Linguist switched to tree-sitter-r as well, i.e. by adding tree-sitter-r to this list: https://github.com/github-linguist/linguist/blob/f0aebbe90d3b9b9bae5b7258f99370def1ee489f/script/list-grammars#L8-L30

We have a highlights.scm file ready to go: https://github.com/r-lib/tree-sitter-r/blob/main/queries/highlights.scm

So I guess my questions are:

lildude commented 3 weeks ago

I have just found https://github.com/github-linguist/linguist/discussions/6073. I guess it isn't Linguist doing anything, but code from GitHub's internal syntax highlighting team.

Bingo!!

Linguist has no control over the use of tree-sitter grammars. These are implemented directly in the syntax highlighter, which doesn't currently use the tree-sitter grammar for R and so relies on the TextMate grammar shipped with Linguist.

It's certainly possible that all these issues would be resolved by switching grammars, but unfortunately there's nothing we can do on the Linguist side short of accepting a PR that switches to a better TextMate grammar.

The best I can suggest is using the discussion to get the syntax highlighting team involved, or using the Contact link at the bottom of any page on GitHub to reach Support, who can then liaise with the right team.
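The grammar fallback described in this reply can be sketched as a simple lookup. This is a hypothetical Python model, not GitHub's highlighter code (which isn't public): the two registries, the `Grammar` type, `select_grammar`, and the `ExampleLang` / `tree-sitter-example` entry are all invented for illustration. Only the R entry pointing at r.tmbundle comes from this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    kind: str    # "tree-sitter" or "textmate"
    source: str  # where the grammar comes from


# Hypothetical registry of tree-sitter grammars wired into the highlighter itself.
# Linguist has no say over what goes here.
HIGHLIGHTER_TREE_SITTER = {
    "ExampleLang": Grammar("tree-sitter", "tree-sitter-example"),  # invented entry
}

# Hypothetical view of the TextMate grammars shipped with Linguist.
LINGUIST_TEXTMATE = {
    "R": Grammar("textmate", "r.tmbundle"),
}


def select_grammar(language: str) -> Grammar | None:
    """Prefer the highlighter's own tree-sitter grammar, else fall back to Linguist's."""
    if language in HIGHLIGHTER_TREE_SITTER:
        return HIGHLIGHTER_TREE_SITTER[language]
    return LINGUIST_TEXTMATE.get(language)


if __name__ == "__main__":
    print(select_grammar("R"))
```

In this model, R resolves to the TextMate grammar only because it has no entry in the highlighter's tree-sitter registry, so editing Linguist's grammar list alone would not switch R over to tree-sitter-r.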

DavisVaughan commented 3 weeks ago

Alright, thanks! I've opened a support ticket (Ticket 2983784) for now, but I'd also be happy to talk with someone on the syntax highlighting team directly if anyone can put me in touch with them. Since this isn't a Linguist issue, though, I'll close this.