Suggestions for scope additions/changes

savetheclocktower commented 5 years ago

Summary

I have suggestions for scope additions/changes for the tree-sitter grammar. If any of these are warmly received, I’ll whip up a PR.

Motivation

A bare-bones grammar is less useful. Scope names have value beyond syntax highlighting.

The suggestions

More `meta.` scopes

I’ve got a custom define-a-function command that behaves differently inside class bodies than it does elsewhere; with the Babel grammar I was able to use the meta.class.body scope to tell the difference.

I’d love for that (and a few other similar scopes) to be added. Without scopes it’s quite hard to get commands to do different things in different contexts.

Punctuation

All opening braces ({) — except in strings, of course — are scoped as punctuation.definition.function.body.begin.bracket.curly. This is inaccurate; lots of opening braces don’t signal the start of a function body. Closing braces are similarly scoped, as are square brackets and parentheses.

If the goal is to give all such punctuation the same scope regardless of context, I’d suggest these scopes be renamed like (e.g.) punctuation.definition.begin.brace so that the scope name doesn’t imply a context that may or may not be true. If these scopes should include information about their context so that {s can be distinguished from one another, then we need to be a bit more thorough in the grammar.

Definition vs. invocation

Both function names and class names are scoped identically between definition and usage.

class Foo extends Bar {}
let foo = new Foo();

function baz(...args) {}
baz(false);

In this code sample, both occurrences of Foo have the scope meta.class. Both occurrences of baz have the scope entity.name.function.

This is intended behavior, as I understand from @maxbrunsfeld’s comments in past issues like these. I’m not trying to change how these identifiers appear in default syntax themes. But I’d like these scopes to be distinguished somehow so that I can make them look different from one another in my own syntax theme if I so choose.

For functions, I’d suggest entity.name.function.definition in the definition case, and entity.name.function.call in the invocation case.

For classes, meta.class is a questionable choice in the first place, in my opinion; I’d suggest entity.name.class.definition and entity.name.class.instantiation. (If meta.class needs to be kept for backward-compatibility, that’s fine; I believe that it’s possible to apply two scope names to the same thing.)

Adding to the scope name shouldn’t affect their appearance in existing syntax themes.

Describe alternatives you've considered

There aren't any, short of continuing to use language-babel. Grammars are the underpinning of most of the customization I already do.

Ben3eeE commented 5 years ago

More meta. scopes

I am not sure about this. I believe you could use the tree-sitter syntax tree to get the same information and do what you want. That would mean you are not limited by the scopes language-javascript has defined and can do anything that tree-sitter recognizes. But I don't know what the APIs look like for this, if there are any yet.

Punctuation

Yeah this should change. I think this is an oversight from copy/pasting the TextMate grammar scope which differentiated them. I'd rather it was just one punctuation.definition.begin.brace because I don't see why they should be scoped differently.

Definition vs. invocation

I'd merge a PR that adds definition and call to the scopes to scope them differently. I believe it should be compatible with the current themes.

meta.class

This sounds like a bug. TextMate scopes it as entity.name.class, I guess the highlighting works because they both are class. I'd merge a PR that changes it. Same thing here, if you want to differentiate instantiation and definition that would be fine.

savetheclocktower commented 5 years ago

OK, here's what I'll put into the PR:

entity.name.function further clarified to entity.name.function.definition and entity.name.function.call
meta.class renamed to entity.name.class, then clarified to entity.name.class.definition and entity.name.class.instantiation
punctuation scopes renamed to conform to punctuation.definition.(begin|end).X for the common pairs; a quick scan of other grammars suggests that what I want for X is bracket.curly, bracket.round, and bracket.square for {, (, and [, respectively

I'd rather it was just one punctuation.definition.begin.brace because I don't see why they should be scoped differently.

My guess is that it’s a holdover from the TextMate 2 days. If you type {, its partner } is inserted, and your cursor sits between them. You could write a TM2 command or snippet that targeted that context (particular scopes on either side of the cursor); this is how TM2 auto-indented inside of functions, if-statements, and the like. And if the curly braces are scoped based on their specific function, then you could do different things based on the purpose of the curly braces — maybe you want one-true-brace-style for functions, but something different for object literals. But I agree with you; it’s a bit overweening. And even harder to justify in Atom, since a command has lots of other ways to look around the document.

One other question while I’ve got you here: is there rhyme or reason to when a scope name gets the .js at the end and when it doesn't? There appears to be no consistency in tree-sitter-javascript.cson. My guess is that it's viewed as superfluous these days if it doesn't actually distinguish how the scope should be highlighted, but I thought I'd make sure before I started a PR.

That'll be ready soon. Now that that's sorted, let me think out loud in a possibly tiresome way…

I am not sure about this. I believe you could use the tree-sitter syntax tree to get the same information and do what you want. That would mean you are not limited by the scopes language-javascript has defined and can do anything that tree-sitter recognizes. But I don't know what the APIs look like for this, if there are any yet.

Being able to query the tree-sitter syntax tree would be an acceptable compromise for me. Atom doesn't let me map a command to a very specific scope in my keymap.cson the way that TextMate could, so when migrating my customizations a couple years ago I had to write some utility code that inspected scope descriptors. Hell, I've got a command called interpret-num-pad-enter that dispatches that key to one of three different actual commands based on scope context. So it would be a mere lateral move if I had to write some similar code to inspect tree-sitter syntax trees.

Looks like Cursor.getSyntaxTreeScopeDescriptor will do what I want. (I would love an API that could reify the tree and my position therein so that I could peek around and act based on more than just a list of my ancestors — e.g., is my next sibling a function definition? — but I should be patient.)

Your answer moves me to ask broader architectural questions (e.g., are scopes going to be deprecated long term? relegated to simple syntax-highlighting purposes?) but I'm not sure where to pose those questions. If tree-sitter grammars are the exclusive future — that is, if TM-style grammars are going to be strongly deprecated or phased out altogether — then I'm fine with a future where I dispatch commands based on context within a tree-sitter syntax tree instead of a scope tree. If not, and the two types of grammars are bound to coexist for a while (as seems to be the case now), then it seems burdensome on users to make them understand the difference between the two grammar types instead of providing a unified interface. But I’m probably projecting a bit; “let me write highly contextual commands a bit more tersely” is not exactly a plea I can pretend to make on behalf of the average Atom user.

maxbrunsfeld commented 5 years ago

I would love an API that could reify the tree and my position therein so that I could peek around and act based on more than just a list of my ancestors

We haven't officially documented it - but you can call LanguageMode.getSyntaxNodeContainingRange. It's only available for languages with Tree-sitter parsers though:

const languageMode = buffer.getLanguageMode();
if (languageMode.getSyntaxNodeContainingRange) {
  const node = languageMode.getSyntaxNodeContainingRange();
  console.log("node is a", node.type);
  console.log("node position", node.startPosition, node.endPosition);
  console.log("other nodes:");
  console.log(node.firstChild, node.nextSibling, node.closest('function_definition'));
}

Ben3eeE commented 5 years ago

OK, here's what I'll put into the PR:

Great, thank you 🙇

is there rhyme or reason to when a scope name gets the .js at the end and when it doesn't?

Not really. I have added it when it was necessary for highlighting to work in some packages. Generally we have avoided it with tree-sitter because it should not be needed. There is already a scope applied to the entire file that is unique for every language.

Your answer moves me to ask broader architectural questions (e.g., are scopes going to be deprecated long term? relegated to simple syntax-highlighting purposes?)

As far as I know the TextMate grammars will be there but get no changes if there is a tree-sitter grammar for the language. We do accept well written PRs for TextMate grammars still. Regarding the other broader architectural questions I don't know. The best way forward would be to write an RFC https://github.com/atom/atom/blob/master/docs/rfcs/000-template.md

savetheclocktower commented 5 years ago

OK, so that promised PR is forthcoming, but I'm noticing a lot of other places that are begging for scopes. A couple examples:

import { default as one, two } from 'thing';
let foo = true;

With the current tree-sitter grammar, default, one, two, and foo have no scope whatsoever apart from source.js. So a syntax theme can't do anything with them.

If I had my druthers, all four of those would get something in the variable namespace. But, unlike in the earlier examples, adding those scopes will change the syntax highlighting in a built-in syntax theme like One Dark. So if I start out with this…

…and then add this rule to the grammar…

  'named_imports > import_specifier > identifier': 'variable.other.import.named.js'

…I'll end up with this:

And in doing that I have… induced a built-in syntax theme to make these twos different colors.

But I'm not trying to do that. I'm just trying to add a hook.

I think all these examples should have scope names. If a built-in theme doesn't want to apply highlighting in these specific examples, I think it ought to opt out of highlighting them in its JavaScript-specific override file. Otherwise you're coupling the grammar to the design choices of the default syntax theme… at the expense of all the others.

Honestly, if the price I had to pay in order to add these variable scopes was to make a parallel change to all six of the built-in syntax themes in its own PR — one which guaranteed no net changes in highlighting behavior — I'd do it.

The next best alternative, in my opinion, would be to scope them someplace weird, like meta.variable.whatever — except that I can't have .variable as one of the scope parts, because that'll produce the same outcome I was trying to avoid in the first place. So the naming would have to be even more ornery, which defeats the purpose of semantic scope naming.

My head hurts. Which of the options I've described is least offensive to you folks?

Ben3eeE commented 5 years ago

My head hurts. Which of the options I've described is least offensive to you folks?

I've faced the same problems.

When I wanted to scope this I opened a PR to change the color to something different than any other variable https://github.com/atom/atom/pull/18383. Because it's not just any variable so it should have a special color.

For other places I used a scope that aligned with the tree-sitter naming: https://github.com/atom/language-javascript/blob/d66821651bd97279e03c4285269aea2f283a0017/grammars/tree-sitter-javascript.cson#L72

This scope is not styled by any of the themes and is still a correct naming. What do you think about making it import-specifier.identifier? It matches the naming of function parameters and if you want all of these things the same color you could just add the same styling to identifier.

Although since then we have added scopes to language-python for parameters > identifier that is variable.parameter.function https://github.com/atom/language-python/pull/297/files so it's a bit of a mess if you start to dig deeper. This change in language-python doesn't match what @maxbrunsfeld has mentioned earlier about styling.

I am not sure what would be best. There are some comments in the linked PRs that you can refer to. But the best way forward to get a decision on this that applies to every language would be to write an RFC to discuss this. Do you want to write one?

savetheclocktower commented 5 years ago

I mean, does anyone ever want to write an RFC? But, yeah, I might. In the meantime, I'll add scopes like import-specifier.identifier that couldn't possibly conflict with anything, and then that'll at least be something to work with.

savetheclocktower commented 5 years ago

RFC written. I hope it doesn't come off like a screed!

Ben3eeE commented 5 years ago

It's a very well written RFC and it has good points that I considered myself. Initial review (I will post a more detailed review on the PR tomorrow hopefully) is that it should be changed from I want this because I do this-language to This is good because that means we can do this. Don't write it for yourself. Write it for the value it adds for everyone.

savetheclocktower commented 5 years ago

@Ben3eeE, that was good feedback, and I'm trying to rewrite the RFC a bit now that I've had a few weeks to think about it. My main struggle is that I don’t know how much to bite off.

In an ideal world, I’d be writing an RFC proposing that Atom commit itself to TM-style scope-naming conventions, even in tree-sitter grammars, and to evolve those conventions where necessary. What keeps me from writing that RFC is that (a) it feels like a lot to ask, especially from an occasional contributor like me; and (b) it could easily get bikeshedded into eternity without strong consensus. That’s why my first attempt aimed lower — but I think I aimed too low and made my proposal sound like a minor detail, rather than an important design decision.

So I’ll just ask you now: should I be bolder in my second draft? You seem to be more familiar with the temperature of the room and the people I’d have to convince.

atom / language-javascript