pokey commented 8 months ago

The problem

As a contributor, it is hard to know exactly what needs to be implemented for a new language, or what is missing from an existing language. As a user, it is hard to know what our support level is for an existing language.

I really like the support tables in https://github.com/cursorless-dev/cursorless/pull/1911 and https://github.com/cursorless-dev/cursorless/pull/1962. We should formalize these somehow with the following goals:

make it easier for people to contribute languages
make it easier for contributors to see what's missing for a given language
make it harder to forget to write tests
make it easier for users to know what a given language supports

The solution

There are multiple levels of this solution, in increasing levels of investment

1. Add support tables to our new language PR template

This is the lowest overhead, and has the benefit that it's easy for the contributor and core maintainers to update as a PR progresses.

The drawback is that it's not helpful for users, and doesn't help as much after the initial PR. It's also not formalized in any way

2. Manually add support tables to per-language pages (https://github.com/cursorless-dev/cursorless/issues/1642)

This solution is probably the highest overhead for new language contributors, but doesn't require much investment on our part, and is helpful both for users and for contributors seeing where a language is currently

It also isn't automatically enforced in any way, so could get stale / be wrong just like 1) above

We'd also want to add a checkbox to the PR template saying we've updated this page. Probably just the new-language template but could be the general template

3. Add structured representation of language scope support

We could add types that represent different aspects of scope support. Something like the following

const scopeSupportFacets = [
  "list",
  "list.interior",
  "map",
  "map.interior",
  "collectionKey",
  "namedFunction",
  "namedFunction.interior",
  "functionName",
  "anonymousFunction",
  "anonymousFunction.interior",
  "name",
  "value.assignment",
  "value.assignment.removal",
  "value.return",
  "value.return.removal",
  "value.collectionItem",
  "value.collectionItem.removal",
  "statement",
  "ifStatement",
  "condition.if",
  "condition.while",
  "condition.doWhile",
  "condition.for",
  "condition.ternary",
  "branch",
  "comment.line",
  "comment.block",
  "string.singleLine",
  "string.multiLine",
  "textFragment",
  "functionCall",
  "functionCallee",
  "argumentOrParameter.argument",
  "argumentOrParameter.argument.removal",
  "argumentOrParameter.parameter",
  "argumentOrParameter.parameter.removal",
  "class",
  "class.interior",
  "className",
  "type",
] as const;

type ScopeSupportFacet = (typeof scopeSupportFacets)[number];

interface ScopeSupportFacetInfo {
  label: string;
  description: string;
  scopeType: SimpleScopeTypeType;
}

const scopeSupportFacetInfos: Record<ScopeSupportFacet, ScopeSupportFacetInfo> =
  {
    list: {
      label: "List",
      description: "A list of items",
      scopeType: "list",
    },
    "list.interior": {
      label: "List interior",
      description: "Excludes the opening and closing delimiters of the list",
      scopeType: "list",
    },
    map: {
      label: "Map",
      description: "A map of key-value pairs",
      scopeType: "map",
    },
    "map.interior": {
      label: "Map interior",
      description: "Excludes the opening and closing delimiters of the map",
      scopeType: "map",
    },
  };

enum ScopeSupportFacetLevel {
  supported,
  unsupported,
  notApplicable,
}

type LanguageScopeSupportFacetMap = Record<
  ScopeSupportFacet,
  ScopeSupportFacetLevel
>;

const { supported, unsupported, notApplicable } = ScopeSupportFacetLevel;

// Example support table for a language
const typescriptSupport: LanguageScopeSupportFacetMap = {
  list: supported,
  "list.interior": supported,
  map: supported,
  "map.interior": supported,
  collectionKey: supported,
  namedFunction: supported,
  "namedFunction.interior": supported,
  functionName: supported,
  anonymousFunction: supported,
  "anonymousFunction.interior": supported,
  name: supported,
  "value.assignment": supported,
  "value.assignment.removal": supported,
  "value.return": supported,
  "value.return.removal": supported,
  "value.collectionItem": supported,
  "value.collectionItem.removal": supported,
  statement: supported,
  ifStatement: supported,
  "condition.if": supported,
  "condition.while": supported,
  "condition.doWhile": supported,
  "condition.for": unsupported,
  "condition.ternary": notApplicable,
};

The strong typing would force us not to forget any scopes. The above would be the source of truth from which we could generate the following:

Support tables for #1642
Language support tables for scope pages in our docs
Examples of different aspects of support in our docs, using test cases

We could also check that test cases exist by adding an optional supportFacet field to our recorded test cases, and then checking in CI for a test case that mentions any support face that is marked as supported

pokey commented 8 months ago

Add "example" field with some quick inline example?

pokey commented 8 months ago

One major drawback is that if you add a new scope / facet in the process of developing a language, you then need to add a field to all existing languages. There are a couple of problems with that:

It increases the barrier to adding new languages / facets
You might not know whether a facet / scope makes sense for some language that is totally unfamiliar to you
If a couple such PRs are in-flight at the same time, you could get merge conflicts across all of the language files

A couple possible solutions:

Make the language support tables Partial
- Pros: solves all of the problems listed above
- Cons: we no longer get the benefit of being forced to think about the different facets of a given scope for every language
Add a new unknown value to the ScopeSupportFacetLevel enum
- Pros: solves 2) above, and still forces you to think about facets of scopes for languages
- Cons: Doesn't solve 1) and 3) above

pokey commented 8 months ago

We're leaning towards implementing the simplest possible version of 3) (structured representations) as our first step. It

wouldn't be required to declare the support table for every language
we could make the scope support tables Partial

Then we'd have the immediate benefit of:

PRs could signal their scope support in a structured fashion that could be enforced down the road
For languages which have these tables, we could see at a glance what's supported

Some drawbacks compared to starting with 1) or 2)

Given it's Partial there isn't a list of checkboxes indicating what should commonly be contributed in initial lang support. We could have a template that we use for this purpose, though
There aren't checkboxes for whether tests have been added for the given facet, and there's no enforcement that the tests exist. 1) would have such checkboxes

I guess we could add the scopeSupportFacet field to our recorded tests, and then only check it for those languages that have support tables. That's a bit more work though, and adding that field to recorded tests will be a bit annoying without also adding some tooling to make it easier.

pokey commented 8 months ago

@josharian I'd love your feedback on this one when you get a minute, either in comments here or at a meet-up. It seems there is a possible short-term win here but don't want to lead ourselves to trouble in the long-term

josharian commented 8 months ago

I can’t quite envision what the long-term trouble would be here.

I would say the primary goals are to (a) get anything at all working, and (b) minimize the effort required to update the table. So any design that requires O(n) entries be added, or altered, is strictly worse than one that requires O(1). So I guess I’m partial to partials.

I am really excited about this idea. I would love to have a place that I could very easily check every month or two to see what scopes Go is now missing, perhaps because something got added that I wasn’t aware of at the time.

josharian commented 8 months ago

Oh, and I’m not too worried about needing checkboxes. I suspect that most people who are implementing languages are going to refer to this table a lot and will be excited to update it.

AndreasArvidsson commented 8 months ago

I added that to discuss label because I might want to take a crack at this sooner than anticipated. I do have some questions about facets.

If something requires a different removal range or not might be very language dependent. Therefore I do wonder if not every single scope potentially could have a removal? And once we actually do proper scope test we would test the removal range of every scope.
Should iteration scopes be part of the facets?
branch.ternary that contains two branches. Would that be one or two facets? I would assume a single test would capture both branches.

cursorless-dev / cursorless

Language support tables #2010

The problem

The solution

1. Add support tables to our new language PR template

2. Manually add support tables to per-language pages (https://github.com/cursorless-dev/cursorless/issues/1642)

3. Add structured representation of language scope support