benjamn / recast

JavaScript syntax tree transformer, nondestructive pretty-printer, and automatic source map generator
MIT License

Recast needs additional maintainers #1086

Closed conartist6 closed 2 years ago

conartist6 commented 2 years ago

@benjamn I am trying every way I can think of to get your attention, because if I can't, I'm forking the library.

conartist6 commented 2 years ago

OK, I'm back-ish after some distractions. I've also changed tack a bit on how I plan to integrate with prettier. The new plan is to use exactly the existing APIs that prettier has for output. In other words, I expect to be able to do essentially this with prettier:


const {
  __debug: { formatAST },
} = require("prettier");

const formatCst = (cst) => {
  // Note: formatAST mutates the AST to rebalance binary expressions.
  const formattedText = formatAST(cst);

  // Re-derive the CST's tokens from prettier's printed output
  rebuildTokensFromText(cst, formattedText);
};

That's great because that part at least doesn't require any changes from prettier at all. They even export the formatAST method that works the way I need it to!

Then the remaining challenge is the one that can't be avoided: writing prettier's language rules to query for tokens against the CST. I haven't really started digging into that yet, but I have heard at least one contributor there express enthusiasm for the idea of that change, which is encouraging.

conartist6 commented 2 years ago

I've got it! I think. It took a while to get to this design, but now that I'm here it feels pretty powerful and correct. I'm using coroutines -- one side of the coroutine is the grammar generator for a given node, and on the other side we have an engine of some kind, used for validation, generation, or mutation.

The grammar generator yields simple commands to the engine like takeMatch, take, match, and drop. Using these commands completely decouples the grammar's concerns from the details of the task the grammar is being used for, while ensuring that all the functions involved in the coroutine are evaluated in lockstep. This lets me fulfill my basic goal of driving a variety of actions through an easy-to-specify imperative grammar.
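
To make the shape of that concrete, here's a minimal sketch of the driving loop (the names runGrammar and engine.exec are placeholders for illustration, not settled API):

function runGrammar(grammarGenerator, engine) {
  // `grammarGenerator` is an instance of a grammar generator for one node;
  // `engine` is whichever interpreter is driving it: validator, printer, or mutator.
  let feedback;

  for (;;) {
    // Step the grammar, feeding back the result of the previous command
    const { value: command, done } = grammarGenerator.next(feedback);
    if (done) break;

    // The engine gives each yielded command (take, match, drop, ...) a
    // task-specific meaning and returns whatever the grammar should see next
    feedback = engine.exec(command);
  }
}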

I've punted on the question of how to attach whitespace and comments by saying that it will be up to the plugin ecosystem to define a comment association strategy. I'll always hoist whitespace and comments up in the tree; user code can insert whitespace and comments wherever it wants, and plugins can be used to systematically push comments downward. For example, you might use plugins to push typecast comments or doc comments down into the nodes they describe, so that they are sure to move with those nodes if those nodes move. In this way I ensure that the most practical use cases can be served without endorsing any particular set of rules.

I have a working implementation of the engine that consumes the grammar to build up node.tokens for each node in an AST. Now I need a generative engine that merges existing valid node.tokens with newly emitted tokens. Finally, I'll need to build some kind of API that prettier will be able to use to do linear token lookarounds and on-the-fly normalization during a pathful tree traversal. I don't yet see any reason why this shouldn't be feasible...
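
For a sense of what that buys, here's roughly the kind of node.tokens the engine builds up for `foo as bar` (the token shapes here are illustrative placeholders, not the exact output format):

const importSpecifier = {
  type: 'ImportSpecifier',
  imported: { type: 'Identifier', name: 'foo' },
  local: { type: 'Identifier', name: 'bar' },
  tokens: [
    { type: 'Reference', value: 'imported' }, // stands in for the `imported` child's tokens
    { type: 'Whitespace', value: ' ' },
    { type: 'Identifier', value: 'as' },
    { type: 'Whitespace', value: ' ' },
    { type: 'Reference', value: 'local' },    // stands in for the `local` child's tokens
  ],
};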

conartist6 commented 2 years ago

I think I won't have to mess with the prettier language grammar either, which is amazing. I only have to define my own grammar, and I'm quite liking the syntax I've developed for expressing those definitions.

conartist6 commented 2 years ago

There's still a ton of stuff to do, but I'd say that after three rewrites my architectural prototype is now complete. This is the design that I will move forward with!

I'm going to rebuild my architecture docs, but for now here's the most interesting part of the grammar definition:

const visitors = {
  // This generator is executed by the `match` coroutine to build a `matchNode` for `node`
  *ImportSpecifier(node, { matchNodes }) {
    const { local, imported } = node;

    // These next three lines are equivalent to `` take(ref`imported`) ``
    // We use `match`/`emit` instead of `take` in order to get `importedRef`
    // `match` does recursive submatching, and returns to us an array of matched tokens
    // ref tokens can be used to get match nodes, e.g. `matchNodes.get(importedRef)`
    const importedMatch = yield match(ref`imported`);
    const importedRef = importedMatch[0];

    // Causes the coroutine to add these tokens to `matchNode.tokens`
    yield emit(importedMatch);

    if (local.name !== imported.name) {
      yield take(_, ID`as`, _, ref`local`);
    } else {
      const asMatch = yield match(_, ID`as`, _, ref`local`);

      if (asMatch) {
        const localRef = arrayLast(asMatch);

        // Ensure that `foo as bar` becoming `foo as foo` only emits `foo`
        const valid = !matchNodes.get(importedRef).fallback && !matchNodes.get(localRef).fallback;

        if (valid) {
          yield emit(asMatch);
        }
      }
    }
  },
}
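
For a sense of how this is meant to be driven, a rough usage sketch (the entry point runMatchEngine and the helper tokensFor are placeholder names, not the real API):

const node = {
  type: 'ImportSpecifier',
  imported: { type: 'Identifier', name: 'foo' },
  local: { type: 'Identifier', name: 'bar' },
};

// Hypothetical driver: steps *ImportSpecifier, answers its match/take/emit
// commands against the source tokens for `foo as bar`, and collects the result
const matchNode = runMatchEngine(visitors, node, tokensFor('foo as bar'));

// matchNode.tokens now holds the imported ref, the `as`, and the local ref
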
conartist6 commented 2 years ago

OK, so I've had one last revelation. That grammar that I just shared is sufficiently abstract that it's actually compatible with prettier's grammar definitions! That means that instead of maintaining a complete grammar for recast and a complete grammar for prettier, there can be a single next-gen grammar and tool (maybe called prettier, maybe not) which is capable of printing just about anything -- pretty printing, reprinting original whitespace, and any blend of the two. This would largely eliminate the need for a separate repository with the responsibilities that recast has had until now.

Since my final answer to the question of what to do about the difficulty of maintaining recast is "heal the fork with prettier", I am going to close this issue and track further efforts elsewhere, for now on https://github.com/prettier/prettier/issues/12806

conartist6 commented 2 years ago

Update: I think for a start I'll make a grammar independent of prettier's, and then treat merging grammars with prettier as a potential long-term goal. But for the moment I need to focus on my life, so there probably won't be too much progress in the near future.

conartist6 commented 2 years ago

Juuuust kidding, I've still been working on cst-tokens. After the 4th (ish) big refactor, I think the core architecture is now likely to be pretty stable.

This last refactor has allowed me to do some really cool stuff. For example, I no longer need to include whitespace in the definition of my JavaScript grammar at all. The parser simply understands that whitespace is necessary between a keyword and an identifier (but not between a punctuator and an identifier).
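
Roughly, the rule is of this shape (a minimal sketch; the predicate name and token shapes are placeholders, not the actual implementation):

// Decide whether two adjacent tokens need separating whitespace.
// Keywords and identifiers are "wordy" and would run together without a space;
// a punctuator next to an identifier never needs one.
function needsSeparator(prevToken, nextToken) {
  const isWordy = (token) => token.type === 'Keyword' || token.type === 'Identifier';
  return isWordy(prevToken) && isWordy(nextToken);
}

// needsSeparator({ type: 'Keyword', value: 'import' }, { type: 'Identifier', value: 'foo' }) === true
// needsSeparator({ type: 'Punctuator', value: '{' }, { type: 'Identifier', value: 'foo' }) === false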

I have some work left to do at this point, but I anticipate that I'll soon have a web playground I can point folks to so they can try out what I've built and see how it lets them transform code. I'll also be able to showcase the power of the integration with prettier, including things you just couldn't do before, like inserting a blank line or even controlling block expansion from inside a transform!

conartist6 commented 1 year ago

This project got huge -- I've been working on it (instead of having a job) this entire time. It is certainly intended to replace recast, but it's also now meant to replace major parts of babel, eslint, and prettier as well. The architecture has been turned on its head yet again, making the grammar look much more like a parser. Just for fun, the ImportSpecifier snippet I had shared previously now looks like this:

const visitors = {
  *ImportDeclaration() {
    yield eat(KW`import`);

    const special = yield eatMatch(ref`specifiers:ImportSpecialSpecifier`);

    const brace = special ? yield eatMatch(PN`,`, LPN`{`) : yield eatMatch(LPN`{`);
    if (brace) {
      for (;;) {
        yield eat(ref`specifier:ImportSpecifier`);

        if (yield match(RPN`}`)) break;
        if (yield match(PN`,`, RPN`}`)) break;
        yield eat(PN`,`);
      }
      yield eatMatch(PN`,`);
      yield eat(RPN`}`);
    }

    // `from` is present whenever there are import specifiers, braced or not
    if (special || brace) {
      yield eat(KW`from`);
    }

    yield eat(ref`source:StringLiteral`);
    yield eatMatch(PN`;`);
  },
};

Also, I should say that at some point this project turned into the core of a new IDE. I'm hoping to put together a conference talk once I have some of the coolest demos I know how to create working properly.

EDIT: Sorry, that snippet is ImportDeclaration, but I had shared ImportSpecifier previously, which now looks like this:

const visitors = {
  *ImportSpecifier() {
    yield eat(ref`imported:Identifier`);
    yield eatMatch(KW`as`, ref`local:Identifier`);
  },
};