CS2 Discussion: Question: New Compiler Starting Point

GeoffreyBooth commented 8 years ago

Here are the viable options for which compiler to use as our starting point for outputting ESNext:

The current CoffeeScript 1.x compiler, with its current string-generation approach
A port of Decaf implemented in the CoffeeScript compiler, which would pass a node tree to Babel for code generation, with the legacy compiler as fallback
The CoffeeScript Redux compiler, probably from this PR, which also takes an AST approach but has been stale for two years

The current compiler obviously has the most compatibility of any CoffeeScript compiler, and soon it will support modules and hopefully also classes, the two biggest ES2015 features. But it’s a bear to work with, with a brain-bending codebase, and strings are brittle. Redux is easier to follow, but it doesn’t have full support for all current CoffeeScript, much less ES2015 (though that PR has added support for many ES2015 features). The Decaf/AST approach with fallback is perhaps the easiest, but then we have two code-generation pipelines in our codebase.

The choice boils down to a technical one: should we stick with the current compiler’s string-generation approach, despite its flaws? Or is Redux close enough to production-ready that it’s easier to fix whatever incompatibilities it may have, and move forward from there? Or is the piecemeal, “implement what we can with fallback for everything else” pattern offered by the Decaf approach the path of least resistance?

2016-09-13 Edit: Added Decaf.

lydell commented 8 years ago

The problems with the current CoffeeScript compiler are:

It is full of hacks.
It is too permissive: It accepts more input code as legal than was intended, while still outputting valid JavaScript. This has led to disagreements on what is legal CoffeeScript and what is not.

The problems with Redux are:

It implements features up to CoffeeScript 1.6, so there’s a bit of catching up to do there.
Since it hasn’t received active maintenance for some time, some of its dependencies are out-of-date. For example, pegjs, which is used for the parser (that is, a very important component ;) ) is on version 0.8.0. I remember when pegjs 0.9.0 came out. I then tried to upgrade, but failed and give up. The latest pegjs version right now is 0.10.0. Why did I fail and give up the upgrade? Because of the next point.
pegjs does not support parsing languages with significant indentation. Redux therefore uses a hack to get around that limitation. This is “the only (big) flaw” in Redux, IMO.
Because the original compiler is (as mentioned) too permissive, Redux fails to parse some people’s programs, because Redux many times have chosen more sensibly (IMO) what is allowed and what is not. And sometimes it is not even clear how the original compiler works. This is mostly about indentation, if I remember correctly.

The nice things with the current CoffeeScript compiler are:

It gets the job done.
There’s coffeelint.

The nice things with Redux are:

pegjs is super nice to work with.
It has a CoffeeScript AST.
The compiler is basically just a list of rules mapping CoffeeScript AST nodes to JavaScript AST nodes.

Note: I’ve probably forgotten lots of things here.

Here’s what I think about time and effort:

It will be much faster to update the original CoffeeScript compiler to accept more ES2015+ syntax and output more ES2015+ syntax, than to get Redux up-to-date – initially. Let’s imagine that two teams started working at the same time on the ES2015+ goals of the CoffeeScript language, one on the original compiler and one on Redux. At first, the team working on the original compiler would make much more progress. But then, after some unknown amount of time, I think that the Redux team would be able to implement features and bug fixes faster – and having more fun while doing so. It’s a bit like those school book math questions where you have to choose between two phone payment plans: One with a small monthly fee but a large fee per phone call, and one with a large monthly fee but a small fee per call. If that makes any sense :)

What would I choose, then? I actually have no idea. One second I lean towards the original compiler, the next I want to go with Redux. Then I’m like “whatever, I’ll focus on Elm and PureScript instead,” followed by wanting CoffeeScript again. Either way, I can’t see myself doing major contributions to CoffeeScript in the foreseeable future, but I’ll probably stick around trying to help.

rattrayalex commented 8 years ago

One second I lean towards the original compiler, the next I want to go with Redux.

Same.

What do we think would be the healthiest for the broader community? Which option would the current coffeescript contributors and maintainers like to see?

GeoffreyBooth commented 8 years ago

Or maybe the question we should be asking is, do we have the resources to finish Redux? How many developers are really interesting in actually committing hours of work to this project? How many hours can they promise?

Redux seems like the better long-term solution, but if we don’t have enough developer-hours to even get Redux up to parity with CoffeeScript 1.10, then we will burn through whatever hours we have available to us advancing Redux a little bit but still leaving it in the unfinished state it’s in now. It’s risky to start with Redux unless we know with some certainty that we won’t run out of resources before Redux becomes a viable option.

rattrayalex commented 8 years ago

That's a very good point @GeoffreyBooth

Asc2011 commented 8 years ago

thx @GeoffreyBooth for giving this overview @lydell for sharing his experience.

I'm not sure if the current tooling provides solid foundations for the future of CS6 ? Having to choose between working with a nice codebase but a 'patched' pegJS or the complicated 'classic'-CS codebase. While looking out for a parser-toolkit that explicitly supports indentation grammars i found pythonparse worth considering. Having ~700 LOC it is able to parse Flask and Django and comes with very precise debugging info during grammar devel. And it compiles into JS using ScalaJS. If anybd screams 'heresey' now, i'm ok with that. But i believe the decision on the future-tooling to be a major one. So it should be based on features and options one gains or looses. As things are, many changes to JS already arrived and more are drafted, in discussion and soon to come. This wont stop anytime soon. The tooling should enable a team of developers to get into the status-quo of CS6 quickly and work in paralell, e.g. one dev on one proposal. The 'classic'-CS-codebase, seems to be too complicated to attract new devels. The Redux-code, well is based on a 'hacked'-parser toolkit, that does not support indentations grammars (yet, maybe in the future ?). Scalas parser-toolkits have a high reputation - Neo4J/Cypher was done with one of it. I believe Fastparse to be a interesting candidate for parsing indentation grammars - in a "hack free"-way (Li, the author). Breaking changes to CS will come as sure as 'winter is coming'. I'd be happy to see a 'Classic'-CS that handles new-style classes and -modules. Maybe its a good idea to build a list of prominent CS-based projects that should/must compile with any new CS-parser ? And explicitly declare this a criteria to meet. In the long run the devels should decide if the right tooling can provide a decent community-workflow in the future. I remember D. Crockford once spoke about a major-project and he said that after month of team-discussion, debate and development they re-started work from scratch and finished the same task within 2-weeks - because evbd. was clear about what had to be done. I sometimes wonder if there is too much debate&discussion&opinion and a lack of decision at the same time. That obvoiusly led to the situation where on finds six-CS-alike languages - which is a waste of resources and enthusiasm that should not be repeated. I'd like to hear the way you see it and if changing the tools might speed-up developments and make it more fun ?

eventualbuddha commented 8 years ago

(Caveat: I was once a fan of CoffeeScript but have since decided it's better to go with JavaScript + Babel. I created decaffeinate to migrate a large codebase away from CoffeeScript, so keep that in mind when considering my suggestions.)

My experience with both the official CS parser/generator and Redux left me with similar impressions to what others have said on this thread. The AST generated by Redux is much nicer and the project is structured in a more sane way. It does not have the compatibility needed to support major CoffeeScript codebases, however. Of the roughy 3000 files in the codebase I work in, 20% or so did not parse correctly using CSR.

I originally built decaffeinate on CSR, but there were tons of issues related to compatibility so I switched to the official parser. However, I did so by creating decaffeinate-parser which maps the AST from CS to one that much more closely resembles CSR's AST. That way I didn't have to rewrite very much code in decaffeinate.

Another issue that both parsers have is bad location information. Since decaffeinate relies on it quite heavily (it edits the original source code based on node and token locations), I found tons of bugs in CS's location information for AST nodes and tokens. I worked around a lot of these by creating my own lexer, coffee-lex, which is much more faithful to what is in the original source code than CS's lexer and has accurate location information. Even with that, I eventually had to fork CS to start addressing these bugs rather than trying to work around them. That process is ongoing.

Ironically, some of these tools that I created to help hasten the migration away from CoffeeScript may also be useful in a project such as you all are discussing. I have no problem with them being used in that way if you find them useful, just keep in mind that my goals probably differ from yours. If I were in a position of wanting to continue to develop CoffeeScript into the future, I would probably do this:

Favor compatibility, or at least have a migration tool that makes code compatible with whatever the new parser does.
Build the base tools in such a way that an ecosystem can be built on top of them. IMO, this was a failing of CoffeeScript due to the complexity of its codebase and the utterly terrible metadata (tokens and AST).
Have exhaustive testing that ensures compatibility (or re-writeability) with existing large CoffeeScript codebases.

Best of luck! 😉

abedra commented 8 years ago

I think it is important to remember some of the goals of Redux. It attempts to correct a lot of unintended, undefined, and ambiguous behavior of the original compiler. The implementation demonstrates a view of what should happen but not necessarily the best or most accepted view. It also explicitly breaks CS programs that function under the standard compiler.

While I have a bias towards Redux from an implementation and overall correctness standpoint I think there would have to be additions made that support the behavior of the standard compiler (maybe a strict mode that operates on Redux semantics).

Given that caveat I think it would be easier to work from the Redux codebase. It's much less tangled and has some nice AST features and optimization hooks that make for easy tooling entry points, not to mention being a bit cleaner as others have mentioned. The option of simply starting over and taking both implementations as advice should also be considered. I don't think either of them truly represent the right solution for everyone IMHO.

eventualbuddha commented 8 years ago

The option of simply starting over and taking both implementations as advice should also be considered.

I'd second this, and in particular you should investigate not using a parser generator. Writing a parser by hand is not that terrible.

JimPanic commented 8 years ago

From a user point of view the best option is a series of non-breaking changes to harden the lexer/rewriter/parser followed by graceful (and also tool-supported?) but breaking changes. I sincerely doubt just going with Redux is a good idea. Not because of its quality or how well it was maintained, simply because it does break too much too soon.

I think refactoring the lexer and explicitely allowing ambiguties while outlining them in code for future reference is a better way to go. The ultimate goal should be to get to a state of clean code like Redux in a graceful way.

However, I do not see a complete rewrite as a viable option. The risk to not get adopted by a properly sized portion of the community is too high, seeing how Redux basically (was politically) ran into a dead-end.

GeoffreyBooth commented 8 years ago

@eventualbuddha thank you for your detailed comments. Any reason why you don’t submit your location bugfixes back to the main repo? Presumably the main repo would benefit from those fixes.

I’ve opened a new issue to gauge how much support we have for this effort. It sounds like starting from Redux is more work but with a bigger reward, but not something we should attempt unless we know we have enough people with enough time to achieve it. Hopefully that issue answers that question.

I also think most users don’t care all that much about CoffeeScript’s output. They want to use modules and classes and const and async/await and a few other cool new features, and if CoffeeScript outputs ESNext syntax for everything else too then that’s gravy. No one but us cares about the quality of the CoffeeScript compiler’s codebase; that only matters relative to however much it enables or impedes future development. If the resources available to us are low, like maybe me plus one or two other people, we could probably at least implement modules, classes and const in the current compiler. If we have more contributors, a compiler rewrite or finishing Redux might be on the table (and which might be a prerequisite for outputting ESNext syntax). But even if it turns out the resources available are on the low end, I think we should still be able to achieve the most important goal of most end users, of adding support for the few most desired ES2015+ features.

alangpierce commented 8 years ago

@GeoffreyBooth I wrote both of the CoffeeScript bug fixes for decaffeinate (full changelog is here: https://github.com/decaffeinate/coffeescript/commits/decaffeinate-fork-1.10.0 ). The first fix is already an open PR ( https://github.com/jashkenas/coffeescript/pull/4291 ) and the second one I have been meaning to submit, and just got around to it ( https://github.com/jashkenas/coffeescript/pull/4296 ). Certainly the intention is to be friendly and send fixes upstream. :-)

We work off the 1.10.0 tag instead of master (since the AST format is a little different), although in both cases it was a clean cherry pick. It's also a little weird because in both cases, I think the bug didn't cause any correctness issues in CoffeeScript itself, just in the intermediate AST. But probably cleaning that up is still useful for future maintainability.

rattrayalex commented 8 years ago

Writing a parser by hand is not that terrible.

@eventualbuddha quite a statement. Can you link to an example?

GeoffreyBooth commented 8 years ago

@eventualbuddha also is this codebase you tested with Redux publicly available? I’d love to have a good serious app to use to test with.

lydell commented 8 years ago

Here's an idea:

I might be missing something, but once import/export is done in the original compiler, there won't really be any/much changes to CoffeeScript's syntax needed for a while, will it? I'm thinking we'll mostly change what the language compiles to – for example, compiling classes into something that plays better with ES2015 classes.

If so, it might make sense going with the original jashkenas/coffeescript code.

We could start by improving the output of the parser, so that we get a real/better CS AST. We could start replacing the current compiler (nodes.coffee) with something better – perhaps we could pull in parts of the Redux compiler as a starting point. It would be cool if a new compiler could only compile classes to start with, and fall back to the old compiler for all other nodes.

Then we could leave replacing the lexer+parser as a longer term goal. As long as the new parser outputs the same AST, the same compiler can be used.

This way would could replace parts of the current compiler incrementally.

carlsmith commented 8 years ago

Writing a parser by hand is not that terrible.

@eventualbuddha quite a statement. Can you link to an example?

Yep!

TDOP makes it much easier than you would think. Pratt is a genius, but nobody listened to him, until Crockford began popularising his algorithm. It really isn't that difficult with TDOP. I've never seen any other approach that wasn't really intimidating, but Pratt parsing only takes an evening to figure out, and then you can write parsers by hand whenever you like. It's an excellent investment.

carlsmith commented 8 years ago

I linked to the Python example, because I found it the easiest to follow [by far]. For completeness, here's a copy of the original paper we remastered on GitHub, and Crockford's article.

GeoffreyBooth commented 8 years ago

@lydell do you have the time to create a new branch that shows a starting point of how replacing nodes.coffee would work? Maybe implement one of the small features using the new nodes, and add a compiler flag so we can choose the old or the new nodes; or maybe the new nodes is used automatically for features it supports, and everything else is handed off to the old nodes, if they can work together that way. If you can show us how this could work, we could use your branch as the new-compiler starting point.

@carlsmith or others: if you want to create a new compiler from scratch, do you mind creating a repo that demonstrates the beginnings of how such a new compiler would work? At least implement significant whitespace and variable assignment, for starters. If you can get that working in a matter of days, and it seems like implementing all the rest of CoffeeScript’s current syntax would be within reach within weeks, then the new repo could be a contender as our new-compiler starting point.

And if anyone has the time to play with Redux a little bit and get past where they left off with it—struggling to upgrade its dependencies—that would be very useful too. It still feels to me like the most promising starting point, but not if it’s stuck on an unsurmountable obstacle. If Redux can get updated, and maybe we run CoffeeScript’s current tests against it to know what feature support it lacks, we can see how suitable it would be as a starting point.

If none of these experiments bear fruit, the backup plan is to keep the original compiler and gradually work to improve it. We know that that’s an option, though not as satisfying as replacing its hairiest parts with cleaner and more-extensible code.

carlsmith commented 8 years ago

I'm busy doing a website, but that'll be up soon, and have other projects I'm committed to, but will try and find time for doing a parser. Can probably reuse code from the one I have, strip it down to its core, and tidy it up, so we'd have a simple parser that people can understand, that we can all chip away at from there. Once you've got the recursive decent logic, with precedence and support for goofy (right to left) operators, everything else is lots of fairly local, well defined, little problems that just need working through one by one. The parsing algorithm is the tricky part.

carlsmith commented 8 years ago

Just to be clear, I think what you are doing with the existing infrastructure is the best bet, but I can personally probably be more helpful working on the longer shot.

mrmowgli commented 8 years ago

I mentioned this in the chat, and I think it's worth bringing up here, but I wonder if we could run the original CS test suite against Redux and just see what fails? That should tell us where the Redux code needs to be updated. In addition perhaps if we are putting together a custom compiler we could use the test suites from both projects to validate progress. Just a thought.

michaelficarra commented 8 years ago

I think that one of the most valuable outputs of the CoffeeScriptRedux project was the wiki pages which documented information for implementors:

GeoffreyBooth commented 8 years ago

@michaelficarra thanks for commenting! Do you mind sharing with us your opinion for how much effort it would take for:

Redux to catch up to CoffeeScript 1.10 features, aside from the intentional deviations?
Redux to support essential ES2015 features, presumably by finishing #344?
Redux to output ESNext syntax for most features?

GeoffreyBooth commented 8 years ago

Has anyone looked into Decaf as an option?

GeoffreyBooth commented 8 years ago

So . . . I think we should investigate Decaf 😄

It’s just one file (plus a CLI that calls it): the 1506 sloc parser.js. It’s so simple, it doesn’t have a need for your fancy classes or psuedo-classes or object-oriented programming; it’s all plain-declared functions. Its entrypoint function is compile:

export function compile(source, opts, parse = coffeeParse) {
  const doubleSemicolon = /\;+/g;
  opts = opts || {tabWidth: 2, quote: 'double'};

  const _compile = compose(
    // hack because of double semicolon
    removeDoubleEscapes,
    compiledSource => Object.assign({}, compiledSource, {code: compiledSource.code.replace(doubleSemicolon, ';')}),
    jsAst => recast.print(jsAst, opts),
    insertSuperCalls,
    insertBreakStatements,
    insertVariableDeclarations,
    csAst => transpile(csAst, {options: opts}),
    parse);

  return _compile(source).code;
}

So yeah, those 1500 lines are pretty terse. The arguments in compose are read from the bottom up; each function’s return value is passed as the first parameter of the previous function on the list (so parse returns something passed into transpile as the parameter csAst, etc.).

Anyway the gist of what it’s doing, as far as I can tell, is using CoffeeScript’s own parser to generate a syntax tree, which it then does some transformations on before converting it into a JavaScript syntax tree, which has lots of ESNext syntax. It’s basically one long function that winds its way around parser.js, with a few big libraries imported and doing the hard work.

For things it can’t parse, it falls back to the CoffeeScript compiler’s output. This is Decaf’s great advantage over Decaffeinate: it can process all CoffeeScript files, always generating valid JavaScript, though some of the JavaScript might be the ES5 output by CoffeeScript rather than the ESNext output by the various libraries imported into Decaf. (Yeah, I neglected to mention: Decaf is another tool whose purpose is to convert CoffeeScript files into ESNext, just like Decaffeinate. Hence it’s written in ESNext, of course.) Decaf has a very impressive list of CoffeeScript features/node types that it can parse; the most notable omission is comments, which is ironic since that’s the one node type we wouldn’t want to output. One node type in particular it does support, however, is classes:

class Animal
  constructor: (@name) ->

  move: (meters) ->
    alert @name + " moved #{meters}m."

class Snake extends Animal
  move: ->
    alert "Slithering..."
    super 5

becomes

class Animal {
  constructor(name) {
    this.name = name;
  }

  move(meters) {
    return alert(this.name + (" moved " + (meters) + "m."));
  }
}

class Snake extends Animal {
  move() {
    alert("Slithering...");
    return super.move(5);
  }
}

Anyway it would be sloppy if CoffeeScript 2.0 was simply a Decaf wrapper over the current CoffeeScript compiler. We should untangle the logic of what parser.js does, then rewrite it into better-organized CoffeeScript and integrate it into the current compiler’s codebase. But the point is that Decaf’s approach might be leading us the way we want to go: generating nodes to pass to some other tool to convert into JavaScript, rather than us assembling the output string ourselves.

GeoffreyBooth commented 8 years ago

@lydell @rattrayalex @JimPanic @carlsmith others:

Looking at the list of features that we want to update to output ESNext, I wonder if a new compiler is the best approach? The list is surprisingly short, and many of the items don’t strike me as huge tasks (like compiling default parameters to default parameters, or => to =>). The only “big” change is classes; would it really be less effort to implement a new compiler, even starting from a base like Redux or Decaf, than it would be to rewrite the current classes code?

Don’t get me wrong: I love the idea of a new compiler, especially one that passes an AST to Babel to generate JavaScript; but I wonder if it’s the best way to get to 2.0.0. We could release 2.0.0 first, then refactor the compiler afterward if people have the motivation. A better compiler will certainly make it easier to implement whatever ES2016 and ES2017 and so on throws our way going forward.

We should make a decision soon so that @greghuc can implement template literals and hopefully someone volunteers to take on classes 😄 From what I’ve seen on this thread so far, it seems like we’re drifting by default toward extending the current compiler.

rattrayalex commented 8 years ago

I agree that the best path forward at this point is to stick with the jashkenas/coffeescript codebase with a 2.0.0 target.

I think other efforts are worth thinking about for possible long-term adoption but would be a distraction to invest in now.

JimPanic commented 8 years ago

I really am not sure tbh. I looked a bit deeper into the current code base last week and I think in the long run, we'd benefit a lot from a different approach lexing, parsing and transforming the AST in multiple phases even. The structure right now is quite coupled.

I'd still like to see a gradual transition towards a new approach, though, whatever that might be in the future. Mostly because (I suppose) nobody knows all the quirks and undefined implementation-defined behavior, but also to give users a chance to catch up on these very probable, small behavioral changes. Otherwise the upgrade path seems too steep tbh.

On 19 Sep 2016, at 05:13, Alex Rattray notifications@github.com wrote:

I agree that the best path forward at this point is to stick with the jashkenas/coffeescript codebase with a 2.0.0 target.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

lydell commented 8 years ago

I second what @rattrayalex said.

greghuc commented 8 years ago

Re new compiler, I agree with the pragmatic approach mentioned by @rattrayalex. However, I think there is benefit in a more decoupled, better structured compiler if it's easier to understand and so make changes. This might reduce the barrier to entry for people making commits to Coffeescript.

GeoffreyBooth commented 7 years ago

Closing as we’ve decided to build atop the 1.x compiler.

coffeescriptbot commented 6 years ago

Migrated to jashkenas/coffeescript#4923

coffeescript6 / discuss

CS2 Discussion: Question: New Compiler Starting Point #25