option to control white space between tokens and indentation

millermedeiros commented 12 years ago

I got to this project while discussing ways to do advanced code formatting for JS (source)

It would be great if it had the same level of control as FDT advanced code formatter - options to define how white space is handled between each kind of token (line break, space, none), indentation, keep/remove empty lines, etc.

Basically transform escodegen into a hardcore version of JSBeautifier.

michaelficarra commented 12 years ago

I agree, the format should be entirely configurable. When designing the configuration options, other projects that provide these code formatting options (like the ones linked above) should definitely be considered. This is also probably a good time to either design or find a small JS example which contains all syntactic features so that it may be used in testing the formatting.

Constellation commented 12 years ago

I agree, code format options are required. Do you have any ideas?

One idea is

escodegen.generate(AST, { format: {  options... } });

style.

michaelficarra commented 12 years ago

@Constellation: That sounds good. You should also move the indent and base options into format.

The sooner you get an API defined for us, the sooner we can submit pull requests to incrementally add the functionality. You should also probably document some of the formatting options you want to be available. I'd start by looking at those other code formatters' options.

edit: I suggest you define a defaults object with all the settings defined how they are fixed currently. Then we could modify the code generation to slowly start actually respecting the values in the defaults object and applying user overrides. Just mark the settings that aren't actually implemented (all of them except indent and base right now) as such.

millermedeiros commented 12 years ago

@Constellation @michaelficarra I agree, the base settings is really the way to go and a good way to document all the available options, track progress and implement new ones.

For me the ideal scenario would be allowing a before/after option for each token listed on the Syntax, Precedence and BinaryPrecedence objects (source), option to format how comma differs in ObjectExpression, ArrayExpression, VariableDeclaration, Arguments, how variables are declared (single var statement, multiple) and also if it should remove/keep empty line breaks (only possible if AST contains LOC info).

For now I could think in 2 diff approaches:

option 1 (loose)

Let the user define any char he wants to be placed before/after each token/expression.

escodegen.generate(AST, { format: {
    FunctionDeclaration_before : '\n\n',
    FunctionDeclaration_after : '\n',
    FunctionDeclaration_indent : true,
    CommaObjectExpression_before : '',
    CommaObjectExpression_after : ' ',
    // things like LogicalExpression would set the default for the whole category
    LogicalExpression_before: ' ',
    LogicalExpression_after: ' ',
    // and we could overwrite a single logical operator
    LogicalOR_before : '',
    LogicalOR_after : ''
} });

option 2 (booleans)

Divide the settings per kind (indentation, white space, braces, blank lines) and use booleans for every setting.

escodegen.generate(AST, { format: {
    blankLines : {
      FunctionDeclaration_before : true,
      FunctionDeclaration_after : true
    },
    indent : {
        // probably just block expressions require an "indent" option
        FunctionDeclaration : true,
    },
    whiteSpace : {
        CommaObjectExpression_before : false,
        CommaObjectExpression_after : true,
        // things like LogicalExpression would set the default for the whole category
        LogicalExpression_before: true,
        LogicalExpression_after: true,
        // and we could overwrite a single logical operator
        LogicalOR_before : false,
        LogicalOR_after : false
    }
} });

option 3 (mixed)

Maybe we could even support both formats (mixed at the same time) - we read the "shallow" options first than do a loop in the indent, whiteSpace and blankLines objects filling the gaps - if we follow the loose format it's easy to convert the booleans into the proper tokens (should take ~10LOC).

The "loose" format would enable more flexibility since we could even come up with special tokens for things like indent (eg. \n${%i-2}, which would mean line break + indent minus 2 chars), this could be the solution for things like weird comma-first styling (example) - note that return statement would be an issue if the user sets a line break before arrays and objects (behavior on var declaration is different than return statement).

michaelficarra commented 12 years ago

@millermedeiros: If option 1 was taken, I would prefer an object like {format: {before: { ... }, after: { ... }}}. But I like the mixed idea. An example:

escodegen.generate(AST, { format: {
  indent : {
      style: "  ",
      base: 0, // equivalent to String.repeat(indent.style, ~~0) under `indent`
      FunctionDeclaration: true, // equivalent to String.repeat(indent.style, ~~true) under `indent`
      Property: escodegen.specialFormats.COMMA_FIRST, // some constants for special styles
      SwitchCase: -1 // possibly even dedents through negative numbers
  },
  whitespace: {
    before: {
      CommaObjectExpression : false, // equivalent to "" under `whitespace`
      LogicalExpression: true, // equivalent to " " under `whitespace`
      FunctionDeclaration: "\n", // custom values
      LogicalOR : false,
    },
    after: {
      CommaObjectExpression : true,
      LogicalExpression: true,
      FunctionDeclaration: "\n",
      LogicalOR : false
   }
 }
}});

edit: made some minor additions, fixed a bug

Constellation commented 12 years ago

I would prefer object style too. Mixed style looks good. Scriptable style, something like callback function style, is future work.

millermedeiros commented 12 years ago

sounds like a plan! I think we should probably start with the basics (braces, operators) and then expand to the others, that way we can validate the first ones and make sure we are following the right approach and that it's "scalable".

I'm thinking about a new method similar to addIndent() and parenthesize() that accepts the content and type, like:

function addWhitespace(content, type) {
  return _whitespaceBefore[type] + content + _whitespaceAfter[type];
}

Splitting the before/after into individual objects is probably a good idea, code will be cleaner and configuration can also be easier. I was keeping it as a shallow object with _before/_after to be able to group things by proximity (imagine a configuration file with all the options - probably 80+ lines) but a quick search is more than enough and will end up being more organized anyway, so my vote is to use @michaelficarra model and skip constants and special indentation for now.

I think we can start a new test suite just for the indentation and add individual tests as we add new configuration options, no need for a file with all the options for now, specially since we just need to test the AST to JS conversion. I would probably create a file with the source file, one with the AST (create a script to generate it automatically) and one with the expected output, that way it will be easier to compare it and add new tests (like yuicompressor does with css).

Constellation commented 12 years ago

Initially, I created format.indent section and move indent / base to it. https://github.com/Constellation/escodegen/issues/6

@millermedeiros

YUICompressor test YUI Compressor Test system looks very good! So I'll created new issue for it!

goatslacker commented 12 years ago

anyone working on this?

michaelficarra commented 12 years ago

@goatslacker: I will be starting in around a month, possibly sooner.

edit: I am hoping to base my work on this: http://oai.cwi.nl/oai/asset/10876/10876D.pdf

goatslacker commented 12 years ago

Peachy. Looking forward to it. Is this part of your redux project?

RGustBardon commented 12 years ago

It would be reasonable if the options in question would not take precedence over options.format.compact.

jfsiii commented 11 years ago

Is anyone currently working on this? @michaelficarra?

Constellation commented 11 years ago

Some formating options are now available.

satazor commented 11 years ago

@Constellation I don't think it is working right. Here's an example:

dejavu.Class.declare({
    method2: function () {}
});

Generated an ast and then passed to escodegen and here's the output:

dejavu.Class.declare({method2: function () {
    }});

Constellation commented 11 years ago

Ah right, agreed. I'll fix it, thanks!

Constellation commented 11 years ago

@satazor: I've fixed it, thanks! 4e610a28fdfd0ec9f175a5f3f83189197e669cb5 c5129de8a89af0a9432327f38ae7725e2abd021f

satazor commented 11 years ago

@Constellation any chance to have this landed on npm?

Constellation commented 11 years ago

@satazor: I'll publish it :)

edit: publised as 0.0.14

satazor commented 11 years ago

@Constellation There is still some strange things, but they are hard to explain. Can we talk via gtalk or something?

millermedeiros commented 11 years ago

This week I did some work with AST and it made me think that escodegen is not going to be the best tool for code formatting. It doesn't use the source tokens (specially white spaces) and the changes will be destructive - since it is a full rewrite of the code (which can be a good thing in some cases).

I'm also feeling that this goes beyond the responsibility of escodegen. Simple settings are OK, especially the ones related to the code generation itself (asi, line breaks, indent) but I don't think it is worth increasing escodegen complexity too much. Would be better to create a separate tool that did the non-destructive formatting and bundle both tools if that makes sense, the same way that esmangle is a separate project.

AST => escodegen => string => esformatter => string

What do you guys think?

PS: I might start such project (non-destructive formatter), just need to find some time.

michaelficarra commented 11 years ago

@millermedeiros: I have no idea what point you are trying to make.

millermedeiros commented 11 years ago

@michaelficarra my point is that my original feature request of adding granular control about the code output style isn't really a good idea. It would be better to focus the effort on other things and create a separate tool/lib to do the code formatting, that way escodegen is responsible only for generating the basic structure.

michaelficarra commented 11 years ago

I disagree. This is exactly the place for it. What would this other tool do? Use esprima to parse escodegen output to the exact IR escodegen operates on and then do exactly what you're suggesting here? That would make no sense.

Constellation commented 11 years ago

@satazor Sorry for very late reply. Is it possible to paste some examples? Probably, it is easy to understand problem :)

@millermedeiros, @michaelficarra Hm, structured data information (such as AST) isn't necessary for esformatter? Because escodegen generates raw text data, so formatter needs to treat raw text. If structured data information is needed, I think this job belongs to escodegen.

satazor commented 11 years ago

@Constellation yes.

Original code: https://github.com/IndigoUnited/dejavu/blob/master/test/specs/functional.js Code outputed after some transformations: https://github.com/IndigoUnited/dejavu/blob/master/test/specs/functional_optimized.js

As you can see, escodegen does not preserve new lines, semi-colons and other things correctly.

millermedeiros commented 11 years ago

@Constellation a tool like esformatter would need an AST as well but the most important would be the raw string input (since it contain information about line breaks, original tokens and white spaces). I think the formatter should be able to apply the transformations without compromising the source (non-destructive changes), if the user have some missing semicolons it shouldn't automatically add/remove them (unless explicitly set), the same thing for empty lines, trailing white spaces, etc. The AST generated by Esprima doesn't contain info about white spaces and escodegen doesn't use the source tokens to generate the output. - The output is probably going to be very different from the input in many cases.

I just wanted to let you guys know that while having the settings would be something good it won't solve all needs since in some cases the user might want to apply only non-destructive changes and that having so many granular settings might be out of the scope of escodegen and make the code more complex than it need to be.

millermedeiros commented 11 years ago

I started the work on esformatter and just pushed it to github and npm: https://github.com/millermedeiros/esformatter - I also wrote a separate lib (rocambole) to do the recursive walk and generate the extra tokens for comments, line breaks and white spaces. Trying to make it as less destructive as possible.

All the tests inside test/compare/default are working as expected but I'm still missing support for many important statements like if/else, while, for, etc...

Maybe now it will make sense for you guys why the non-destructive approach is actually better for code formatting since it will avoid inserting any token that isn't a white space or line break (if the user wrote the code that way he probably have a reason to). I plan to add automatic semicolon insertion and also add options to toggle line wrap and if it should keep existing line breaks and white spaces, but of course this will be future work, focusing on the basics for now.

Contributors are welcome. Cheers.

goloroden commented 10 years ago

Any update on the suggestions of @millermedeiros yet? Is there any work going on in this area? If not, is it still of interest that something is going to happen there?

millermedeiros commented 10 years ago

@goloroden esformatter is getting closer to being "usable", only problem is that I always get busy with other projects. I would not recommend adding too many settings to escodegen, complexity reaaallly piles up.

asdf23 commented 8 years ago

I would like to have the code formatted with leading commas. Such as:

var a = ({
     a: "Prefixed with tab then space"
    ,b: "Prefixed with tab then comma"
    ,c: "Prefixed with tab then comma"
});

estools / escodegen