Open millermedeiros opened 12 years ago
I agree, the format should be entirely configurable. When designing the configuration options, other projects that provide these code formatting options (like the ones linked above) should definitely be considered. This is also probably a good time to either design or find a small JS example which contains all syntactic features so that it may be used in testing the formatting.
I agree, code format options are required. Do you have any ideas?
One idea is an escodegen.generate(AST, { format: { options... } }); style.
@Constellation: That sounds good. You should also move the indent and base options into format.
The sooner you get an API defined for us, the sooner we can submit pull requests to incrementally add the functionality. You should also probably document some of the formatting options you want to be available. I'd start by looking at those other code formatters' options.
edit: I suggest you define a defaults object containing every setting, with each value set to the behaviour that is currently hard-coded. Then we could modify the code generation to gradually start respecting the values in the defaults object and applying user overrides. Just mark the settings that aren't actually implemented yet (all of them except indent and base right now) as such.
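A minimal sketch of the defaults-plus-overrides idea described above. The option names under whitespace and the helper names mergeOptions and isPlainObject are assumptions made for illustration; they are not escodegen's actual API.

```javascript
// Hypothetical defaults object: every setting is listed with its current fixed value.
const DEFAULT_FORMAT = {
    indent: { style: '    ', base: 0 },  // the two settings the thread says are implemented
    whitespace: {                        // assumed names, marked here as not implemented yet
        before: { LogicalExpression: ' ' },
        after: { LogicalExpression: ' ' }
    }
};

function isPlainObject(value) {
    return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Return a new object: a deep copy of `defaults` with values from `overrides` winning.
function mergeOptions(defaults, overrides) {
    const result = {};
    for (const key of Object.keys(defaults)) {
        const value = defaults[key];
        result[key] = isPlainObject(value) ? mergeOptions(value, {}) : value;
    }
    for (const key of Object.keys(overrides || {})) {
        const value = overrides[key];
        result[key] = isPlainObject(value) && isPlainObject(result[key])
            ? mergeOptions(result[key], value)
            : value;
    }
    return result;
}

// Usage: only the indent style is overridden, everything else keeps its default.
const format = mergeOptions(DEFAULT_FORMAT, { indent: { style: '\t' } });
```

Because the merge copies rather than mutates, the defaults object can double as documentation of every available option.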
@Constellation @michaelficarra I agree, the base settings are really the way to go and a good way to document all the available options, track progress and implement new ones.
For me the ideal scenario would allow a before/after option for each token listed in the Syntax, Precedence and BinaryPrecedence objects (source), options for how commas are formatted in ObjectExpression, ArrayExpression, VariableDeclaration and Arguments, an option for how variables are declared (a single var statement or multiple), and an option to remove or keep empty line breaks (only possible if the AST contains LOC info).
For now I can think of 2 different approaches:
First approach: let the user define any characters they want to be placed before/after each token/expression.
escodegen.generate(AST, { format: {
    FunctionDeclaration_before : '\n\n',
    FunctionDeclaration_after : '\n',
    FunctionDeclaration_indent : true,
    CommaObjectExpression_before : '',
    CommaObjectExpression_after : ' ',
    // things like LogicalExpression would set the default for the whole category
    LogicalExpression_before: ' ',
    LogicalExpression_after: ' ',
    // and we could overwrite a single logical operator
    LogicalOR_before : '',
    LogicalOR_after : ''
} });
Second approach: divide the settings per kind (indentation, white space, braces, blank lines) and use booleans for every setting.
escodegen.generate(AST, { format: {
    blankLines : {
        FunctionDeclaration_before : true,
        FunctionDeclaration_after : true
    },
    indent : {
        // probably just block expressions require an "indent" option
        FunctionDeclaration : true,
    },
    whiteSpace : {
        CommaObjectExpression_before : false,
        CommaObjectExpression_after : true,
        // things like LogicalExpression would set the default for the whole category
        LogicalExpression_before: true,
        LogicalExpression_after: true,
        // and we could overwrite a single logical operator
        LogicalOR_before : false,
        LogicalOR_after : false
    }
} });
Maybe we could even support both formats (mixed at the same time) - we read the "shallow" options first, then loop over the indent, whiteSpace and blankLines objects to fill the gaps - if we follow the loose format it's easy to convert the booleans into the proper tokens (should take ~10LOC).
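The gap-filling step described above could look roughly like this. It is only a sketch: the function name normalizeFormat, the flat key naming, and the token each boolean stands for (' ' for whiteSpace, '\n' for blankLines) are assumptions, since the thread does not pin them down.

```javascript
// Assumed mapping: the token a boolean `true` stands for in each grouped section.
const TRUE_TOKEN = { whiteSpace: ' ', blankLines: '\n' };

// Flatten a mixed format object into the "loose" shape (one key per setting).
function normalizeFormat(format) {
    const flat = {};
    // 1. Shallow options (strings or booleans at the top level) are read first.
    for (const key of Object.keys(format)) {
        if (typeof format[key] !== 'object') {
            flat[key] = format[key];
        }
    }
    // 2. Grouped boolean options only fill gaps left by the shallow ones.
    for (const section of Object.keys(TRUE_TOKEN)) {
        const group = format[section] || {};
        for (const key of Object.keys(group)) {
            if (!(key in flat)) {
                flat[key] = group[key] ? TRUE_TOKEN[section] : '';
            }
        }
    }
    // 3. Indent flags stay booleans and get an `_indent` suffix.
    for (const key of Object.keys(format.indent || {})) {
        const flatKey = key + '_indent';
        if (!(flatKey in flat)) {
            flat[flatKey] = Boolean(format.indent[key]);
        }
    }
    return flat;
}

// Usage: the shallow LogicalOR_before wins over the grouped boolean.
const flatFormat = normalizeFormat({
    LogicalOR_before: '',
    whiteSpace: { LogicalOR_before: true, CommaObjectExpression_after: true },
    blankLines: { FunctionDeclaration_before: true },
    indent: { FunctionDeclaration: true }
});
```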
The "loose" format would enable more flexibility since we could even come up with special tokens for things like indent (eg. \n${%i-2}, which would mean line break + indent minus 2 chars). This could be the solution for things like weird comma-first styling (example) - note that the return statement would be an issue if the user sets a line break before arrays and objects (behavior on var declaration is different than on return statement).
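As an illustration of the special indent token, here is a sketch that expands the ${%i-2} placeholder. Only the ${%i-N} form appears in the thread; treating a bare ${%i} as the current indent, accepting ${%i+N}, and the function name expandIndentTokens are assumptions.

```javascript
// Replace ${%i}, ${%i-N} and ${%i+N} with `indentChar` repeated to the
// adjusted width. `currentWidth` is the current indentation in characters.
function expandIndentTokens(template, currentWidth, indentChar = ' ') {
    return template.replace(/\$\{%i(?:([+-])(\d+))?\}/g, (match, sign, amount) => {
        const delta = sign ? Number(sign + amount) : 0;
        // Never go below zero, even if the dedent exceeds the current indent.
        const width = Math.max(0, currentWidth + delta);
        return indentChar.repeat(width);
    });
}

// Usage: at a 4-character indent, "\n${%i-2}" becomes a line break plus 2 spaces.
const expanded = expandIndentTokens('\n${%i-2}', 4);
```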
@millermedeiros: If option 1 was taken, I would prefer an object like {format: {before: { ... }, after: { ... }}}. But I like the mixed idea. An example:
escodegen.generate(AST, { format: {
    indent : {
        style: " ",
        base: 0, // equivalent to String.repeat(indent.style, ~~0) under `indent`
        FunctionDeclaration: true, // equivalent to String.repeat(indent.style, ~~true) under `indent`
        Property: escodegen.specialFormats.COMMA_FIRST, // some constants for special styles
        SwitchCase: -1 // possibly even dedents through negative numbers
    },
    whitespace: {
        before: {
            CommaObjectExpression : false, // equivalent to "" under `whitespace`
            LogicalExpression: true, // equivalent to " " under `whitespace`
            FunctionDeclaration: "\n", // custom values
            LogicalOR : false,
        },
        after: {
            CommaObjectExpression : true,
            LogicalExpression: true,
            FunctionDeclaration: "\n",
            LogicalOR : false
        }
    }
}});
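To make the equivalences in the comments above concrete, here is a sketch of how mixed values (booleans, numbers, strings) might be resolved. The helper names are hypothetical, and special constants such as COMMA_FIRST are deliberately not handled.

```javascript
// Indent values become a change in nesting depth:
// ~~true === 1, ~~false === 0, ~~undefined === 0, and -1 stays -1 (a dedent).
function indentDelta(value) {
    return ~~value;
}

// Build the indentation string for a given depth, never repeating a negative count.
function indentString(style, depth) {
    return style.repeat(Math.max(0, depth));
}

// Whitespace values: strings are used verbatim, true means a single space,
// false or a missing value means no whitespace.
function resolveWhitespace(value) {
    if (typeof value === 'string') {
        return value;
    }
    return value ? ' ' : '';
}

// Usage: a SwitchCase with delta -1 inside a block at depth 2 is printed at depth 1.
const caseIndent = indentString('  ', 2 + indentDelta(-1));
```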
edit: made some minor additions, fixed a bug
I would prefer object style too. Mixed style looks good. Scriptable style, something like callback function style, is future work.
Sounds like a plan! I think we should probably start with the basics (braces, operators) and then expand to the others. That way we can validate the first ones and make sure we are following the right approach and that it's "scalable".
I'm thinking about a new method similar to addIndent() and parenthesize() that accepts the content and type, like:
function addWhitespace(content, type) {
    return _whitespaceBefore[type] + content + _whitespaceAfter[type];
}
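As written, the function above would concatenate the string "undefined" for any type missing from the tables. A sketch of a runnable variant follows, assuming hypothetical lookup tables and an optional category argument so that a single operator can fall back to its category's setting; neither is part of escodegen.

```javascript
// Hypothetical tables; in practice they would be built from the user's format options.
const whitespaceBefore = { LogicalExpression: ' ', LogicalOR: '' };
const whitespaceAfter = { LogicalExpression: ' ', LogicalOR: '' };

// Look up `type`, then its category (if given), then fall back to no whitespace.
function lookup(table, type, category) {
    const has = (key) => Object.prototype.hasOwnProperty.call(table, key);
    if (has(type)) {
        return table[type];
    }
    if (category !== undefined && has(category)) {
        return table[category];
    }
    return '';
}

function addWhitespace(content, type, category) {
    return lookup(whitespaceBefore, type, category) +
        content +
        lookup(whitespaceAfter, type, category);
}

// Usage: LogicalOR has its own setting, LogicalAND inherits from LogicalExpression.
const orToken = addWhitespace('||', 'LogicalOR', 'LogicalExpression');
const andToken = addWhitespace('&&', 'LogicalAND', 'LogicalExpression');
```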
Splitting the before/after into individual objects is probably a good idea: the code will be cleaner and configuration can also be easier. I was keeping it as a shallow object with _before/_after to be able to group things by proximity (imagine a configuration file with all the options - probably 80+ lines), but a quick search is more than enough and it will end up being more organized anyway, so my vote is to use @michaelficarra's model and skip constants and special indentation for now.
I think we can start a new test suite just for the indentation and add individual tests as we add new configuration options. There's no need for a file with all the options for now, especially since we just need to test the AST to JS conversion. I would probably create one file with the source, one with the AST (create a script to generate it automatically) and one with the expected output, that way it will be easier to compare them and add new tests (like yuicompressor does with css).
As a first step, I created a format.indent section and moved indent / base into it. https://github.com/Constellation/escodegen/issues/6
@millermedeiros The YUI Compressor test system looks very good! So I'll create a new issue for it!
anyone working on this?
@goatslacker: I will be starting in around a month, possibly sooner.
edit: I am hoping to base my work on this: http://oai.cwi.nl/oai/asset/10876/10876D.pdf
Peachy. Looking forward to it. Is this part of your redux project?
It would be reasonable if the options in question would not take precedence over options.format.compact.
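One way to read this suggestion, as a sketch: when compact is set, any per-token whitespace setting is ignored. The shape of the whitespace option follows the proposal earlier in the thread and the function name effectiveWhitespace is hypothetical. Whitespace that is syntactically required (for example after var) would still have to be emitted elsewhere.

```javascript
// Resolve the optional whitespace for a token; `compact` always wins.
function effectiveWhitespace(format, position, type) {
    if (format.compact) {
        return '';
    }
    const table = (format.whitespace && format.whitespace[position]) || {};
    const value = table[type];
    if (typeof value === 'string') {
        return value;
    }
    return value ? ' ' : '';
}

// Usage: the same setting yields a space normally and nothing in compact mode.
const settings = { whitespace: { before: { LogicalExpression: true } } };
const normal = effectiveWhitespace(settings, 'before', 'LogicalExpression');
const compact = effectiveWhitespace(
    Object.assign({ compact: true }, settings), 'before', 'LogicalExpression');
```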
Is anyone currently working on this? @michaelficarra?
Some formatting options are now available.
@Constellation I don't think it is working right. Here's an example:
dejavu.Class.declare({
    method2: function () {}
});
I generated an AST from it, passed that to escodegen, and here's the output:
dejavu.Class.declare({method2: function () {
}});
Ah right, agreed. I'll fix it, thanks!
@satazor: I've fixed it, thanks! 4e610a28fdfd0ec9f175a5f3f83189197e669cb5 c5129de8a89af0a9432327f38ae7725e2abd021f
@Constellation any chance to have this landed on npm?
@satazor: I'll publish it :)
edit: published as 0.0.14
@Constellation There are still some strange things, but they are hard to explain. Can we talk via gtalk or something?
This week I did some work with ASTs and it made me think that escodegen is not going to be the best tool for code formatting. It doesn't use the source tokens (especially white space) and the changes will be destructive, since it is a full rewrite of the code (which can be a good thing in some cases).
I'm also feeling that this goes beyond the responsibility of escodegen. Simple settings are OK, especially the ones related to the code generation itself (asi, line breaks, indent) but I don't think it is worth increasing escodegen complexity too much. Would be better to create a separate tool that did the non-destructive formatting and bundle both tools if that makes sense, the same way that esmangle is a separate project.
AST => escodegen => string => esformatter => string
What do you guys think?
PS: I might start such project (non-destructive formatter), just need to find some time.
@millermedeiros: I have no idea what point you are trying to make.
@michaelficarra my point is that my original feature request of adding granular control over the code output style isn't really a good idea. It would be better to focus the effort on other things and create a separate tool/lib to do the code formatting, that way escodegen is responsible only for generating the basic structure.
I disagree. This is exactly the place for it. What would this other tool do? Use esprima to parse escodegen output to the exact IR escodegen operates on and then do exactly what you're suggesting here? That would make no sense.
@satazor Sorry for the very late reply. Could you paste some examples? That would probably make the problem easier to understand :)
@millermedeiros, @michaelficarra
Hm, so structured data (such as an AST) isn't necessary for esformatter? escodegen generates raw text, so the formatter would need to work on raw text. If structured data is needed, I think this job belongs to escodegen.
@Constellation yes.
Original code: https://github.com/IndigoUnited/dejavu/blob/master/test/specs/functional.js Code output after some transformations: https://github.com/IndigoUnited/dejavu/blob/master/test/specs/functional_optimized.js
As you can see, escodegen does not preserve new lines, semi-colons and other things correctly.
@Constellation a tool like esformatter would need an AST as well, but the most important input would be the raw string (since it contains information about line breaks, original tokens and white space). I think the formatter should be able to apply the transformations without compromising the source (non-destructive changes): if the user has some missing semicolons it shouldn't automatically add/remove them (unless explicitly set), and the same goes for empty lines, trailing white space, etc. The AST generated by Esprima doesn't contain info about white space and escodegen doesn't use the source tokens to generate the output, so the output is probably going to be very different from the input in many cases.
I just wanted to let you guys know that while having the settings would be good, it won't solve all needs, since in some cases the user might want to apply only non-destructive changes, and having so many granular settings might be out of the scope of escodegen and make the code more complex than it needs to be.
I started the work on esformatter and just pushed it to github and npm: https://github.com/millermedeiros/esformatter - I also wrote a separate lib (rocambole) to do the recursive walk and generate the extra tokens for comments, line breaks and white space. I'm trying to make it as non-destructive as possible.
All the tests inside test/compare/default are working as expected but I'm still missing support for many important statements like if/else, while, for, etc.
Maybe now it will make sense to you guys why the non-destructive approach is actually better for code formatting, since it avoids inserting any token that isn't white space or a line break (if the user wrote the code that way they probably had a reason to). I plan to add automatic semicolon insertion and also options to toggle line wrap and whether it should keep existing line breaks and white space, but of course this will be future work; I'm focusing on the basics for now.
Contributors are welcome. Cheers.
Any update on the suggestions of @millermedeiros yet? Is there any work going on in this area? If not, is it still of interest that something is going to happen there?
@goloroden esformatter is getting closer to being "usable", only problem is that I always get busy with other projects. I would not recommend adding too many settings to escodegen, complexity reaaallly piles up.
I would like to have the code formatted with leading commas. Such as:
var a = ({
	 a: "Prefixed with tab then space"
	,b: "Prefixed with tab then comma"
	,c: "Prefixed with tab then comma"
});
I got to this project while discussing ways to do advanced code formatting for JS (source)
It would be great if it had the same level of control as FDT advanced code formatter - options to define how white space is handled between each kind of token (line break, space, none), indentation, keep/remove empty lines, etc.
Basically transform escodegen into a hardcore version of JSBeautifier.