Open bitwiseman opened 11 years ago
js-beatifier processes a provided string into a formatted javascript string. It does this without fully parsing, instead opting for a set of best guesses about the syntax only in so much as it effects the formating. That incomplete parse makes for inaccuracies - bugs.
If we fully parsed the provided javascript, then ran a set of well-understood rules over that output, we would have fewer of these edge-case bugs like #56. Assuming the parser handled invalid javascript and recovered, we could then skip formatting on sections that were invalid.
On the other hand, the incomplete parsing make the code able to run fast and reasonably well, even on invalid javascript. Also, that is how js-beautify is implemented (in more than one language, at that).
Parsing will certainly become more complete over time, as specific bugs are reported and fixed, but it would be a huge undertaking to change this design choice within the current code base, basically a complete rewrite.
On the bright side, we do have a fairly good test base.
Having delved into the uglifyjs parser, I am strongly in favour of retaining the current string-based token parsing instead of adopting an actual JS parser. It is stupidly complex, and basically unportable (in a maintainable, recognizable way) to Python.
@evocateur Just out of curiosity, are you talking about the uglifyjs parser, the uglifyjs2 parser, or both?
Both, actually. Uglify v1 was terribly disorganized, and in that sense v2 is much more readable. It uses some really snappy code generation algorithms, at least as far as I can tell. I'm still pretty poor at understanding AST, but I'm getting better.
I actually spent some time a few months back trying to make Uglify2's "beautify" mode work to my preferences, and I got a tiny bit done, but nothing shippable. I ended up giving up the idea because js-beautify already handles 99% of what I wanted. There wasn't enough flexibility in Uglify2's API at the time, anyway, to justify a whole bunch of effort to get 1% of benefit.
To the point of this thread, I'm more concerned about API portability between Python and JS (if indeed that is to remain a priority for the project). Parsing JS, to me, represents an incompatible step away from the Python script, because it will make some very difficult-to-port AST modification logic (if indeed Python can somehow parse JS itself already).
So, what you're saying is Uglify (in fact, any full-fledged js parser) treats the js AST first and foremost, a set of tokens second, and formatting third (if at all). Beautifiers based on that model will, quite reasonably, walk the AST and perform modifications on the AST, but that is often overkill in terms of producing formatted code.
You're not opposed to building up some sort of tree tokens as long as it serves the primary features of js-beautifer.
Correct, I have nothing against ASTs, just a little hesitation on creating an undue maintenance burden between the implementations.
On the other hand, a quick google yielded Python projects like https://github.com/rspivak/slimit that provide JS lexing and parsing, resulting in an AST that could be operated on in the same fashion as the JS AST (provided by, say, uglify). Insofar as our implementations are as identical as possible, this seems like a reasonable compromise.
However, we still don't get JS parsing even with the various Python libraries, so the ability to catch syntax errors in a trivial way (instead of insanely complex AST manipulations) is lost. I'm not exactly broken up about that, as other tools exist (JSHint) to ensure those sorts of errors don't occur.
:+1: I would love to see that issue fixed...
This is the source of a lot of bugs. Also, it is feature.