beautifier / js-beautify

Beautifier for javascript
https://beautifier.io
MIT License
8.62k stars 1.39k forks source link

Javascript not fully parsed #200

Open bitwiseman opened 11 years ago

bitwiseman commented 11 years ago

This is the source of a lot of bugs. Also, it is feature.

bitwiseman commented 11 years ago

js-beatifier processes a provided string into a formatted javascript string. It does this without fully parsing, instead opting for a set of best guesses about the syntax only in so much as it effects the formating. That incomplete parse makes for inaccuracies - bugs.

If we fully parsed the provided javascript, then ran a set of well-understood rules over that output, we would have fewer of these edge-case bugs like #56. Assuming the parser handled invalid javascript and recovered, we could then skip formatting on sections that were invalid.

On the other hand, the incomplete parsing make the code able to run fast and reasonably well, even on invalid javascript. Also, that is how js-beautify is implemented (in more than one language, at that).

Parsing will certainly become more complete over time, as specific bugs are reported and fixed, but it would be a huge undertaking to change this design choice within the current code base, basically a complete rewrite.

einars commented 11 years ago

On the bright side, we do have a fairly good test base.

evocateur commented 11 years ago

Having delved into the uglifyjs parser, I am strongly in favour of retaining the current string-based token parsing instead of adopting an actual JS parser. It is stupidly complex, and basically unportable (in a maintainable, recognizable way) to Python.

bitwiseman commented 11 years ago

@evocateur Just out of curiosity, are you talking about the uglifyjs parser, the uglifyjs2 parser, or both?

evocateur commented 11 years ago

Both, actually. Uglify v1 was terribly disorganized, and in that sense v2 is much more readable. It uses some really snappy code generation algorithms, at least as far as I can tell. I'm still pretty poor at understanding AST, but I'm getting better.

I actually spent some time a few months back trying to make Uglify2's "beautify" mode work to my preferences, and I got a tiny bit done, but nothing shippable. I ended up giving up the idea because js-beautify already handles 99% of what I wanted. There wasn't enough flexibility in Uglify2's API at the time, anyway, to justify a whole bunch of effort to get 1% of benefit.

To the point of this thread, I'm more concerned about API portability between Python and JS (if indeed that is to remain a priority for the project). Parsing JS, to me, represents an incompatible step away from the Python script, because it will make some very difficult-to-port AST modification logic (if indeed Python can somehow parse JS itself already).

bitwiseman commented 11 years ago

So, what you're saying is Uglify (in fact, any full-fledged js parser) treats the js AST first and foremost, a set of tokens second, and formatting third (if at all). Beautifiers based on that model will, quite reasonably, walk the AST and perform modifications on the AST, but that is often overkill in terms of producing formatted code.

You're not opposed to building up some sort of tree tokens as long as it serves the primary features of js-beautifer.

evocateur commented 11 years ago

Correct, I have nothing against ASTs, just a little hesitation on creating an undue maintenance burden between the implementations.

On the other hand, a quick google yielded Python projects like https://github.com/rspivak/slimit that provide JS lexing and parsing, resulting in an AST that could be operated on in the same fashion as the JS AST (provided by, say, uglify). Insofar as our implementations are as identical as possible, this seems like a reasonable compromise.

However, we still don't get JS parsing even with the various Python libraries, so the ability to catch syntax errors in a trivial way (instead of insanely complex AST manipulations) is lost. I'm not exactly broken up about that, as other tools exist (JSHint) to ensure those sorts of errors don't occur.

rmariuzzo commented 10 years ago

:+1: I would love to see that issue fixed...