blakeembrey / javascript-stringify

Stringify is to `eval` as `JSON.stringify` is to `JSON.parse`
MIT License
139 stars 16 forks source link

Extensive rewrite of function stringification #22

Closed rhendric closed 5 years ago

rhendric commented 5 years ago

Node.js 10 changed how the toString method on functions works. The resulting string is much less ambiguous than what the method produces on older versions, but also requires a more robust parser to interpret correctly. Stripping off computed property prefixes necessitates the ability to find the matching ']' to a given '[', and doing that correctly means taking into account all the ways that comments, template strings, and regular expression literals can fool a naive implementation. A full JavaScript parser would be many times the amount of code added here; this "parser-lite" implementation cuts many corners but is possibly bug-free on the most recent version of Node.

The toString implementation of previous versions of Node continues to be supported. On Node.js 4, this logic is also possibly bug-free. However, a behavior change that I believe started in Node.js 6 means that versions of Node.js greater than 4 and less than 10 can toString a small subset of functions in such a way that it is impossible to unambiguously stringify them in all circumstances. Still, we do the best we can. Tests that are conditionally skipped on the affected Node.js versions demonstrate the problem.

coveralls commented 5 years ago

Coverage Status

Coverage decreased (-1.5%) to 98.502% when pulling ccb5221830e70cd05672769bd676d3462eaf215e on rhendric:node10-functions into f06123edbdd49fc2dccd9e667a8478480e7c766b on blakeembrey:master.

coveralls commented 5 years ago

Coverage Status

Coverage remained the same at 100.0% when pulling e70a84d877de56dc1842475d98ee7ccc118e83b8 on rhendric:node10-functions into f06123edbdd49fc2dccd9e667a8478480e7c766b on blakeembrey:master.

rhendric commented 5 years ago

Replying from #16:

It sounds like the best approach is to substring the prefix matching against fn.name, "${fn.name}" and '${fn.name}' (similar to before but with two quoted additions).

That won't work because there's more than one way to represent a string. Consider:

({'\x41\x42'() {}}).AB.toString(); // => "'\\x41\\x42'() {}"

If possible, it's better to parse than to assume we know how the string will be presented.

We'll always need to substring and change the name to the computed key in object definition or regular function definition elsewhere - there doesn't seem to be any way we use fn.name for method definition syntax in javascript-stringify.

For node.js <10, using fn.name is still a pretty good heuristic. For versions of node.js >4, you run into the problem that functions can pick up names from object keys that don't appear in their toStrings—see this test. Node.js 4 doesn't do this, so I think perfect stringification is still possible on that platform.

I haven't quite grokked the difference between the two o results and why they have the same fn.name yet. Seems like it might be a bug in the updated toString() output.

It isn't a bug. I do like how the updated toString is much less ambiguous than before; we don't have to guess around whether some weird sequence of symbols is the start of a function or just its name. Instead, we just parse the first variable, string, or [/]-delimited expression, skipping comments and whitespace. (Easier said than done, with all the ways JavaScript syntax can trick you, but at least it's deterministic.) Unfortunately, we need a way to know whether we can unambiguously just parse some syntax and get good results, or whether we have to guess around with the name-stripping approach—on earlier versions of node, if we parse first, we could get a false positive. (See this test, for example.) The pair of functions I gave in #16 take advantage of this meta-ambiguity—they are being correctly toString'd according to the rules of their respective platforms, but if you don't know which platform you're on, you can't know which function you have.

blakeembrey commented 5 years ago

@rhendric This is an insane PR, thanks for the contribution! I hope you don't mind if I take a bit of time and tweak it a little. I might release this with a larger rewrite so this logic can belong in its own file.

rhendric commented 5 years ago

I don't mind that at all; I'm just relieved your reaction isn't that this adds way too much complexity to be justified by handling every possible input. Let me know if there's anything I can do to help; I'm happy to answer any questions about decisions that aren't explained by comments, or rewrite the PR to conform to any additional style constraints that I didn't pick up from the existing code.

If you are holding off until a larger rewrite to push this, and if that rewrite includes updating the whole library to ES6 syntax, that would solve one unsolved problem: I did settle on using ES6 to define METHOD_NAMES_ARE_QUOTED, rather than an eval or explicit version sniffing.