Closed GoogleCodeExporter closed 9 years ago
Original comment by acle...@gmail.com
on 1 Apr 2013 at 8:13
From your example, it seems that in the first pass we choose globally unique
variable names for everything except formal parameters of functions.
It might be better to have the pass work on each individual scope separately.
The signature of the renaming function would be sth like:
void renameVars(Node n, Set<String> inUse);
where n is the root node of a scope, and inUse is a set of variable names we
aren't allowed to use in that scope. Any other name is fair game.
This way, when we call renameVars for y or z, we know that o and p don't
interfere with the choice of names, and when we call renameVars for yy we know
that yy references o and p so we can't shadow them.
Original comment by dim...@google.com
on 1 Apr 2013 at 9:08
As I understand it, the real problem is that "$" gets L1 in one local scope but
L2 in another local scope. Right? Could we just fix this by having a better
algorithm for assigning L variable IDs?
Original comment by Nicholas.J.Santos
on 1 Apr 2013 at 9:22
As in assign the L variable ids by local scope frequency rather than logical
order?
Original comment by chadkill...@missouristate.edu
on 1 Apr 2013 at 9:34
@dimvar:
That's possible. The problem is that right now we do everything in a single
order. How do we compute 'inUse'? It is a trade-off problem. You can
potentially preserve more variables in the 'inUse' to save size but it might
hurt that really big function.
To illustrate this trade-off more clearly. Lets go back to my example:
Suppose in function yy: We have an algorithm that detects the shadowing
conflict and wants to not use L_8 and L_9 for parameter. At this point, we have
no idea if it is worth it. Yes, lowering total number of variable names by
shadowing is good but by doing that you have one less function that have a
substring starting with "function(a,b". Will it be good? The only way to find
out is to run gzip on it.
Hehe. If I have the time and the skill in computational complexity, I think we
can prove that renaming vars for optimal gzip size is pretty much NP-Complete.
@Nick:
I am not sure. I am mostly unfamiliar with JQuery. Where is $ defined? Can you
explain why you think $ is the problem? When I look at JQuery 1.9, I see uglify
able to use 1 char names in lots of occasions.
While I have your attention, do you remember this patch you sent out in 2011
that address this?
Original comment by acle...@gmail.com
on 1 Apr 2013 at 10:00
@Alan: "$" is only defined in jQuery near the end of the file during the
exports (window.jQuery = window.$ = jQuery). Internally, references all use the
"jQuery" name not the "$" alias. As far as I know this is true for quite a few
jQuery versions (at least as far back as 1.7 - probably farther).
I thought of what changed though. Prior to 1.9, each file (module) had it's own
function wrapper. In jQuery 1.9 everything was merged into one non-conflicting
namespace and a single wrapper function surrounds the entire code base.
Original comment by chadkill...@missouristate.edu
on 1 Apr 2013 at 10:08
@Alan:
> Yes, lowering total number of variable names by shadowing is good but by
doing that you have one less function that have a substring starting with
"function(a,b". Will it be good?
That's not necessary. We know which variable names are used in the whole
program before renaming. If we pick names for formal parameters that don't
appear in the program, all functions will start with the same substring and
we'll still have maximum shadowing.
Original comment by dim...@google.com
on 1 Apr 2013 at 10:16
Actually, sorry, it's not possible that all functions start with the same
prefix, eg,
function f(x) {
...
function g(y) {
... references x ...
}
...
}
Clearly, we can't use the same variable for x and y.
But we can use the same names for all functions that at the same nesting level
at least.
Original comment by dim...@google.com
on 1 Apr 2013 at 10:21
@dimvar:
The problem still lingers in the same nesting level. Same example:
var o, p;
x = function (a,b,c) {}
y = function (m,n) {}
yy = function (x,y) {alert(o); alert(p)}
Suppose you want to shadow o p: (code #1)
var o, p;
x = function (o,p,c) {}
y = function (o,p) {}
yy = function (x,y) {alert(o);alert(p)} <---we can't use o, p here. Therefore we needed up with a function that isn't function(o,p..
You can argue that instead of renaming the parameters, we can just rename, o
and p?
var a, b;
x = function (a,b,c) {}
y = function (a,b) {}
yy = function (a,b) {alert(a); alert(b)} <--- this is wrong, there is no way to access the original o and p now.
At the end of the day, the only correct naming scheme are (code #1) and this
(code #2)
var o, p;
x = function (a,b,c) {}
y = function (a,b) {}
yy = function (a,b) {alert(o);alert(p)}
Again, we don't know which if #1 or #2 is better until we gzip it.
Original comment by acle...@gmail.com
on 1 Apr 2013 at 10:47
And for comments about just a smarter L_# renaming algorithm:
It might not be as simple neither.
consider this:
function() { ....... var x; var y;.... }
function() { ....... var m; .... }
function() { ........var n; }
where x, y are used a lot, we give it L_1 and L_2.
Now when we are at 'm' (another frequently used local), do we give it L_1 or
L_2? It seems like a greedy algorithm mapping it to L_1 make sense, right?
That's not always the case. Gzip can use dynamic huffman encoding. If there are
no more L_1's after the first function and the total count of y, m and n are
greater than the count of x. It is actually better to use L_2 since gzip can
use a new encoding that favors L_2 after the first function. (doesn't this
sound like a http://en.wikipedia.org/wiki/Knapsack_problem?)
Now add back shadowing and the "function(a,b,c" problem....
-What if the 2nd function is actually function(m){? Do we give up a
"function(a" substring?
-What if not naming 'n' to L_1 or L_2 enables an extra shadowing? Would that be
worth it?
What a mess! :)
Original comment by acle...@gmail.com
on 1 Apr 2013 at 11:35
can somebody post repro steps of the actual bug being reported, for those of us
who missed the meeting today? is it just closure-compiler simple mode vs.
uglify on jquery 1.9.1?
Original comment by Nicholas.J.Santos
on 2 Apr 2013 at 4:45
Simple mode vs uglify on JQuery 1.9.1
Original comment by acle...@gmail.com
on 2 Apr 2013 at 9:02
Thinking of this overnight, I'm wondering how the following renaming heuristic
would compare to the current one - it's based on function scope depth:
function global() {
function foo1(arg1, arg2, arg3) {
function bar(x, y ,z) {
var barlocal1;
}
var foolocal1;
}
function foo2() {
function bar2(x, y ,z) {
var bar2local1;
}
var foo2local1;
}
var global_local1;
}
Intermediate renaming step 1 - rename function args based on scope depth:
function global() {
function foo1(D1_1, D1_2, D1_3) {
function bar(D2_1, D2_2, D2_3) {
var barlocal1;
}
var foolocal1;
}
function foo2() {
function bar2(D2_1, D2_2, D2_3) {
var bar2local1;
}
var foo2local1;
}
var global_local1;
}
Intermediate renaming step 2 - rename local variables based on frequency,
reusing the best function arg name available:
function global() {
function foo1(D1_1, D1_2, D1_3) {
function bar(D2_1, D2_2, D2_3) {
var L3;
}
var L2;
}
function foo2() {
function bar2(D2_1, D2_2, D2_3) {
var D1_2;
}
var D1_1;
}
var L1;
}
In this way all functions with the same scope depth will have the same
signature pattern.
@Nick - here's the current gzip results after compilation:
33641 - closure compiler
32769 - uglifyjs
Original comment by chadkill...@missouristate.edu
on 2 Apr 2013 at 1:24
@chad:
The truth is, I don't know. :) It goes back to my point that we can probably
never find the most optimal algorithm.
Now for your example, how are you doing step 2? How are the new names for
bar2local1 and fo2local1 selected?
Original comment by acle...@gmail.com
on 2 Apr 2013 at 5:05
@alan
I assumed at that at the second step, you were traversing the tree again and
that you knew from the previous traversal which function arg names were used
and at what frequency. The example isn't complete, but the assumption was that
D1_1 would be referenced very frequently because every function at scope depth
1 with at least one argument would have such a reference. In that case, I knew
in the current branch of the traversal that D1_1 was not in use (no arguments
to the function) and so assigned the local variable that name.
I came at this idea by trying to look at the problem this way:
1. function signatures will always be longer than local variable names and so
matches here will produce much better gzip results. Functions are also used a
lot. Prioritize matching function signatures above local variable names.
2. functions at the same scope depth will never have conflicting arguments.
3. In practice, I'm guessing that function nesting depth is pretty flat.
Greater than 3 seems unusual. Greater than 5? (lab/generated code excepted).
I agree with you that there is no such thing as the most optimal solution here.
I'm just looking for better :-)
Original comment by chadkill...@missouristate.edu
on 2 Apr 2013 at 6:40
@chad:
The L_# renaming step should be very trivial Java code if you are interested in
messing around with it.
I am trying to make sense out of one of your previous comments:
"In jQuery 1.9 everything was merged into one non-conflicting namespace and a
single wrapper function surrounds the entire code base."
Does this imply the wrapper function holds lots of variables or everything is
just one variable serving as the namespace?
Original comment by acle...@gmail.com
on 3 Apr 2013 at 5:40
@alan The wrapper now holds a lot of variables. The jQuery team worked to make
sure differing files used uniquely named variables so that when the files were
appended, no conflicts occurred.
--In 1.7--
File1:
var jQuery = (function(window, undefined) {
var local1 = "";
var local2 = "";
var jQuery = function() {};
return jQuery;
})(window)
File2:
(function(window, undefined) {
var local1 = "";
var local2 = "";
jQuery.prototype.prop = function() {};
})(window)
...
--In 1.9--
File1:
var file1_local1 = "";
var file1_local2 = "";
var jQuery = function() {};
File2:
var file2_local1 = "";
var file2_local2 = "";
jQuery.prototype.prop = function() {};
window.jQuery = jQuery;
Final output (wrapper added as build step):
(function(window, undefined) {
var file1_local1 = "";
var file1_local2 = "";
var jQuery = function() {};
var file2_local1 = "";
var file2_local2 = "";
jQuery.prototype.prop = function() {};
window.jQuery = jQuery;
})(window)
Original comment by chadkill...@missouristate.edu
on 3 Apr 2013 at 6:07
i think this might be a red herring. I ran some tests with jquery 1.9.1. All
results are gzipped:
uglify: 33079
closure-compiler: 33551
uglify (no variable renaming): 40782
closure-compiler (no variable renaming): 41214
I think the big differences are in how we choose names, and in how we choose
line breaks. For names, we choose "a", then "b", then "c", and so on. Uglify
tries to use commonly-used characters first. I think we've often talked about
doing this, and you save some bytes this way. Removing line breaks is a bigger
win.
If I do those 2 things, closure-compiler comes down to 33167.
All of that said, I've never really been interested in competing with YUI or
Uglify. Small improvements on "simple" JS compression don't really help people
much. I'll let you all decide how much more time/research to allocate to this.
Original comment by Nicholas.J.Santos
on 6 Apr 2013 at 6:54
Alan is having fun
Original comment by johnl...@google.com
on 6 Apr 2013 at 5:25
Yep. Pretty much. Just scratching an itch.....
Original comment by acle...@gmail.com
on 9 Apr 2013 at 12:57
I'm working on the heuristic I mentioned above. Once I get it to a somewhat
functional state, I'll let you know how it performs I my code. If it does
better, someone can test it on Google code.
Original comment by chadkill...@missouristate.edu
on 9 Apr 2013 at 12:30
Sure. I have been testing out a few heuristics when I have some free cycles as
well.
My approaches has been trying to get as many variables to fall into rule #2
while completely following rule #1 of my first post.
Original comment by acle...@gmail.com
on 9 Apr 2013 at 5:47
Wow. This is a meta issue on lots of renaming issues. It is hard to keep track
of the conversation.
I can now confirmed that is *not* uglify using commonly used chars first that's
the issue. As I stated before, Closure Compiler code was a lot bigger even
before gzipping. It is the underlying renaming algorithm.
Original comment by acle...@gmail.com
on 11 Apr 2013 at 9:27
Original comment by concavel...@gmail.com
on 15 Apr 2013 at 8:02
It is amazing that on large code bases a slightly 'smarter' name generation has
impact in the ~20 *bytes* range.
Original comment by acle...@google.com
on 26 Apr 2013 at 7:43
Issue tracking has been migrated to github. Please make any further comments on
this issue through https://github.com/google/closure-compiler/issues
Original comment by blic...@google.com
on 1 May 2014 at 6:31
Original comment by blic...@google.com
on 1 May 2014 at 6:34
Original issue reported on code.google.com by
acle...@gmail.com
on 1 Apr 2013 at 8:07