dflook / python-minifier

Transform Python source code into its most compact representation
MIT License
558 stars 41 forks source link

Duplicate parameter names generated with python 2.7 #10

Closed jplloyd closed 4 years ago

jplloyd commented 4 years ago

Hi, very nice library/application! I'm using it to reduce the size of MyPaint's appimage files (by a fair chunk actually).

Prelude

Version used: 2.3.0

When pre-minifying the libraries I ran into an issue with files that had already been minified before, where pyminify produced invalid parameters. As far as I can tell, this only happens when running with python 2.7, but not 3.5+ (tested with 2.7.5 and 2.7.12).

Problem

The same name is assigned to the *args and the **kwds parameters, yielding invalid code.

Minimal example

Input:

class Q:
    def f(C,a,*A,**B):D='.';super(Q,C).f(a,*A,**B);D or D

when minified, yields the invalid output:

class Q:
    def f(D,a,*C,**C):A='.';super(Q,D).f(a,*B,**C);A or A

where *C really should be *B (classic off by one, but due to some py2/3 semantic difference?)

The minimal example is itself a (valid) minification of:

class Q:

    def f(self, a, *b, **c):
        super(Q, self).f(a, *b, **c)
        '.' or '.'
dflook commented 4 years ago

Oh no! Thanks for the report, I'll look into this as soon as I can.

dflook commented 4 years ago

I've identified the bug that causes this.

Python Minifier uses the parser built into the python interpreter to get an Abstract Syntax Tree representing the program to be minifed. The nodes in this tree are objects representing individual elements of your code.

We walk through this tree to link together nodes that refer to the same name. For example, the FunctionDef('f') node gets linked to the Name('f') node where we call the function. When renaming, all these linked nodes get updated at the same time.

In Python versions before 3.4, the *args and **kwargs names are both represented by a single arguments node. The bug is in not being careful enough when renaming *args and **kwargs.

If the new name for *args happened to be the same as the old name for **kwargs, renaming **kwargs could incorrectly also rename *args (or the other way around). This is extremely unlikely when using the conventional args and kwargs but fairly likely when the names were sequentially assigned, such as when the input has already been minified.

The output python_minifer was trying to generate was:

class Q:
    def f(D,a,*B,**C):A='.';super(Q,D).f(a,*B,**C);A or A

The test suite takes a really long time, so the fix will probably not be released for a few days. It's on the branch fix-duplicate-args-kwargs if you need it before then.

The last thing I had to investigate was why the first run of python-minifer was generating the apparently sub-optimal:

class Q:
    def f(C,a,*A,**B):D='.';super(Q,C).f(a,*A,**B);D or D

Although there is no difference in output size in this case, the names with the most references should in general be assigned names earlier in the sequence of valid names.

The name D is being introduced to replace the literal '.' which is used twice so is considered to have two references. Of course it will actually have three, since it needs to be referenced in the new assignment. The second run of python_minifier can count the references correctly as the assignment already exists.

As this doesn't produce broken code, I'll consider this for a future release.

jplloyd commented 4 years ago

Thank you for resolving this so quickly! I circumvented the issue in my own use case by grepping for common patterns emerging from the minification and excluding matching files from being re-minified, so there's no hurry for my sake. It'll be nice not having to do that next time, however.

As for the second issue, I would consider idempotence to be nice-to-have, but not a necessity. I did derive the minimal example from a class in binding.py, so one test case that would have caught this would be running the minifier on its own source, and repeating the operation a couple of times.

I assume you'll close this when the fix is merged. Thanks again!

dflook commented 4 years ago

Fixed in 2.3.1 🎉