dflook / python-minifier

Transform Python source code into its most compact representation
MIT License
553 stars 40 forks source link

[WIP] Remove parenthesis for single argument exceptions in `raise` statement #69

Open dosisod opened 1 year ago

dosisod commented 1 year ago

There is no semantic difference between raise X() and raise X, so for the few instances where you are raising an exception with no arguments you can save 2 bytes. The AST comparison is failing now due to the Call node being replaced with a Name node, so I don't know what the best way to go about fixing that is. I have tested this on Python 3.5-3.11, but not Python 2.

Also, one suggestion I have would be adding a --no-ast-verify flag to allow for forcing output of unstable minifications for debugging purposes, as opposed to using print(self.code) or something similar.

dflook commented 1 year ago

Hi @dosisod, thanks for working on this, this is a good idea. The same thing could be applied to explicit exception chaining e.g.

raise MyException() from CauseException()
raise MyException from CauseException

The ModulePrinter should be printing an exact representation of the AST. For changes like this we would need to add a new transformer that visits Raise nodes and replaces its exc Call node with a Name node.

dflook commented 1 year ago

I've been thinking about this, and I don't think we can say that both forms are equivalent.

Consider:

def create_exception():
    return Exception()

raise create_exception()

Transforming the final line to raise create_exception would not be correct. I don't think we can do this safely generally. Even if we track all names in the module that look like they are exception classes and only remove parenthesis from those, it wouldn't help if such exception classses are imported from elsewhere.

dosisod commented 1 year ago

Good catch! Perhaps we should only apply this to built-in exceptions? Of course someone could do:

Exception = lambda: __builtins__.Exception()

raise Exception()  # ok
raise Exception    # not ok

Would there by any way to detect this in the AST, that is, detect the difference between an Exception which has been reassigned vs one that comes directly from the builtins module?

dflook commented 1 year ago

Not from the raw AST, but it gets annotated by parts of the renamer.

After the resolve_names() step the root Module node has a bindings attribute which is a list of Binding objects, each representing a distinct name used in the global scope. The objects have a variety of subtypes, one of which is BuiltinBinding.

The references property of each BuiltinBinding is a list of the actual AST nodes that mention the builtin. The renamer also adds a parent attribute to every AST node to make traversal easier.

If one of those references is a Call node with no args and has a Raises node a parent - that would be a candidate for changing to a Name node (assuming it is actually an exception type).

There is a brain-dump of how the renamer works here