dflook / python-minifier

Transform Python source code into its most compact representation
MIT License
558 stars 41 forks source link

optimizations #5

Open projectgroepsteenplaat opened 4 years ago

projectgroepsteenplaat commented 4 years ago
dflook commented 4 years ago

Hi, thanks for taking the time to create this issue!

I realise I haven't written the goals for this project anywhere, so here they are:

  1. With no transforms enabled, the output should produce exactly the same program as the input when parsed (an identical AST).
  2. Transformations should take sensible, well written code and produce code of the same or shorter length with the same behaviour. The existing transformations do things that would be difficult, tedious or just look strange to a programmer.
  3. Never produce broken code.
  4. Works for supported versions of cpython and pypy.

Non-Goals:

  1. Minifier performance.
  2. Transformations that are targeted at badly written code.
  3. Obfuscation.

The following suggestions I feel are out of scope, as the code can be minified by just writing better code:

  • Repeating the same command a lot of times in succession can be minified to a for loop with the command in it.
  • The print command will convert an integer to a string so print(str(2)) can become print(2)
  • str(1) can become '1' and str(2) can become '2' and so on
  • if 1 == 2: print("Hello, how are you?") can be removed completely as 1 does never equal 2
  • print(1+1) can become print(2)
  • a=1;a=2 can be changed into a=2 as that will be the final value anyway
  • for i in range(10):pass becomes for i in range(10):0, but it can be completely ignored

I appreciate you can't always change the input code but it's a lot of effort to add these. Without doing a level of dynamic analysis which is difficult or impossible, it would only work for relatively trivial examples.

For the following suggestions, I thought python-minifier already produced optimal code - have you got any examples where that's not the case?

  • Indentation can be minified
  • Sometimes it transforms a literal into a variable when the literal is only used once and it becomes less minified
  • import decimal can become import decimal as d, this will be more bytes for the import, but will be less for every call to the decimal module

These are good ideas! I think these can be in the next version:

  • Large numbers will be converted to hex, but for example 1000000000000 could be 10**12 but instead I get 0xe8d4a51000, which is shorter than the original, but could be smaller
  • pow(a,b) can become a**b as that is shorter
  • pow(a,b,c) can become a**b%c, but this is slower so it may should be optional

This is a good idea, but more difficult:

  • print("Hello, how are you?");print("Hello, how are you now?") can become a="Hello, how are you";print(a+"?");print(a+" now?")
projectgroepsteenplaat commented 4 years ago

edit: indentation at the way I did it becomes invisible so I added indent for indentation

x = input() if x == "hi": indent>for i in range(2): indent>indent>pass indent>print("Hello") print("Hello, how are you?")

becomes: x=input() if x=='hi': indent>for i in range(2):0 indent>print('Hello') print('Hello, how are you?')

could be: x=input() if x=='hi': space>for i in range(2):0 space>print('Hello') print('Hello, how are you?') or even: x=input() if x=='hi':print('Hello') print('Hello, how are you?')

And this: import decimal print(decimal.Decimal(4)) print(decimal.Decimal(7)) print(decimal.Decimal(9))

Remains the same but could be: import decimal as d print(d.Decimal(4)) print(d.Decimal(7)) print(d.Decimal(9))

Which is +4 bytes for import and -6 bytes for every print, and thus -18 is an improvement of 14 bytes

For the literal that is only used once I can't find an example anymore but I have had it once.

Kind regards

dflook commented 4 years ago

Hi, thanks again for creating those examples!

Bear in mind that after minification indents become tab characters, which are a single byte long - the same as spaces. I use tabs because the output is more readable.

In your example we can eliminate the for loop entirely, but that's because we know there are no side effects to the range() call. Determining that in the general case is a difficult problem. We could specifically recognise some simple cases, but it's not a feature I plan to add.

In the decimal example I think you have rename-globals disabled, which prevents that transformation because it introduces the new name d, which may not be desirable. If I enable rename-globals, I actually get

B=print
import decimal as A
B(A.Decimal(4))
B(A.Decimal(7))
B(A.Decimal(9))
projectgroepsteenplaat commented 4 years ago

Numbers greater than 10 ^ 45 get minified with hex, but would be shorter in base 36 (int(string, 36) will convert back to decimal)

amaank404 commented 3 years ago

the one that shows conversion of 1000000000000 into 10**12 instead of 0xe8d4a51000 might not be a great idea considering the performance penelty of multiplying the number 12 times at runtime. this might even become a problem for performance requirements of some projects. How about make that optional. I have worked with that power operator and it is slow sometimes.

mqyhlkahu commented 1 year ago

@amaank404

might not be a great idea considering the performance penelty of multiplying the number 12 times at runtime. this might even become a problem for performance

Because both 10 and 12 are constant values, the result of 10**12 is calculated at compile time, before the code is run.

import dis
dis.dis("10 ** 12")
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (1000000000000)
              4 RETURN_VALUE
amaank404 commented 1 year ago

I understand that now, I believe I was still new to this field back then. Thanks for knowledge!