TinDQ / unpyc

Automatically exported from code.google.com/p/unpyc

As it stands July 21, 2001 this decompiler is horribly broke #10

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
My mission: To point out the issue and help fix it.

Note: I'm not trying to invalidate anyone's work, and this is not an attempt to troll.
unpyc is free, and I'm not owed anything.
Like a lot of people coming here, I am looking for a free, robust method to 
decompile Python files.

Fact:
This decompiler is badly broken and is not fit for anything beyond the most 
primitive decompiling as it stands on July 31, 2011.

What steps will reproduce the problem?:
Attempted to decompile several Python 2.6a1 files.

What is the expected output? What do you see instead?:
I expected them to decompile.
At best, maybe 1 to 3 out of 10 actually decompiled without error.
Most of them, more than 90%, failed.

What version of the product are you using?:
The latest version, and then the same with the 2.7 patch applied.

On what operating system?:
Windows XP 32bit.

Please provide any additional information below:

First, some background. I found some Korean sources for a 2.3 update, and then 
I attempted to update it to 2.4.
Admittedly, my biggest problem is my lack of knowledge of the Python language.

It seems to me the root problem, at least with this decompile paradigm, is 
that you end up with blocks (groups of opcodes between if/while/for or whatever 
other loop and branch constructs).
Then, to turn the actual bytecode back into Python code, it is looked at as 
groups/patterns. In other words, a group of bytecodes forms a construct.

To do this properly one must know the patterns, or at least most of them, to 
approach near 100% accuracy.
On top of this, the patterns can differ per Python version.
From my work and understanding, the bulk of the work resides in "Parser.py".
It's apparent that not all patterns are covered.
Unexpected patterns cause the parser to break.
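
To make the point concrete, here is a toy sketch of pattern-based decompiling. The opcode names are real CPython 2.x names, but the RULES table and match_pattern() are hypothetical and are not taken from unpyc's Parser.py; they only show why a sequence with no matching rule stops the parser.

```python
# Toy illustration only. The opcode names are real CPython 2.x names, but
# RULES and match_pattern() are hypothetical and are not unpyc's Parser.py.
RULES = {
    # opcode-name sequence -> source template ({n} is the n-th argument)
    ("LOAD_CONST", "STORE_NAME"): "{1} = {0}",
    ("LOAD_NAME", "LOAD_NAME", "BINARY_ADD", "STORE_NAME"): "{3} = {0} + {1}",
}


def match_pattern(instructions):
    """Turn one statement's (opname, argument) pairs back into source text."""
    opnames = tuple(op for op, _ in instructions)
    template = RULES.get(opnames)
    if template is None:
        # This is the failure mode described above: an uncovered pattern.
        raise ValueError("no rule for pattern: " + " ".join(opnames))
    return template.format(*[arg for _, arg in instructions])


known = [("LOAD_NAME", "a"), ("LOAD_NAME", "b"),
         ("BINARY_ADD", None), ("STORE_NAME", "c")]
unknown = [("LOAD_NAME", "a"), ("LOAD_NAME", "b"),
           ("BINARY_SUBTRACT", None), ("STORE_NAME", "c")]

print(match_pattern(known))
try:
    match_pattern(unknown)
except ValueError as exc:
    print("parser breaks:", exc)
```

Every construct the table does not list needs a new rule, which is why coverage matters so much.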

I believe John Aycock's original work covered most of this, and public attempts 
at updating it have not been thorough enough.

Furthermore, if you look at the paid decompile site "crazy-compilers", quote:
"* Our unit-tests include more than 3900 test-patterns. Each of it is 
successfully decompiled for all supported versions, both normal (.pyc) and 
optimized (.pyo) bytecode. This is a total of about 138,000 test-cases."
"* decompyles and successfully verifies 100% of the Python 1.5, 2.0, 2.1, 2.2, 
2.3, 2.4, 2.5 and 2.6 library"

Solution:
I don't know if there need to be close to 4k test cases, but I think this is 
the key thing to make this decompiler fit.
I believe it will require a similar methodology.

Many simple constructs and complex real-world samples need to be run through 
it, and patterns need to be tweaked and added.
Perhaps broken down on a per Python version basis.
This should obviously be automated as much as possible to make it easier.
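
A minimal sketch of what one automated test case could look like, assuming the decompiler can be wrapped in a function that takes a code object and returns source text. The names fingerprint() and round_trip_ok() are made up for illustration, and this runs against whatever interpreter executes it rather than against 2.x .pyc files.

```python
import dis
import types


def fingerprint(code):
    """Opcode names and arguments, recursing into nested code objects."""
    ops = []
    for ins in dis.get_instructions(code):
        # Nested code objects are compared separately below.
        is_code = isinstance(ins.argval, types.CodeType)
        ops.append((ins.opname, None if is_code else repr(ins.argval)))
    nested = [fingerprint(c) for c in code.co_consts
              if isinstance(c, types.CodeType)]
    return ops, nested


def round_trip_ok(source, decompile):
    """Compile, decompile, recompile, and compare the two bytecode streams.

    `decompile` is a hypothetical callable: code object -> source text.
    """
    original = compile(source, "<original>", "exec")
    try:
        recompiled = compile(decompile(original), "<recovered>", "exec")
    except SyntaxError:
        return False
    return fingerprint(original) == fingerprint(recompiled)


# Stand-in "decompilers", only to exercise the harness.
sample = "def f(a):\n    return a + 1\n"
perfect = lambda code: sample
wrong_name = lambda code: "def g(a):\n    return a + 1\n"
```

A directory of small source files, one per construct, could then be run through round_trip_ok() for each supported Python version.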

Original issue reported on code.google.com by macromon...@ymail.com on 1 Aug 2011 at 3:42

GoogleCodeExporter commented 9 years ago
Make that "As it stands July 21, 2011"
Sorry, got a little charged up about these things.

This does seem to be the most complete free version.
However, many things are missing, as I stated.

There is a 2.7-only version that appears to be yet another fork of your code 
here or of the original decompile (need to find that link). 
It appears to be mostly complete for 2.7.

One thing this 2.7 version does is replace all those marshal C files with a 
single "marshal.py", making it portable and doing away with having to build 
those PYD DLLs et al.
Unfortunately it is hard-coded with 2.7 changes like the opcode 
POP_JUMP_IF_FALSE. 
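
For illustration, one way to avoid hard-coding a single version is to key the opcode names by Python version. The sketch below assumes only what the CPython releases themselves changed: 2.6 uses JUMP_IF_FALSE / JUMP_IF_TRUE, and 2.7 replaced them with the four opcodes listed. The table and helper names are hypothetical.

```python
# Hypothetical lookup table: conditional-jump opcode names per Python version.
# 2.7 replaced JUMP_IF_FALSE / JUMP_IF_TRUE with the four opcodes below.
CONDITIONAL_JUMPS = {
    (2, 6): frozenset(["JUMP_IF_FALSE", "JUMP_IF_TRUE"]),
    (2, 7): frozenset(["POP_JUMP_IF_FALSE", "POP_JUMP_IF_TRUE",
                       "JUMP_IF_FALSE_OR_POP", "JUMP_IF_TRUE_OR_POP"]),
}


def is_conditional_jump(opname, version):
    """True if `opname` is a conditional jump in the given (major, minor).

    Raises KeyError for versions that are not in the table.
    """
    return opname in CONDITIONAL_JUMPS[version]


print(is_conditional_jump("POP_JUMP_IF_FALSE", (2, 7)))
print(is_conditional_jump("POP_JUMP_IF_FALSE", (2, 6)))
```

The parser would then ask the table instead of comparing against a fixed opcode name.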

Incidentally, I need one to decompile 2.6.4 PYCs.
I was able to fix some 2.6+ problems by copying stuff over from that 
version,
like the total absence of STORE_MAP handling.
It is still way off, unfortunately.

It's too bad the 2.7 guy didn't get with you here to combine his changes.
Furthermore, there seem to be three or four different versions of "decompile" 
under different names, all broken or lacking, at least up to 2.7.x, to one 
degree or another.
And it may be nice to have a 3.x version, which might need to be an entirely 
different version.

I wonder.
Would it just be easier to do an entirely new rewrite with a clear goal of 
supporting all versions from a bottom baseline (say 2.0) up to 2.7.x, or to 
just keep adding to one of the many different versions floating around?

Original comment by macromon...@ymail.com on 27 Aug 2011 at 5:14

GoogleCodeExporter commented 9 years ago
The 2.7-only version:
https://github.com/gstarnberger/uncompyle

Original comment by macromon...@ymail.com on 27 Aug 2011 at 5:36

GoogleCodeExporter commented 9 years ago
Good, I'll try your fork later. I hope it works well!
Several months ago I downloaded unpyc from here; it both surprised and 
disappointed me.
Then I tried to fix it.
I also wrote a 'marshal_26.py' to replace marshal_26.c, then inherited from it 
and made a little change for lower versions. I removed all the dis files.
But I could not solve the problem of some structures, such as chained and/or, e.g.:
  if a and b and c or (d and e): pass
  f = a and b and c or (d and e)
Then I gave up.
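
A quick way to see why these are hard: both lines compile to chains of conditional jumps rather than to a single "and"/"or" instruction, so the decompiler has to rebuild the nesting from jump targets. The sketch below uses the standard dis module on the running interpreter; the exact opcode names it prints depend on the Python version, and jump_ops() is just a helper name made up here.

```python
import dis


def jump_ops(source):
    """Names of all jump instructions the compiler emits for `source`."""
    code = compile(source, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)
            if "JUMP" in ins.opname]


stmt_jumps = jump_ops("if a and b and c or (d and e): pass")
expr_jumps = jump_ops("f = a and b and c or (d and e)")

print("statement form: ", stmt_jumps)
print("expression form:", expr_jumps)
```

The statement form and the expression form usually produce different jump sequences for the same boolean logic, so each needs its own handling.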

Original comment by www.eh...@gmail.com on 12 Nov 2011 at 5:48

GoogleCodeExporter commented 9 years ago
I just tried to decompile and verify the Python standard library; of course, 
with my fix many files failed verification.
I hope your fork will pleasantly surprise me.
Thanks!

Original comment by www.eh...@gmail.com on 12 Nov 2011 at 5:58