Mysterie / uncompyle2

"bad marshal data (unknown type code)" when invoking on pytest .pyc #25

Open Lucas-C opened 9 years ago

Lucas-C commented 9 years ago

Here is how to reproduce the issue with the latest version from the repository:

$ cat stupid_test.py 
def test_dummy():
    assert True
$ py.test -q stupid_test.py                                                                                                      
.
1 passed in 0.01 seconds
$ python2.7 /opt/uncompyle2/scripts/uncompyle2 __pycache__/stupid_test.cpython-27-PYTEST.pyc                                 
#2015.02.14 10:47:50 CET
### Can't uncompyle __pycache__/stupid_test.cpython-27-PYTEST.pyc
Traceback (most recent call last):
  File "/home/lucas/.local/lib/python2.7/site-packages/uncompyle2/__init__.py", line 197, in main
    uncompyle_file(infile, outstream, showasm, showast, deob)
  File "/home/lucas/.local/lib/python2.7/site-packages/uncompyle2/__init__.py", line 129, in uncompyle_file
    version, co = _load_module(filename)
  File "/home/lucas/.local/lib/python2.7/site-packages/uncompyle2/__init__.py", line 77, in _load_module
    co = marshal.load(fp)
ValueError: bad marshal data (unknown type code)
# decompiled 0 files: 0 okay, 1 failed, 0 verify failed
#2015.02.14 10:47:50 CET

Any idea what the root cause is, and whether it could be fixed?

rocky commented 8 years ago

In debugging https://github.com/rocky/python-uncompyle6 I've come across this a bit.

At this point in the code, uncompyle2 is trying to extract a Python code object from the byte-compiled file. It can't. This will most definitely happen if you try to use marshal.load on bytecode written by a Python version other than the one you are running.

But you might say: I am running the same Python version!

Maybe, and maybe not. The magic number changed several times during Python 2.7 development. Here are the changes as best I know:

magic  release and description
-----  -----------------------
62171: 2.7a0 (optimize list comprehensions/change LIST_APPEND)
62181: 2.7a0 (optimize conditional branches:
       introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
62191: 2.7a0 (introduce SETUP_WITH)
62201: 2.7a0 (introduce BUILD_SET)
62211: 2.7a0 (introduce MAP_ADD and SET_ADD)

There's nothing in the above that I know would change data characteristics needed by a marshal load, so this remains a mystery.
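To relate the integers in the table to what is actually stored in a .pyc file, here is a minimal sketch. It assumes the usual CPython layout, where the file starts with the magic as a little-endian unsigned 16-bit integer followed by CR LF. The names PY27_MAGICS, magic_int and magic_bytes are made up for this illustration and do not come from uncompyle2 or uncompyle6.

```python
import struct

# Magic values copied from the table above (Python 2.7 alpha series).
PY27_MAGICS = (62171, 62181, 62191, 62201, 62211)


def magic_int(header):
    # The first two bytes of a .pyc hold the magic as a little-endian
    # unsigned 16-bit integer; the next two bytes are CR LF.
    (value,) = struct.unpack("<H", header[:2])
    return value


def magic_bytes(value):
    # Build the 4-byte prefix a .pyc file with this magic would start with.
    return struct.pack("<H", value) + b"\r\n"


if __name__ == "__main__":
    for value in PY27_MAGICS:
        print(value, repr(magic_bytes(value)))
```

Reading the first four bytes of a .pyc and passing them to magic_int gives a number that can be looked up in the table.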

However in uncompyle6, I now only will use marshal.loads when the bytecode interpreter number is exactly the same as the running interpreter magic number. (Previously I was just comparing on Python major/minor numbers.) To find a check that is neither too strict nor too lax, one would have to test marshal.loads against each of the magic values to see which ones work and which don't.
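The exact-magic comparison could look roughly like the sketch below. This is not the uncompyle6 code; has_running_magic and RUNNING_MAGIC are names invented here. It is written for Python 3 and falls back to imp.get_magic() where importlib.util is unavailable, as on Python 2.7.

```python
try:
    # Python 3.4+: magic of the running interpreter as 4 bytes.
    from importlib.util import MAGIC_NUMBER as RUNNING_MAGIC
except ImportError:
    # Python 2 fallback.
    import imp
    RUNNING_MAGIC = imp.get_magic()


def has_running_magic(path):
    # Compare the first four bytes of the file with the running
    # interpreter's magic; no major/minor version guessing involved.
    with open(path, "rb") as fp:
        return fp.read(4) == RUNNING_MAGIC
```

Only when this returns True would the native marshal.loads be trusted with the rest of the file.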

Lucas-C commented 8 years ago

Nice explanation, thanks! Is your python-uncompyle6 project usable already?

rocky commented 8 years ago

Is your python-uncompyle6 project usable already?

Perhaps for Python 2 bytecode. You can run it from CPython2 (2.6 or 2.7) or CPython3. For Python3 bytecode, it still needs work.

It is easy to come up with lots of tests that cause a failure. One ongoing task is organizing the tests better and fixing some of the failures that occur there. But a large number of those also fail in uncompyle2. (#14 has been fixed, though.)

This and the other uncompyle projects could all use help fixing bugs.

rocky commented 8 years ago

One other clarification regarding this:

However in uncompyle6, I now only will use marshal.loads when the bytecode interpreter number is exactly the same as the running interpreter magic number.

uncompyle2 unconditionally uses marshal.loads() and when this works, it is most likely correct. This change in behavior was made in commit 09b2adbbbde46ce30d3f1a36c83293572f8b56f0.

The limitation with this is that you can only disassemble Python bytecode that has a compatible bytecode format. So although this project still has opcodes around for Python 2.3-2.6, it is possible that some of these won't survive a marshal.loads() after the commit. That said, I trust Mysterie to have tested what is provided here, so I'll assume that they are data compatible.

The older code (which is used in the PyPI version of uncompyle2) provides its own marshal load routine written in Python and uncompyle6 uses that as well. As I have recently found, that has problems too when using different versions of Python, especially when going between Python 3 and Python 2. See https://github.com/rocky/python-uncompyle6/blob/master/uncompyle6/marsh.py#L46-L150 and compare with https://github.com/Mysterie/uncompyle2/blob/master/uncompyle2/disas.py#L195-L270

Sorry for the long-winded clarification. What I meant was: uncompyle6 uses marshal.loads when the magics are the same; when the magics are different, it uses the all-version Python code that is supposed to be equivalent (and probably still has bugs).
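As a rough sketch of that two-way dispatch (not the actual uncompyle6 code): load_code, header_size and the fallback_loader parameter are hypothetical names, with fallback_loader standing in for the pure-Python marshal reader. The header sizes are an assumption about the running CPython: 8 bytes before 3.3, 12 bytes for 3.3-3.6, and 16 bytes from 3.7 on.

```python
import marshal
import sys

try:
    from importlib.util import MAGIC_NUMBER as RUNNING_MAGIC  # Python 3.4+
except ImportError:
    import imp  # Python 2 fallback
    RUNNING_MAGIC = imp.get_magic()


def header_size():
    # Assumed .pyc header length for the *running* CPython version.
    if sys.version_info >= (3, 7):
        return 16  # magic, flags, then mtime+size or a source hash
    if sys.version_info >= (3, 3):
        return 12  # magic, mtime, source size
    return 8       # magic, mtime


def load_code(path, fallback_loader):
    # Read the whole file; the first four bytes are the magic.
    with open(path, "rb") as fp:
        data = fp.read()
    if data[:4] == RUNNING_MAGIC:
        # Same magic: the native unmarshaller understands this format.
        return marshal.loads(data[header_size():])
    # Different magic: hand the raw bytes to the version-independent
    # loader (hypothetical callable supplied by the caller).
    return fallback_loader(data)
```

A caller would pass its own pure-Python loader as fallback_loader; the native path is taken only on an exact magic match.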