Mysterie / uncompyle2

uncompyle2
642 stars 148 forks source link

Python 2.5 scanner incorrectly restructures the code after deleting instructions #2

Closed codepainters closed 11 years ago

codepainters commented 11 years ago

I believe the following fragment of Scanner25.py is wrong:


            for x in toDel:
                if self.code[x-delta] >= dis.HAVE_ARGUMENT:
                    self.code.pop(x-delta)
                    self.restructCode(x-delta)
                    self.code.pop(x-delta)
                    self.restructCode(x-delta)
                    self.code.pop(x-delta)
                    self.restructCode(x-delta)
                    delta += 3
                else:
                    self.code.pop(x-delta)
                    self.restructCode(x-delta)
                    delta += 1

For 3-byte instructions it tries to restruct the code after each byte removed - thus, if e.g. restructCode is called after the first byte is removed, it will treat second byte of the instruction as opcode.

The call to restructCode() should be done only once, after removing all 3 bytes. However, this means that restructCode() need to know if 1 or 3 bytes were removed.

codepainters commented 11 years ago

Following patch made it work for me:

diff --git a/uncompyle2/Scanner25.py b/uncompyle2/Scanner25.py
index 9fc3f4d..8c9d17d 100755
--- a/uncompyle2/Scanner25.py
+++ b/uncompyle2/Scanner25.py
@@ -91,11 +91,11 @@ class Scanner:
             for x in toDel:
                 if self.code[x-delta] >= dis.HAVE_ARGUMENT:
                     self.code.pop(x-delta)
-                    self.restructCode(x-delta)
                     self.code.pop(x-delta)
-                    self.restructCode(x-delta)
                     self.code.pop(x-delta)
                     self.restructCode(x-delta)
+                    self.restructCode(x-delta)
+                    self.restructCode(x-delta)
                     delta += 3
                 else:
                     self.code.pop(x-delta)
Mysterie commented 11 years ago

In fact this code isn't wrong (but very crappy), I have to change the function name.

The role of the restructCode() function is to patch all jump opcode argument after I remove one byte from the bytecode. If I don't do that some jump goes to the wrong target.

"The call to restructCode() should be done only once,". You are right, and I've change that. Now function patch all jump target in one time. And then I delete the useless opcode.

codepainters commented 11 years ago

I understand the role of this code. But I'm not really convinced it is correct - if it deletes 3-byte instruction byte by byte, and calls restructCode after each byte, how is it protected from treating 2nd and 3rd instruction byte as opcode?

I've seen it actually corrupting the code - restructCode subtracted 1 from opcode itself when rewriting some "with" statement:

ParserError: --- This code section failed: ---

0       LOAD_FAST         'self'
3       LOAD_ATTR         'lock'
6       SETUP_WITH        '120'
9       POP_TOP           None

10      <123>             None               <-- input file is valid here!
13      LOAD_ATTR         'lower'
16      CALL_FUNCTION_0   None
19      LOAD_GLOBAL       'os'
22      LOAD_ATTR         'path'
25      LOAD_ATTR         'sep'
28      BINARY_ADD        None
29      STORE_FAST        'local_path'

Said that, I'm not really sure if my patch is safe and correct (this code is fairly complex) - yet it solved my problem at hand ;)

Anyway, I'll try to prepare a simple isolated test.

codepainters commented 11 years ago

OK, here's the code that failed decompiling:


from __future__ import with_statement

def clear_with_prefix(self, local_path):
    with self.lock:
        local_path = local_path.lower() + os.path.sep
        to_del = []
        for a in self.d:
            if a.startswith(local_path):
                to_del.append(a)

        for elt in to_del:
           del self.d[elt]

It worked OK after my dirty hack (I just changed the order, as original restructCode can deal with only 1 byte shift). I'll try your fix later, hope it will solve other failures I observe :)

codepainters commented 11 years ago

FYI, your latest code (c3a874220fcebf7241d40d234d325ac3273ff3ae) now crashes with the sample above:

# 2012.09.20 20:53:40 CEST
#Embedded file name: o.py

# Can't uncompyle o.pyc
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/__init__.py", line 200, in main
    uncompyle_file(infile, outstream, showasm, showast)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/__init__.py", line 137, in uncompyle_file
    uncompyle(version, co, outstream, showasm, showast)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/__init__.py", line 126, in uncompyle
    walker.gen_source(ast, customize)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Walker.py", line 1384, in gen_source
    self.print_(self.traverse(ast, isLambda=isLambda))
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Walker.py", line 484, in traverse
    self.preorder(node)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/spark.py", line 692, in preorder
    self.preorder(kid)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/spark.py", line 692, in preorder
    self.preorder(kid)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/spark.py", line 692, in preorder
    self.preorder(kid)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/spark.py", line 687, in preorder
    self.default(node)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Walker.py", line 1158, in default
    self.engine(table[key], node)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Walker.py", line 1108, in engine
    self.preorder(node[entry[arg]])
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/spark.py", line 685, in preorder
    func(node)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Walker.py", line 862, in n_mkfunc
    self.make_function(node, isLambda=0)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Walker.py", line 1259, in make_function
    code = Code(code, self.scanner, self.currentclass)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Scanner.py", line 60, in __init__
    self._tokens, self._customize = scanner.disassemble(co, classname)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Scanner25.py", line 91, in disassemble
    self.restructCode(toDel)
  File "/usr/local/lib/python2.7/dist-packages/uncompyle2/Scanner25.py", line 408, in restructCode
    self.toChange[self.toChange.index(change)] -= self.op_size(self.code[toDel])
ValueError: 10 is not in list
# decompiled 0 files: 0 okay, 1 failed, 0 verify failed
# 2012.09.20 20:53:40 CEST
Mysterie commented 11 years ago

"if it deletes 3-byte instruction byte by byte, and calls restructCode after each byte, how is it protected from treating 2nd and 3rd instruction byte as opcode?"

Thanks to the first line of this code (from Scanner25) only jump opcode argument are affected :


        for x in self.op_range(0, len(self.code)):
            op = self.code[x]
            if op >= HAVE_ARGUMENT:
                if op in dis.hasjrel:
                    if x < i and self.get_target(x) > i:
...
    def op_range(self, start, end):
        while start < end:
            yield start
            start += self.op_size(self.code[start])

I change restructCode() for performance improvement and having clearest code.

Your behavior is due to the WITH_SETUP opcode. Because this opcode doesn't exist in the 2.5 bytecode table but is required for 2.7 bytecode.

So I have to create WITH_SETUP with other conditional jump (and it can be a wrong methode) :


            if offset in self.toChange: # opcode to change to WITH_SETUP
                if self.code[offset] == JA and self.code[oparg] == WITH_CLEANUP:
                    opname = 'SETUP_WITH'
                    cf[oparg] = cf.get(oparg, []) + [offset]

I'll try to improve handling of the with statement :) Thank you for reporting bug and debug information!

codepainters commented 11 years ago

OK, I assumed incorrectly that restructCode() scans starting at i - while actually it always starts from index 0. Still, somehow restructCode() was corrupting the bytecode. I've spent a lot of time today dumping self.code at various places - after the loop with restructCode() it already was corrupted (as above - invalid opcode). Anyway, I can see you reworked this part significantly, so it's no longer relevant.

And as soon as I applied my workaround, "with" statement decompiled correctly, so I guess SETUP_WITH handling is itself correct (yet hackish) ! :)

Thanks for your support (pitty I have no time to dig it deeper, it's pretty interesting stuff).

Ophidiophobia commented 11 years ago

I fiddled with the code myself. I think I stumpled upon the same problem when I disassembled a python 2.6 pyc file. I think it fixes this issue here because compiling https://github.com/Mysterie/uncompyle2/issues/2#issuecomment-8741094 using python 2.6 and then using uncompyle2 on it causes a similar error as https://github.com/Mysterie/uncompyle2/issues/2#issuecomment-8741575

The change for Scanner25.py should be exactly the same but I have not tried working with 2.5 pyc files.

Scanner26.py In the function restructCode the rearrange of toChange locations (so far used to mark with statements for python 27 ) fails because if toDel contains more than one valid entry then self.toChange.index(change) cannot find the same index again because the value it was looking for was changed in the previous loop.

My fix for uncompyle2/Scanner26.py

@@ -409,12 +409,15 @@ class Scanner:
             result.append((block[0]+startBlock, block[1]))
         self.linestarts = result

-        for change in self.toChange:
+        for index in xrange(len(self.toChange)) :
+            change = self.toChange[index]
+            delta = 0
             for toDel in listDel:
                 if change > toDel:
-                    self.toChange[self.toChange.index(change)] -= self.op_size(self.code[toDel])
+                    delta += self.op_size(self.code[toDel])
                 else:
                     break
+            self.toChange[index] -= delta

         for jmp in self.op_range(0, len(self.code)):
             op = self.code[jmp]

Using xrange to iterate over self.toChange is not necessary but I wanted to get rid of the index method.

Mysterie commented 11 years ago

I've checked restructCode() function and you're right, it fail only with WITH_SETUP statement on the condition you explain. I've patched the code, thanks to your report and fix :)