Possible bug in assembler.py

abel1502 commented 5 years ago

I'm slightly concerned by the following lines in the assembler.py:

# Modify relative jump to absolute jump
if ins.mnemonic == 'JUMP_FORWARD':
    ins.opcode = dis.opmap['JUMP_ABSOLUTE']

This modifies the opcode value without changing the mnemonic and, more importantly, without changing the argument. I suppose it should at least add the current instruction's offset to the arg.
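For reference, the arithmetic behind this concern can be sketched as below. In CPython 2.7 an instruction that carries an argument occupies 3 bytes, and the argument of JUMP_FORWARD is counted from the start of the next instruction. The helper name is made up for illustration and is not part of bytecode_simplifier.

```python
# Size in bytes of a CPython 2.7 instruction that carries an argument
# (1 opcode byte + 2 argument bytes).
INS_SIZE_WITH_ARG = 3


def forward_jump_to_absolute(offset, arg):
    """Return the absolute target of a JUMP_FORWARD located at `offset`.

    Hypothetical helper: the relative argument is measured from the
    instruction that follows the jump, hence the extra 3 bytes.
    """
    return offset + INS_SIZE_WITH_ARG + arg


target = forward_jump_to_absolute(10, 4)
print(target)  # 10 + 3 + 4 = 17
```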

P.S. A bit of backstory: I'm currently trying to add support for EXTENDED_ARG, but for that I need to understand the principles of the whole app. The idea I came up with is to treat an EXTENDED_ARG together with the following opcode as one big 6-byte instruction, and to work out the details after that. I started this because I'm trying to decompile an obfuscated wot mod, so I'm facing both the BigWorld and the EXTENDED_ARG problems at once. I have actually figured out some interesting solutions, but I could really use a better understanding of the obfuscation. Also, I'm interested in whether the author, @extremecoders-re, is still maintaining this project and whether he is interested in continuing development.
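A minimal sketch of the "one big 6-byte instruction" idea, assuming the CPython 2.7 encoding (opcodes at or above HAVE_ARGUMENT carry a 16-bit little-endian argument, and EXTENDED_ARG supplies the upper 16 bits of the next instruction's argument). The `decode` function is hypothetical and is not the project's disassembler; it also assumes EXTENDED_ARG is always followed by an opcode that takes an argument.

```python
# Opcode numbers as defined for CPython 2.7.
HAVE_ARGUMENT = 90
EXTENDED_ARG = 145


def decode(code):
    """Yield (offset, opcode, arg, size) tuples.

    An EXTENDED_ARG prefix is folded into the instruction that follows it,
    so the pair is reported as a single 6-byte instruction.
    """
    code = bytearray(code)
    i = 0
    while i < len(code):
        start = i
        op = code[i]
        i += 1
        arg = None
        if op >= HAVE_ARGUMENT:
            arg = code[i] | (code[i + 1] << 8)
            i += 2
        if op == EXTENDED_ARG:
            high = arg << 16          # upper 16 bits of the real argument
            op = code[i]              # the real opcode follows the prefix
            arg = high | code[i + 1] | (code[i + 2] << 8)
            i += 3
        yield (start, op, arg, i - start)


# EXTENDED_ARG 1, then LOAD_CONST (100) 2, then POP_TOP (1).
decoded = list(decode([145, 1, 0, 100, 2, 0, 1]))
print(decoded)
```

With this layout the first tuple describes a 6-byte LOAD_CONST whose argument is (1 << 16) | 2.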

extremecoders-re commented 5 years ago

Yes, ideally the mnemonic should also be changed to JUMP_ABSOLUTE, but that should not have any effect on the final output. This is because, as you can see, only the following three functions are called afterwards.

https://github.com/extremecoders-re/bytecode_simplifier/blob/42f4e3e681f42ff4fe8725ef8ba5b5f299bf180c/assembler.py#L84-L86

The calculate_ins_operands function operates on the opcode and not the mnemonic. But yes, for correctness the mnemonic should also be changed to JUMP_ABSOLUTE.

As for the second question, there's no need to change the argument. For instructions that reference another instruction by a numeric offset or absolute address, such as JUMP_ABSOLUTE, POP_JUMP_IF_TRUE, etc., we do not store the numeric value at all. Instead we store a reference to the target instruction in argval, similar to a pointer. See class Instruction:

https://github.com/extremecoders-re/bytecode_simplifier/blob/42f4e3e681f42ff4fe8725ef8ba5b5f299bf180c/instruction.py#L4-L23

This is done during the disassembling step in function build_bb_edges. The advantage of using references instead of offsets/addresses is that we can add/remove instructions without caring about correcting offsets every time. After we have finalized the structure of the code (ordering of the basic blocks) the offsets are again generated based on the modified code.
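The idea of storing references and regenerating offsets at the end can be sketched roughly as follows. The `Ins` class and `assign_offsets` function are simplified stand-ins invented for this illustration; they are not the project's Instruction class or assembler, and only absolute jumps are modelled.

```python
class Ins(object):
    """Hypothetical, simplified instruction holding a reference to its target."""

    def __init__(self, mnemonic, target=None):
        self.mnemonic = mnemonic
        self.target = target      # reference to another Ins, not a number
        self.offset = None        # filled in by assign_offsets
        self.arg = None           # numeric operand, filled in by assign_offsets

    def size(self):
        # CPython 2.7: 3 bytes with an argument, 1 byte without.
        return 3 if self.target is not None else 1


def assign_offsets(instructions):
    """Recompute every offset, then resolve references to absolute addresses."""
    offset = 0
    for ins in instructions:
        ins.offset = offset
        offset += ins.size()
    for ins in instructions:
        if ins.target is not None:
            ins.arg = ins.target.offset


ret = Ins('RETURN_VALUE')
jump = Ins('JUMP_ABSOLUTE', target=ret)
code = [jump, ret]
assign_offsets(code)
first_arg = jump.arg              # target sits right after the 3-byte jump

# Insert an instruction in between; the reference stays valid and the
# numeric operand is simply regenerated.
code.insert(1, Ins('NOP'))
assign_offsets(code)
print(first_arg, jump.arg)
```

Because the jump holds a reference to `ret` rather than a number, inserting the NOP needs no manual fix-up.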

I would like to continue development, but I haven't checked PjOrion lately. So the code is out of date, and a lot of changes may be needed to make it work. If you're interested in working on this, I can help.

abel1502 commented 5 years ago

Thank you, your help would be very useful. As far as I can tell, PjOrion hasn't changed, at least not in the structure of the outer layers, judging by the signature I found in the file and the v1 signature checked by your tool.

abel1502 commented 5 years ago

Actually, now that I've finished implementing and fixing this stuff, I can see it doesn't work now. PjOrion keeps the code for the inner eval zlib-compressed inside its own code. I'm currently trying to think of another method of deobfuscation. Running a pyc through my edited version of this tool makes it break when launched. If anyone still wants it, I can upload it.
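To illustrate what a zlib-compressed inner layer means in general, here is a minimal sketch using only the standard marshal and zlib modules. It is not PjOrion's actual scheme, which this thread does not describe in detail; the names `payload` and `run_layer` are invented for the example.

```python
import marshal
import zlib

# Build a toy "inner layer": a code object, marshalled and compressed.
inner_code = compile("result = 6 * 7", "<inner>", "exec")
payload = zlib.compress(marshal.dumps(inner_code))


def run_layer(blob, namespace):
    """Decompress and unmarshal a layer, execute it, and return its code object."""
    code = marshal.loads(zlib.decompress(blob))
    exec(code, namespace)
    return code


ns = {}
recovered = run_layer(payload, ns)
print(ns["result"], recovered.co_filename)
```

A static tool that only looks at the outer bytecode sees `payload` as an opaque string constant, which is why the inner code has to be captured after decompression.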

extremecoders-re commented 5 years ago

IIRC there were multiple layers. It was possible to unwrap the layers by bytecode tracing. I had built a tool for this, pjunwrapper. I'm not sure it will work now as-is, but that was the idea.

Once we get to the final innermost layer we could then run bytecode_simplifier on it.

abel1502 commented 5 years ago

Oh, thanks, I shall try that

abel1502 commented 5 years ago

Nope, they don't leave line numbers, and the code breaks if they're added. I have another idea, but I can't really implement it: what if you could just compile a custom cpython2.7 with --with-pydebug and the cflag -DPy_DEBUG and trace all opcodes from there? I've tried, but I just can't get it to install zlib. I'm also trying to use the legitimate wot client to bypass the BigWorld module check. I'll report back if it works.

extremecoders-re commented 5 years ago

Nice idea. That's similar to what pjunwrapper does. The default behavior of Python is to enable tracing only when line numbers exist. To circumvent this we use a custom version of Python with these checks removed, so it will call the tracing function irrespective of line number information being present or not.

See: https://github.com/extremecoders-re/python2-tracer https://0xec.blogspot.com/2017/03/hacking-cpython-virtual-machine-to.html

"What if you could just compile a custom cpython2.7 with --with-pydebug and cflag -DPy_DEBUG and just trace all opcodes from there?"

Python has an LLTRACE flag which prints the instructions as they are executed. But you need more than just being able to trace. In particular, you would want some way to get hold of the actual bytecode after it's un-zlibbed and exec'd.

abel1502 commented 5 years ago

Wow, thanks. My system is a giant mess, though, so it's going to be quite a struggle to compile that or even launch the precompiled one (UPD: already failed). But still, thanks a lot. I just managed to fix my Linux VM, so hopefully I can now at least install zlib.

abel1502 commented 5 years ago

You know, now that I think of it... BigWorld must be inside the wot's distribution of python. Could there be a way to modify an existing build's debug-ness? Or, more likely, embed a precompiled module into another build? Combine them, basically. I'm most likely gonna be offline for about a month, but I'll be trying to figure something out. Good luck to you too

extremecoders-re commented 5 years ago

Yep, BigWorld is a part of the wot engine. Maybe there's a way to fake the presence of the BigWorld module so that it at least runs.

This may help: https://github.com/jhakonen/wot-teamspeak-mod/blob/master/futes/fakes/BigWorld.py

abel1502 commented 5 years ago

Yes, I've already tried it, but pj seems to perform some more complex checks, as with a fake module it just freezes endlessly. That is why I want to communicate with either a perfect copy of BigWorld or the legitimate one. If extraction isn't an option, I guess some kind of RPC-receiver wot mod paired with an RPC-transmitter fake BigWorld could work.

extremecoders-re commented 5 years ago

Just an update. The RPC idea is possible but I haven't tested it.

Here I was able to fake the presence of the BigWorld module using a simple C extension. It should have a method named player which returns Py_None.

Put the .pyd in the same directory as the protected pyc file (generated using "Exec only in WOT" mode) and it should run flawlessly in standard Python under Windows.

extremecoders-re commented 5 years ago

The reason a fake BigWorld module written in Python (BigWorld.py) doesn't work is that PjOrion checks the types of both the BigWorld module and its method player. If it finds they are not built-in, it will crash. Using a C extension evades these checks. Additionally, with further tuning I was able to dump the final code object from memory after bypassing all the wrapper layers.
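A rough sketch of why a pure-Python fake is distinguishable from a C extension: functions defined in Python and functions implemented in C have different types. The `looks_builtin` helper below is hypothetical and only models the check on the method; how PjOrion inspects the module itself is not described in this thread. The standard math module stands in for a compiled extension.

```python
import math
import types


def looks_builtin(module, func_name):
    """Return True if `module.func_name` is implemented in C.

    Hypothetical check, not PjOrion's actual code.
    """
    func = getattr(module, func_name, None)
    return isinstance(func, types.BuiltinFunctionType)


# A fake module built in pure Python: its `player` is an ordinary function.
fake = types.ModuleType("BigWorld")
fake.player = lambda: None

print(looks_builtin(fake, "player"))   # False
print(looks_builtin(math, "sqrt"))     # True
```

A `player` function exported from a compiled .pyd passes this kind of test, while one defined in BigWorld.py does not.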

Which means we do not need to have a modded version of Python with tracing support. pjunwrapper and python2-tracer are no longer needed. Everything can be done in the standard one.

abel1502 commented 5 years ago

Wow, that is great news. I just came back online, sorry for the late response. I'll try to implement or find an automated Python dumper and will report my results here. Also, I just realized I should probably upload my update to bytecode-simplifier; I'll try my best to do it today. Your assistance was really helpful, thanks again.

abel1502 commented 5 years ago

I made a pull request

extremecoders-re commented 5 years ago

Thanks for the PR! I will merge as soon as I go through it.

Ulysses-Gaia commented 5 years ago

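"Actually, now that I have finished implementing and fixing this, I can see that it does not work. Pjorion has the code for the inner eval inside its own code. I am currently trying to think of another deobfuscation method. Running a pyc through my edited version makes it break when launched. If anyone still wants it, I can upload it."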

Thank you for your bytecode_simplifier fix! Do you have a solution for BigWorld?

abel1502 commented 5 years ago

No, sorry, I don't yet have one. I'll try to return to that soon, but I currently have a lot of exams, so I can't promise anything

abel1502 commented 4 years ago

Update: yesterday I finally took the time to deal with unwrapping properly, and it somewhat worked. I now have a simple unwrapped pjorion-protected file (which I can upload if anyone needs it), but I'm having to deal with bytecode-simplifier's actual problems. First of all, it uses some deprecated or unavailable features of networkx, which I had to replace, and after that it all seemed fine. But now it seems to get stuck at simplifying basic blocks (or at least it runs long enough for me to get bored). I'm unsure whether that's my fault or the simplifier's, so I would really love to get feedback from @extremecoders-re on this. If you need my pyc, I won't be able to provide it until maybe 9 hours from now, but after that it's all yours.

abel1502 commented 4 years ago

Disassembler.py on line 214 adds the same node to the graph twice - that might be the cause, will test as soon as I get to my pc (a.k.a. 9h later)

extremecoders-re / bytecode_simplifier

Possible bug in assembler.py #3