BoomerangDecompiler / boomerang

Boomerang Decompiler - Fighting the code-rot :)

Fix sparc/twofib failure #26

Open nemerle opened 10 years ago

nemerle commented 10 years ago

The twofib code has the following pattern

call twofib
nop
illtrap

and twofib itself is returning like this

jmp     %i7+0xC

instead of using ret (which is jmp %i7+0x8), so it is basically returning one instruction further on, skipping the illtrap.

This will be pretty tough to solve without assuming that no call returns until proven to do so. The simplest ad-hoc solution would be a sparc/gcc-specific pattern translator that replaces call/nop/illtrap with call/nop/nop and, when a function does jmp %i7+0xC, replaces that jump with ret.
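A minimal sketch of that ad-hoc translator, assuming instructions are already available as mnemonic/operand pairs. The `Insn` struct and `normalizeStructReturn` are hypothetical names for illustration, not existing Boomerang code, and the sketch only handles the exact call/nop/illtrap shape described above (a delay slot holding something other than nop is not matched).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record; not Boomerang's real type.
struct Insn {
    std::string mnemonic;  // e.g. "call", "nop", "illtrap", "jmp", "ret"
    std::string operands;  // e.g. "twofib", "%i7+0xC", ""
};

// Rewrites the struct-return idiom in place:
//   call X / nop / illtrap  ->  call X / nop / nop
//   jmp %i7+0xC             ->  ret
// Returns the number of instructions that were rewritten.
std::size_t normalizeStructReturn(std::vector<Insn>& code) {
    std::size_t rewritten = 0;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].mnemonic == "jmp" && code[i].operands == "%i7+0xC") {
            // Callee side: treat the struct-return jump as a plain return.
            code[i] = {"ret", ""};
            ++rewritten;
        } else if (code[i].mnemonic == "call" && i + 2 < code.size() &&
                   code[i + 1].mnemonic == "nop" &&
                   code[i + 2].mnemonic == "illtrap") {
            // Caller side: the word after the delay slot becomes harmless.
            code[i + 2] = {"nop", ""};
            ++rewritten;
        }
    }
    return rewritten;
}
```

Both rewrites have to be applied together: once the callee's jump is turned into a plain ret, control lands on the slot that used to hold the illtrap, so that slot must have become a nop.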

ashh87 commented 10 years ago

I had a look at this, and the decompile fails because of the illtrap. If you force sparcfrontend to continue when it hits an illegal instruction (uAddr = uAddr + 4;, remove the assert(0), and break instead of return), then it will continue, although it doesn't recognise that there should be a return statement.

This pattern happens as the function is returning a structure (see http://www.cs.indiana.edu/~sabry/teaching/compilers/wi97/SPARC.txt , structure return). If the size of the structure changes, then the illtrap instruction changes too. I think the way to deal with it might be to recognise struct calls, and recognise 'jmp %i7+0xC' as a return when we are in a struct call, if that's possible.
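To make the "recognise struct calls" idea concrete, here is a rough sketch that works directly on raw 32-bit instruction words. The field positions follow the SPARC V8 instruction formats as I understand them (illtrap/UNIMP is op = 0, op2 = 0 with a 22-bit constant; ret is jmpl %i7+8, %g0); all function names are made up for this example and are not part of the Boomerang decoder.

```cpp
#include <cstdint>

// Extracts bits hi..lo (inclusive) of an instruction word.
// Only used with field widths below 32 bits.
constexpr std::uint32_t bits(std::uint32_t w, unsigned hi, unsigned lo) {
    return (w >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

// illtrap/UNIMP: op = 0 and op2 = 0; the low 22 bits carry a constant,
// which in a struct-returning call is the size of the structure.
constexpr bool isIlltrap(std::uint32_t w) {
    return bits(w, 31, 30) == 0 && bits(w, 24, 22) == 0;
}
constexpr std::uint32_t illtrapConst(std::uint32_t w) {
    return bits(w, 21, 0);
}

// call: op = 1, the rest is a 30-bit displacement.
constexpr bool isCall(std::uint32_t w) {
    return bits(w, 31, 30) == 1;
}

// jmpl %i7+offset, %g0: op = 2, rd = 0, op3 = 0x38, rs1 = 31 (%i7), i = 1.
// offset 8 is the ordinary ret; offset 12 (0xC) is the struct return.
constexpr bool isReturnVia(std::uint32_t w, std::uint32_t offset) {
    return bits(w, 31, 30) == 2 && bits(w, 29, 25) == 0 &&
           bits(w, 24, 19) == 0x38 && bits(w, 18, 14) == 31 &&
           bits(w, 13, 13) == 1 && bits(w, 12, 0) == offset;
}

// A struct call is a call whose delay slot is followed by an illtrap.
constexpr bool isStructCall(std::uint32_t callWord, std::uint32_t afterDelaySlot) {
    return isCall(callWord) && isIlltrap(afterDelaySlot);
}
```

With helpers like these, a front end could accept `isReturnVia(word, 12)` as a return only inside a procedure that is known to be the target of a struct call, which is the restriction suggested above.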

sparc.pat (and so sparc.pat.m and I think sparc.pat.cpp) have the struct_call pattern, but I can't see any hint of dealing with a return appropriately. Also, this code seems to be very old, never used, and generated from sparc.pat, though I haven't found out how that generation is done. I think C++ code can be made using the mltk, but the binaries provided won't run on my system (can't handle x64?) and compiling failed - the source needs to be updated!

It would be good to get these tools compilable again. Could this repo be moved into a new github project, and then all of the other relevant sources added as repos to the project? So they're all in one place... Once they're compilable, they can be updated/replaced etc. (icon, noweb, sml/nj and mltk)

When you suggest pattern matching, what is the mechanism for doing so other than hard-coding into the decoder and front-end? If it is hard coding, it would be great to get sml/mltk etc working, so we can have a flexible system for this sort of thing :) I can help out in the future if you point me in the right direction :D

nemerle commented 10 years ago

I'll tackle your comment in FILO order if You don't mind :)

> When you suggest pattern matching, what is the mechanism for doing so other than hard-coding into the decoder and front-end? If it is hard coding, it would be great to get sml/mltk etc working, so we can have a flexible system for this sort of thing :) I can help out in the future if you point me in the right direction :D

To put my answer to Your question into context: I freaking miss the disassembly view! :) With that out of the way... The way boomerang works now, the decoder and frontend are pretty much intertwined. What I think would be better is:

Now we can add compiler/abi/language specific MachineInstructionPatternRecognizerAndTransformer (MI_PRAT for short :) ) into Frontend that will 'filter' incoming instructions and transform them/extract additional data. To use the sparc struct return example from twofib:

call    twofib
nop
illtrap 8

Assuming we had SparcAbiPRAT, it would, upon discovering a CALL MachineInstruction:

* try to request the next two instructions and:

And why do I think we should separate MI_PRAT from Frontend? Different ABIs for the same CPU, compiler-specific patterns, OS-specific patterns.
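One possible shape for such a separable recognizer, as a rough sketch only. Every name here (`MachineInstruction`, `PatternRecognizer`, `SparcStructReturnRecognizer`, `runRecognizers`) is hypothetical, and what the recognizer does after looking ahead (annotate the call with the struct size, neutralize the illtrap) is my assumption about a sensible transformation, not something spelled out in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical decoded-instruction record handed from Decoder to Frontend.
struct MachineInstruction {
    std::uint32_t address;
    std::string mnemonic;
    std::uint32_t imm;         // immediate operand, if any
    bool structReturn;         // annotation: callee returns a structure
    std::uint32_t structSize;  // annotation: size taken from the illtrap
};

// Interface for ABI/compiler/OS-specific recognizers (the "MI_PRAT" idea).
class PatternRecognizer {
public:
    virtual ~PatternRecognizer() = default;
    // Inspects insns[i], may look ahead, and may transform or annotate.
    virtual void apply(std::vector<MachineInstruction>& insns, std::size_t i) const = 0;
};

// SPARC ABI recognizer for: call / <delay slot> / illtrap <size>.
class SparcStructReturnRecognizer : public PatternRecognizer {
public:
    void apply(std::vector<MachineInstruction>& insns, std::size_t i) const override {
        if (insns[i].mnemonic != "call" || i + 2 >= insns.size()) return;
        MachineInstruction& third = insns[i + 2];
        if (third.mnemonic != "illtrap") return;
        insns[i].structReturn = true;     // extract the additional data
        insns[i].structSize = third.imm;
        third.mnemonic = "nop";           // hide the illtrap from the Frontend
        third.imm = 0;
    }
};

// Runs every registered recognizer over the instruction window.
void runRecognizers(const std::vector<std::unique_ptr<PatternRecognizer>>& recognizers,
                    std::vector<MachineInstruction>& insns) {
    for (std::size_t i = 0; i < insns.size(); ++i)
        for (const auto& r : recognizers)
            r->apply(insns, i);
}
```

Because the Frontend only sees a list of recognizers, a different ABI or compiler for the same CPU would just register a different set, which is the separation argued for above.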

Actually, I'm not 100% sure that MI_PRAT should be a part of the 'pipe' between Decoder and Frontend. Maybe we should allow Frontend to 'fail', run MI_PRAT, and if it 'fixes' things, restart decoding from the fixed location?
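The fail, fix, restart alternative could look roughly like the loop below. It is a toy under loud assumptions: "decoding" just scans raw SPARC words and fails on an illtrap, and the "fix" patches the word to a nop in place, whereas a real Frontend would more likely annotate than patch. `decodeFrom`, `tryFix` and `decodeWithRecovery` are invented names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNop = 0x01000000u;  // SPARC nop (sethi 0, %g0)

// illtrap/UNIMP: op = 0 (bits 31..30) and op2 = 0 (bits 24..22).
constexpr bool isIlltrap(std::uint32_t w) {
    return (w >> 30) == 0 && ((w >> 22) & 7u) == 0;
}

// Stand-in for the Frontend: returns the index of the first word it
// cannot handle, or words.size() if everything from `start` decodes.
std::size_t decodeFrom(const std::vector<std::uint32_t>& words, std::size_t start) {
    for (std::size_t i = start; i < words.size(); ++i)
        if (isIlltrap(words[i])) return i;
    return words.size();
}

// Stand-in for MI_PRAT: if the failing word is an illtrap sitting right
// after a call and its delay slot, neutralize it. Returns true on a fix.
bool tryFix(std::vector<std::uint32_t>& words, std::size_t at) {
    if (at >= 2 && (words[at - 2] >> 30) == 1) {  // op = 1 means call
        words[at] = kNop;
        return true;
    }
    return false;
}

// Decode, and on failure give the recognizer one chance per failure
// before restarting from the location that was fixed.
bool decodeWithRecovery(std::vector<std::uint32_t>& words) {
    std::size_t pos = 0;
    while (true) {
        pos = decodeFrom(words, pos);
        if (pos == words.size()) return true;    // decoded everything
        if (!tryFix(words, pos)) return false;   // genuine failure
        // Fixed: loop again and restart decoding at `pos`.
    }
}
```

The loop terminates because every successful fix removes the failure at `pos`, so decoding always gets further on the next pass. The appeal of this variant is that recognizers only run when the Frontend is actually stuck, at the cost of a more complicated control flow than a simple filter.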

As You can see, this approach might be a bit better than hard-coding stuff into all those pat/m files. (Also, I'll freely admit that they scare me :) )

Getting the whole mltk decoder generator to work is a huge PITA, and I sincerely hope we can use other open-source disassemblers in its stead. Given clean and simple enough Decoder/Frontend plugin interfaces, the task of adding new CPUs should be much more approachable too :)