geohot / qira

QEMU Interactive Runtime Analyser
MIT License

Full Static Backend with BAP? #104

Open tim-becker opened 9 years ago

tim-becker commented 9 years ago

Now that BAP supports a fair number of architectures (ARM, x86, x86-64) and file formats (ELF, MachO, COFF), it is becoming increasingly usable for QIRA. BAP implements many of the features that we have in QIRA static, for example:

And many features that would be nice to have, for example:

Getting this information from BAP would greatly improve the performance and correctness of QIRA static. Additionally, this would be a nice way to resolve issues #91 and #84.

This thread is meant to start discussion about possibly implementing a full static backend using BAP. Thoughts?

geohot commented 9 years ago

QIRA static was meant more as an example API for the dynamic QIRA to use; notice the directories builtin and r2 under static2. I suspect you can add BAP under a "bap" folder and use the same API, in the hope that one day BAP is so good the others can be deleted. "builtin" was never made to last, and I'd love to see it replaced. Replacing static2 entirely may not be the best idea, though, as I think the static2 dictionary-like tags API is quite good (in contrast to, for example, IDA's horrendous one).
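To illustrate the "same API, swappable backend" idea, here is a minimal sketch of a dictionary-like tags store that different backends write into. Every name below (`TagStore`, `BuiltinBackend`, `BapBackend`, `load_static`) is hypothetical and is not taken from QIRA's actual static2 code; the BAP backend is a stub that only shows it would write through the same interface.

```python
# Hypothetical sketch: a dictionary-like tags API with pluggable backends.
# None of these names come from QIRA's static2; they only illustrate the idea.


class TagStore:
    """Per-address tags, accessed like nested dictionaries: store[addr][tag]."""

    def __init__(self):
        self._tags = {}

    def __getitem__(self, address):
        # Create the per-address dict on first access, so callers can write
        # store[0x1000]["name"] = "main" without a separate setup step.
        return self._tags.setdefault(address, {})

    def addresses(self):
        return sorted(self._tags)


class BuiltinBackend:
    def analyze(self, store, entry):
        store[entry]["name"] = "entry"


class BapBackend:
    # Stub: a real BAP backend would fill in tags from BAP's analysis.
    # Here it only shows that it writes through the same interface.
    def analyze(self, store, entry):
        store[entry]["name"] = "main"
        store[entry]["function"] = True


# Backend registry, mirroring the "one folder per backend" layout.
BACKENDS = {"builtin": BuiltinBackend, "bap": BapBackend}


def load_static(backend_name, entry):
    store = TagStore()
    BACKENDS[backend_name]().analyze(store, entry)
    return store
```

For example, `load_static("bap", 0x400000)[0x400000]["name"]` returns `"main"`. Because the dynamic side only ever reads `store[addr][tag]`, replacing one backend with another does not touch the consumer.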

ivg commented 9 years ago

Yep, I'm also in favor of putting BAP in the same place as r2 or whatever else is there. Btw, BAP also has its own tags, so maybe we should think about how to share these tags. It would be a good idea if QIRA could provide some information back to BAP. My question: do you have any specific requirements for the CFG? Are you expecting BAP to provide the full CFG, or do you want only symbols and plan to reconstruct the CFG yourself?

nedwill commented 9 years ago

I think we would prefer if BAP could give us the CFG. CFG reconstruction in Python is slow.

geohot commented 9 years ago

What do you mean by the full CFG? Look at what "builtin" provides to static2; I don't expect more. I want all the heuristic-like stuff in BAP, but CFG recovery is super easy once you know which instructions end a basic block and where they jump.
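A minimal sketch of why that last step is easy, assuming a backend hands over per-instruction facts in a hypothetical `Insn` record: size, whether the instruction ends a basic block, its static jump targets, and whether it can fall through. Block leaders are the entry point, every jump target, and every instruction following a block terminator; blocks and edges then fall out of one linear pass. This is a generic textbook technique, not code from QIRA or BAP.

```python
from collections import namedtuple

# Hypothetical per-instruction facts a backend is assumed to provide.
Insn = namedtuple("Insn", "addr size ends_block targets falls_through")


def recover_cfg(insns, entry):
    """Return (blocks, edges).

    blocks maps a block start address to its instruction addresses;
    edges is a set of (src_block_start, dst_block_start) pairs.
    """
    by_addr = {i.addr: i for i in insns}
    # Leaders: the entry point, every jump target, and every instruction
    # that follows a block-ending instruction.
    leaders = {entry}
    for i in insns:
        if i.ends_block:
            leaders.update(i.targets)
            leaders.add(i.addr + i.size)
    blocks, edges = {}, set()
    for start in sorted(leaders):
        if start not in by_addr:
            continue  # e.g. a target outside the decoded code
        addrs, cur = [], by_addr[start]
        while True:
            addrs.append(cur.addr)
            nxt = cur.addr + cur.size
            if cur.ends_block or nxt in leaders or nxt not in by_addr:
                break
            cur = by_addr[nxt]
        blocks[start] = addrs
        # Successors come from the block's last instruction.
        if cur.ends_block:
            succs = list(cur.targets)
            if cur.falls_through:
                succs.append(nxt)
        else:
            succs = [nxt]  # block was split because nxt is a leader
        for dst in succs:
            if dst in by_addr:
                edges.add((start, dst))
    return blocks, edges
```

For a small loop (`cmp` at 0x0, `jz 0x8` at 0x2, `mov` at 0x4, `jmp 0x0` at 0x6, `ret` at 0x8), this yields the blocks 0x0, 0x4 and 0x8 and the edges 0x0→0x8, 0x0→0x4 and 0x4→0x0.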

What is slow right now is just formatting and sending all that data. That isn't a Python problem; it's a problem of not breaking the data up into chunks. I suspect it would be the same if BAP gave a big CFG dump when the functions are big.
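A minimal sketch of the chunking idea, assuming block data shaped as `{start_addr: [insn_addrs]}` and JSON as the wire format. The function names and the message format are hypothetical, and how QIRA actually ships data to its client is not shown here. Each message stays small and self-contained, so the receiver can start rendering before everything has been formatted.

```python
import json
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def encode_blocks(blocks, chunk_size=2):
    """Serialize {start_addr: [insn_addrs]} as a sequence of JSON messages."""
    # Lazily build (hex_address, addrs) pairs so nothing is formatted up front.
    records = ((hex(start), addrs) for start, addrs in sorted(blocks.items()))
    for chunk in iter_chunks(records, chunk_size):
        # One small, independent message per chunk.
        yield json.dumps(dict(chunk))
```

Because `encode_blocks` is a generator, a caller can send each message as soon as it is produced instead of waiting for one large dump.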