falconre / falcon

Binary Analysis Framework in Rust
Apache License 2.0

towards a rust reversers datastructures crate #23

Closed: m4b closed this issue 6 years ago

m4b commented 7 years ago

Hello!

So I've been reading through the project. Really great work, and I'm really excited about some of the stuff you're doing; I can't wait to see more, no matter what gets decided!

On that note, as you guessed from the title, I'm hoping it might be possible to consolidate some number of things (anywhere from 0 to N) from this project, panopticon, and a theoretical new memory interval crate that I want to write, as well as some other things.

This is a huge, huge topic, and I likely won't hit on a lot of the points, but I think just getting the ball rolling is good, if only to see whether you're interested, where you're headed with things, etc.

If you're not interested at all, that is totally fine of course :) I just wanted to see what you think.

Generic IL/Function crate

So, for starters (and probably most controversially): reading through your source, particularly the il module, I see so much that I think could be refactored (along with panopticon) into a generic function/IL Rust crate.

I say controversial because it will likely be hard/tedious, but I do think it would be (extremely) beneficial.

It would also probably require the most coordination, which could be hard.

Nevertheless, I think some prime candidates are the il and function objects. If we could somehow make Function<IL>, where IL is the intermediate language used, this could have really, really cool benefits.

  1. It would allow all of us to try out each other's IR
  2. It would allow us to switch to another IR for a different task (perhaps one is more compact, hence faster to analyze)
  3. It would allow us to reuse the function logic and definitions and methods, consolidating bugs and developer effort into a single location (this and 2 are my prime motivation)
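To make the Function<IL> idea a bit more concrete, here is a minimal sketch, assuming a hypothetical `Il` trait with an associated instruction type. None of these names (`Il`, `Block`, `Function`, `ThreeAddrIl`, `CompactIl`) come from falcon or panopticon; they're just for illustration. The point is that the graph logic (`successors`, `instruction_count`) is written once and shared by every IL:

```rust
use std::collections::BTreeMap;

/// Hypothetical trait each IL would implement (not an existing falcon/panopticon API).
pub trait Il {
    /// The IL's instruction/operation type.
    type Instruction;
    /// Human-readable name of the IL.
    fn name() -> &'static str;
}

/// A basic block: IL instructions starting at `address`.
pub struct Block<I: Il> {
    pub address: u64,
    pub instructions: Vec<I::Instruction>,
}

/// A function generic over its IL; the methods below are written once for every IL.
pub struct Function<I: Il> {
    pub name: String,
    pub entry: u64,
    /// Blocks keyed by start address.
    pub blocks: BTreeMap<u64, Block<I>>,
    /// Control-flow edges as (from, to) block addresses.
    pub edges: Vec<(u64, u64)>,
}

impl<I: Il> Function<I> {
    pub fn new(name: &str, entry: u64) -> Self {
        Function { name: name.to_string(), entry, blocks: BTreeMap::new(), edges: Vec::new() }
    }

    pub fn add_block(&mut self, address: u64, instructions: Vec<I::Instruction>) {
        self.blocks.insert(address, Block { address, instructions });
    }

    pub fn add_edge(&mut self, from: u64, to: u64) {
        self.edges.push((from, to));
    }

    /// Addresses of the blocks reachable in one step from `address`.
    pub fn successors(&self, address: u64) -> Vec<u64> {
        self.edges.iter().filter(|(from, _)| *from == address).map(|(_, to)| *to).collect()
    }

    /// Total number of IL instructions across all blocks.
    pub fn instruction_count(&self) -> usize {
        self.blocks.values().map(|b| b.instructions.len()).sum()
    }
}

/// Toy IL #1: a verbose, three-address-style IL.
pub enum ThreeAddr { Assign(String, String), Branch(u64) }
pub struct ThreeAddrIl;
impl Il for ThreeAddrIl {
    type Instruction = ThreeAddr;
    fn name() -> &'static str { "three-address" }
}

/// Toy IL #2: a compact IL where each operation is one byte.
pub struct CompactIl;
impl Il for CompactIl {
    type Instruction = u8;
    fn name() -> &'static str { "compact" }
}
```

Swapping ILs is then just a type parameter (Function<CompactIl> vs Function<ThreeAddrIl>), and a bug fixed in `successors` is fixed for everyone.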

It's hard for me to overstate how great it could be if we were able to swap out ILs at will. It also just seems right from an engineering perspective, similar to backends in a compiler.

As things stand now in both codebases, I think this modification is almost trivially possible, except for the disassembler aspect.

But this isn't necessarily bad news!

For almost exactly the reasons in 1-3, I think it would be really cool to allow the function (or whatever it ends up being) to also be generic over the disassembler, allowing a more robust disassembler implementation (like capstone), or a home-grown solution like panopticon's, etc.

Again, the benefit here is experimentation: we could try a different disassembler for each IR backend, etc.

Doing this will, I think, require sketching out what a generic function plus a generic disassembler would look like, and which design would be the most flexible. That requires the most cooperation and an assessment of the current codebases' dependencies, expectations, etc., but long term I think it would be really cool: it would allow all our work to be pooled together, and hence we'd all benefit.
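As a rough starting point for that sketch, assuming two hypothetical traits, `Disassembler` (decode bytes into some instruction type) and `Lifter` (turn those instructions into IL ops), a linear-sweep lifter can be written once against both. The toy one-byte/two-byte "ISA" and the string IL are made up purely for illustration; a capstone wrapper or panopticon's decoder would be other `Disassembler` impls:

```rust
/// Hypothetical: any disassembler (a capstone wrapper, a home-grown one, ...).
pub trait Disassembler {
    type Insn;
    /// Decode one instruction at `address`; returns it plus its length in bytes.
    fn decode(&self, bytes: &[u8], address: u64) -> Option<(Self::Insn, usize)>;
}

/// Hypothetical: translates one disassembler's instructions into some IL.
pub trait Lifter<D: Disassembler> {
    type IlOp;
    fn lift(&self, insn: &D::Insn, address: u64) -> Vec<Self::IlOp>;
}

/// Linear sweep over `bytes` (loaded at `base`), lifting each decoded instruction.
pub fn lift_linear<D: Disassembler, L: Lifter<D>>(
    d: &D, l: &L, bytes: &[u8], base: u64,
) -> Vec<(u64, L::IlOp)> {
    let mut out = Vec::new();
    let mut offset = 0usize;
    while offset < bytes.len() {
        let address = base + offset as u64;
        match d.decode(&bytes[offset..], address) {
            Some((insn, len)) if len > 0 => {
                for op in l.lift(&insn, address) {
                    out.push((address, op));
                }
                offset += len;
            }
            // Stop at undecodable bytes (a real tool might skip or record them).
            _ => break,
        }
    }
    out
}

/// Toy ISA: 0x90 = nop (1 byte), 0x6a imm8 = push imm8 (2 bytes).
#[derive(Debug, PartialEq)]
pub enum ToyInsn { Nop, Push(u8) }
pub struct ToyDisasm;
impl Disassembler for ToyDisasm {
    type Insn = ToyInsn;
    fn decode(&self, bytes: &[u8], _address: u64) -> Option<(ToyInsn, usize)> {
        match bytes {
            [0x90, ..] => Some((ToyInsn::Nop, 1)),
            [0x6a, imm, ..] => Some((ToyInsn::Push(*imm), 2)),
            _ => None,
        }
    }
}

/// Toy IL: human-readable strings.
pub struct StringLifter;
impl Lifter<ToyDisasm> for StringLifter {
    type IlOp = String;
    fn lift(&self, insn: &ToyInsn, _address: u64) -> Vec<String> {
        match insn {
            ToyInsn::Nop => vec![],
            ToyInsn::Push(v) => vec!["sp = sp - 8".to_string(), format!("[sp] = {:#x}", v)],
        }
    }
}
```

Because `Lifter` is parameterized by the disassembler, the same IL could get one lifter per decoder, and `lift_linear` (or a smarter recursive-descent version) never has to change.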

While I think this will be the hardest part to refactor, coordinate and get right, I think it will actually have the most benefit; of course, this is just my opinion though :)

An interval tree crate, with a second crate geared towards binary memory intervals

I don't think this is controversial at all, and I think it would be invaluable. I want something like this already for bingrep, panopticon needs it, and I'm sure falcon could use it too.

Basically, the idea is a mapping:

[x..y) -> Value

This is a data structure that's created after the parser pass (or whenever you want, as long as you can hand it a goblin binary), and which initially gets filled with segment/section data: the ranges, the name of the segment, and perhaps what "kind of data" is there. We'd figure out what we want for a segment datatype, what information we'd need, etc. And of course, if it's a central crate, when we need something new, we just extend it and everyone gets the benefit.
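A minimal sketch of the [x..y) -> Value idea, assuming non-overlapping half-open ranges stored in a `BTreeMap` keyed by range start. `IntervalMap` and `Segment` are hypothetical names, and the segment table is filled in by hand rather than from a goblin binary. Note that real segments and sections overlap, so the eventual crate would need a proper interval tree; this only shows the lookup shape:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Maps non-overlapping half-open ranges [start..end) to values.
pub struct IntervalMap<V> {
    // Keyed by range start; the value holds the exclusive end and the payload.
    map: BTreeMap<u64, (u64, V)>,
}

impl<V> IntervalMap<V> {
    pub fn new() -> Self {
        IntervalMap { map: BTreeMap::new() }
    }

    /// Inserts [range.start..range.end); returns false (inserting nothing)
    /// if the range is empty or overlaps an existing entry.
    pub fn insert(&mut self, range: Range<u64>, value: V) -> bool {
        if range.start >= range.end {
            return false;
        }
        // The nearest entry starting at or before `range.start` may overlap from the left.
        if let Some((_, (end, _))) = self.map.range(..=range.start).next_back() {
            if *end > range.start {
                return false;
            }
        }
        // Any entry starting inside the new range overlaps it.
        if self.map.range(range.start..range.end).next().is_some() {
            return false;
        }
        self.map.insert(range.start, (range.end, value));
        true
    }

    /// Returns the range and value containing `address`, if any.
    pub fn get(&self, address: u64) -> Option<(Range<u64>, &V)> {
        let (start, (end, value)) = self.map.range(..=address).next_back()?;
        if address < *end { Some((*start..*end, value)) } else { None }
    }
}

/// Hypothetical segment payload filled in after parsing.
#[derive(Debug, PartialEq)]
pub struct Segment {
    pub name: String,
    pub executable: bool,
}
```

Lookup is a single `range(..=address).next_back()`, i.e. O(log n) per address, which is what you want when tagging every instruction or pointer during analysis.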

Similarly, and this would be the tricky part and where I want feedback, downstream users could also extend the memory ranges with their own tagging data, like [0xbeef..0xdead) -> FunctionRange, etc.

Even if some fancy runtime extendable type doesn't work out, even if we just agree on an enum in this crate which downstream clients use, I think this would be great code reuse and benefit everyone all around.
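One low-tech way to get both a shared enum and downstream extension is a hypothetical `Tag<U>` enum whose `User(U)` variant carries a client-defined type, so the central crate owns `Segment`/`Section` while a tool adds things like `FunctionRange` without patching it. The names here are illustrative only, and the linear-scan `Tags` store is a stand-in for the real interval structure (it does allow overlapping tags, e.g. a function inside a segment):

```rust
use std::ops::Range;

/// Hypothetical shared tag enum; `U` is a downstream-defined extension type.
#[derive(Debug, PartialEq)]
pub enum Tag<U> {
    Segment { name: String },
    Section { name: String },
    /// Downstream clients' own data.
    User(U),
}

/// What a downstream crate might define for its own tags.
#[derive(Debug, PartialEq)]
pub enum MyTag {
    FunctionRange { name: String },
}

/// Stand-in store: overlapping tagged ranges, queried by linear scan.
pub struct Tags<U> {
    entries: Vec<(Range<u64>, Tag<U>)>,
}

impl<U> Tags<U> {
    pub fn new() -> Self {
        Tags { entries: Vec::new() }
    }

    pub fn tag(&mut self, range: Range<u64>, tag: Tag<U>) {
        self.entries.push((range, tag));
    }

    /// All tags whose half-open range contains `address`, in insertion order.
    pub fn at(&self, address: u64) -> Vec<&Tag<U>> {
        self.entries
            .iter()
            .filter(|(r, _)| r.contains(&address))
            .map(|(_, t)| t)
            .collect()
    }
}
```

The trade-off versus a "fancy runtime extendable type" (e.g. trait objects) is that everything stays statically typed and `match`-able, at the cost of each client picking a single `U`.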

dynamic linker/runtime loader

Your loader looks really awesome!!! I've been trying to get other people to help create a relocator crate for a while, but no one is really interested in this stuff :laughing:

Anyway, at some future date I think panopticon wants to have this. So I've wanted to turn https://github.com/m4b/dryad into a library for quite some time. Basically, I like working on that project and I'll find any excuse; also, all that code going to waste would be sad.

So I'd like to propose potentially fusing falcon's runtime loader with dryad, or vice versa (perhaps dryad becomes a lib, or I rip out parts of it via copy-paste, whatever), and then refactoring that crate into a library which downstream consumers like falcon and panopticon (and really whoever; who knows the applications!) can use as their runtime linking and loading system.
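For a sense of what the relocation part of such a library does, here is a small sketch that applies two x86-64 ELF relocation types to an in-memory image: R_X86_64_RELATIVE (B + A) and R_X86_64_64 (S + A), with the type values taken from the System V x86-64 psABI. The `Rela` struct is a simplified stand-in (goblin has its own relocation types), `resolve` is a hypothetical hook for symbol lookup, and it assumes `image` is indexed by virtual address, i.e. image[vaddr] corresponds to base + vaddr:

```rust
/// x86-64 ELF relocation type values (System V x86-64 psABI).
const R_X86_64_64: u32 = 1;
const R_X86_64_RELATIVE: u32 = 8;

/// Simplified stand-in for an Elf64_Rela entry.
pub struct Rela {
    pub offset: u64,
    pub kind: u32,
    pub sym: u32,
    pub addend: i64,
}

/// Applies `relas` to `image`, an object mapped at load address `base`.
/// `resolve` maps a symbol index to its runtime address (hypothetical hook).
pub fn relocate(
    image: &mut [u8],
    base: u64,
    relas: &[Rela],
    resolve: impl Fn(u32) -> Option<u64>,
) -> Result<(), String> {
    for r in relas {
        // Signed addend added with two's-complement wrapping arithmetic.
        let addend = r.addend as u64;
        let value = match r.kind {
            // B + A: load base plus addend.
            R_X86_64_RELATIVE => base.wrapping_add(addend),
            // S + A: resolved symbol address plus addend.
            R_X86_64_64 => resolve(r.sym)
                .ok_or_else(|| format!("unresolved symbol {}", r.sym))?
                .wrapping_add(addend),
            other => return Err(format!("unsupported relocation type {}", other)),
        };
        // Both types write a 64-bit little-endian word at the relocation offset.
        let at = r.offset as usize;
        let end = at.checked_add(8).ok_or("relocation offset overflows")?;
        let slot = image.get_mut(at..end).ok_or("relocation offset out of bounds")?;
        slot.copy_from_slice(&value.to_le_bytes());
    }
    Ok(())
}
```

A static analyzer like falcon and a real runtime linker like dryad differ mainly in what `image` and `resolve` are backed by, which is part of why sharing this code seems feasible.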

The initial issue is that I'll have to put dryad's asm usage and bare functions behind feature flags, as they require nightly, and it's not nice to force that on downstream clients (which would be sad, since that's what makes it a pure Rust toolchain dynamic linker!)
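The usual Cargo pattern for this is an opt-in feature, so library users on stable never compile the nightly-only code. A minimal sketch, assuming a feature named `nightly` declared in Cargo.toml (the feature name, `loader_kind`, and the `entry` module are all hypothetical):

```rust
// Assumed Cargo.toml:
// [features]
// nightly = []

/// Always compiled; reports which build flavor this is.
pub fn loader_kind() -> &'static str {
    if cfg!(feature = "nightly") {
        "standalone (nightly-only entry code enabled)"
    } else {
        "library (stable)"
    }
}

/// Only compiled with `--features nightly`; the asm entry trampoline
/// and other nightly-only code would live here.
#[cfg(feature = "nightly")]
pub mod entry {}
```

Downstream crates like falcon would then depend on the library without the feature, while building dryad as a standalone interpreter would enable it.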

More things I haven't even thought of

Anyway, those are my suggestions for three different things I think are candidates to refactor out into shared dependencies for great good. I'm sure there are many other opportunities as well.

Let me know what you're thinking; as you can tell, I'm of the persuasion we should combine all of our powers and take over the universe :angel:

Thanks for reading this far, I know, it was a lot :)

/cc @flanfly

endeav0r commented 7 years ago

Sorry, I'm heading to work; not trying to cut this short.

  1. Traits: can be done.
  2. Providing a crate of abstract domains would be useful.
  3. If you turn dryad into a library, I will drop my loader stuff and use it.
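To illustrate what "a crate of abstract domains" might expose, here is a minimal sketch assuming a hypothetical `AbstractDomain` trait (bottom, join, and a partial order derived from join) with the classic interval domain over `i64` as one implementation. This is not falcon's API, just the textbook shape of a join-semilattice used in abstract interpretation:

```rust
/// Hypothetical trait an abstract-domains crate might expose.
pub trait AbstractDomain: Clone + PartialEq {
    /// Least element: no information yet (e.g. unreachable code).
    fn bottom() -> Self;
    /// Least upper bound, used where control-flow paths merge.
    fn join(&self, other: &Self) -> Self;
    /// Partial order: `self` is at least as precise as `other`.
    fn leq(&self, other: &Self) -> bool {
        &self.join(other) == other
    }
}

/// Interval domain over i64: Bottom, or [lo, hi] with lo <= hi.
#[derive(Clone, Debug, PartialEq)]
pub enum Interval {
    Bottom,
    Range(i64, i64),
}

impl AbstractDomain for Interval {
    fn bottom() -> Self {
        Interval::Bottom
    }

    fn join(&self, other: &Self) -> Self {
        match (self, other) {
            // Bottom is the identity for join.
            (Interval::Bottom, x) | (x, Interval::Bottom) => x.clone(),
            // Otherwise take the smallest interval covering both.
            (Interval::Range(a, b), Interval::Range(c, d)) => {
                Interval::Range((*a).min(*c), (*b).max(*d))
            }
        }
    }
}
```

A fixpoint engine written against `AbstractDomain` could then be reused with intervals, constants, or value sets interchangeably, the same "swap it out" benefit as with the ILs.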

(I'll come back to this tonight, or we can have a discussion somewhere more chatty than GitHub issues)

endeav0r commented 7 years ago

For anyone else watching this issue with interest, there will be a conversation at 3pm EDT (1900 GMT) in the panopticon gitter to discuss.

jrmuizel commented 7 years ago

On Saturday?

endeav0r commented 7 years ago

Whoops, my apologies. Sunday, 20 August, 2017, 1900 GMT.

m4b commented 7 years ago

Isn't 3pm (1500) EDT 2100 GMT? Otherwise I think this meeting is in 12 minutes if it's at 1900 GMT, no?

m4b commented 7 years ago

Ignore me, I don't understand time. I thought it was a 9-hour difference from PST.