GaloisInc / macaw

Open source binary analysis tools.
BSD 3-Clause "New" or "Revised" License
208 stars 21 forks source link

`*-symbolic`: Reduce the usage of `projectLLVM_bv` #395

Open langston-barrett opened 3 months ago

langston-barrett commented 3 months ago

The Crucible-LLVM function projectLLVM_bv takes a pointer, asserts that it is a bitvector, and returns the bitvector value. The problem is that when this assertion fails, it does so with a generic and unhelpful error message. Macaw should avoid this function, and instead provide more detailed messages that match the level of abstraction of the program under test (e.g., "attempted to divide by a pointer").

See https://github.com/GaloisInc/crucible/pull/1220 for how this was done in Crucible-LLVM.

RyanGlScott commented 3 months ago

Searching for individual uses of projectLLVM_bv doesn't come up with that many hits, but that's a bit deceiving, since the majority of projectLLVM_bv invocations of performed transitively by invoking the toValBV function, which is a very commonly used helper function in all of the macaw-symbolic backends. In fact, nearly every instruction that involves bitvectors is likely to use toValBV in some form or another when encoding the instruction's semantics into Crucible, which means that we have our work cut out for us in crafting individual error messages for each toValBV call site. (We could report a generic error message like Found pointer instead of bitvector when calling toValBV, but that seems just as bad as the current status quo.)

Personally, I don't understand what every single x86-64, AArch32, and PowerPC instruction does well enough in order to craft all of these error messages. As such, my inclination is to just report something like Found pointer instead of bitvector when calling <instruction name>. While this is still generic, it still goes a long way in disambiguating where the pointer-to-bitvector conversions are actually happening, and thus I think it would still be an improvement.

sauclovian-g commented 3 months ago

That seems like a vast improvement over the equivalent of "Oops". For most instructions it's all you'd really want anyway (except you might add "in the 2nd operand of ...")

I know enough x86_64 and arm32 (though not ppc) to be able to rapidly interpret the manual for any cases that look like they ought to get deeper treatment; ping me if it would be helpful. But there's probably no need.