GaloisInc / macaw

Open source binary analysis tools.
BSD 3-Clause "New" or "Revised" License
208 stars 21 forks source link

symbolic-syntax: Syntactic sugar #349

Open langston-barrett opened 1 year ago

langston-barrett commented 1 year ago

macaw-symbolic encodes functions as Crucible CFGs that take and return a struct of all the register values. This struct has a lot of fields. Here's a very simple no-op Macaw x86_64 CFG written in the symbolic-syntax:

(defun @test ((regs (Struct
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      (Ptr 3)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512))))
                    (Struct
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      (Ptr 64) 
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      Bool           
                      (Ptr 3)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 2)  
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 80) 
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512)
                      (Ptr 512))
  (start start:
    (return regs)))

This is way too long to write in a bunch of test cases. Additionally, accessing individual registers involves loading fields from this struct at certain indices, but such statements don't really incidate which registers are being accessed.

We should introduce some syntactic sugar to make it easier to write macaw-symbolic CFGs. In particular, we should add:

This could be accomplished using the ParserHooks argument to machineCodeParserHooks.

We will need separate packages for each architecture:

RyanGlScott commented 12 months ago

An alternative approach (which was adopted by the ambient-verifier tool's override syntax) is to allow giving S-expression functions ordinary type signatures like this:

(defun @test ((x Int)) Int
  (start start:
    (return x)))

And then mapping the argument and result values to the corresponding registers. Most of the time, this is far more convenient, and it makes the overrides more portable across architectures. It does have the downside of not giving the user fine-grained control over the register state, however. As a concrete example of where this would be important, ARM's __aeabi_uidivmod function returns two values in registers instead of just one, which would not be straightforward to model with the typical one-return-value convention that most crucible-syntax functions use. Perhaps we could devise a way for users to choose which syntax they prefer.

langston-barrett commented 3 months ago

422 accomplishes this for x86_64, we should create similar packages for the other architectures. I've added a checklist to the OP.