Open codyroux opened 3 years ago
Here are two examples of this approach:
https://github.com/BinaryAnalysisPlatform/bap/tree/master/plugins/arm
and
https://github.com/BinaryAnalysisPlatform/bap/tree/master/plugins/ghidra
I'm not sure how the dispatch itself works though: who decides which lisp opcode is being called? Is this written in C++?
Better links would be to the lisp files, e.g., aarch64.lisp and pcode.lisp, which demonstrate how semantics could be described. The translation is done rather simply, when we enable the Primus Lifter, e.g.,
let translate_ops_to_args () =
KB.Rule.(begin
declare "translate-ops-to-lisp-args" |>
require Insn.slot |>
provide Lisp.Semantics.args |>
comment "translates instruction operands to lisp arguments"
end);
KB.promise Lisp.Semantics.args @@ fun this ->
KB.collect Insn.slot this >>=? fun insn ->
Theory.instance () >>= Theory.require >>= fun theory ->
Theory.Label.target this >>= fun target ->
args_of_ops theory target insn >>| Option.some
let translate_opcode_to_name () =
KB.Rule.(begin
declare "translate-opcode-to-lisp-name" |>
require Insn.slot |>
provide Lisp.Semantics.name |>
comment "translates opcode to the lisp definition name"
end);
KB.promise Lisp.Semantics.name @@ fun this ->
KB.collect Insn.slot this >>|? fun insn ->
let name = KB.Name.create ~package:(Insn.encoding insn) (Insn.name insn) in
Some name
Where arg_of_ops
translates instruction opcodes to semantic values,
let args_of_ops (module CT : Theory.Core) target insn =
let bits = Theory.Target.bits target in
let word = Theory.Bitv.define bits in
let bitv x = Bitvec.(int64 x mod modulus bits) in
Insn.ops insn |> Array.to_list |>
List.map ~f:(function
| Op.Fmm _ -> CT.unk word >>| Theory.Value.forget
| Op.Imm x ->
let x = bitv (Imm.to_int64 x) in
CT.int word x >>| Theory.Value.forget >>| fun v ->
KB.Value.put Lisp.Semantics.static v (Some x)
| Op.Reg v ->
let name = Reg.name v in
let reg = match Theory.Target.var target name with
| None -> Theory.Var.forget @@ Theory.Var.define word name
| Some v -> v in
CT.var reg >>| fun v ->
KB.Value.put Lisp.Semantics.symbol v (Some name)) |>
KB.List.all
This is how we go from the value of type Bap.Std.Disasm_expert.Basic.Insn.t, which is a machine-specific instruction representation, to the machine-independent Bap.Std.Insn.t, which encodes the semantics of that instruction. It works because the Primus Lisp lifter stores a promise that for a given name and a list of arguments provides the semantics, more specifically,
$ bap list rules | grep primus-lisp-semantics -A8
primus-lisp-semantics ::=
bap:primus-lisp-program
core:unit-target
core:unit-source
core:label-unit
bap:lisp-args
bap:lisp-name
-----------------------
core:semantics
Obviously, if we employ the same approach for the FrontC lifter we will have a C program as the input, e.g., encoded with Cabs.body. To lift the C program using Primus Lisp lifter we need to traverse the C AST and for each constructor create a program object, set its Lisp name property to the name of the constructor (with the c
package). Then for each argument of the constructor, we shall recurse and get their semantics and then pass it as arguments. (Yes, no promises involved, we will recurse over the existing AST).
Lets make the discussion less abstract and suggest some simple C program, e.g.,
WHILE (VARIABLE "halt", COMPUTATION (UNARY (POSINCR, VARIABLE "counter")))
which corresponds to the following C program,
while (halt) counter++;
Assuming that we have semantics for C defined in Lisp, e.g.,
(defun c:while (cnd body)
(while cnd body))
(defun c:posincr (var)
(set$ var (+ var 1))
Then our frontend will look something like this,
let rec stmt = function
| WHILE (cnd,body) ->
let* cnd = exp cnd in
let* body = stmt body in
eval "while" [cnd; body]
| COMPUTATION e -> exp e
| _ -> failwith "unimplemented"
and exp = function
| UNARY (POSINCR, x) ->
let* x = exp x in
eval "posincr" [x]
| VARIABLE v -> var v
| _ -> failwith "unimplemented"
where eval
is something like this,
let eval name args =
let* lbl = Theory.Label.fresh in
let name = KB.Name.create ~package:"c" name in
let* () = KB.provide Lisp.Semantics.name lbl (Some name) in
let* () = KB.provide Lisp.Semantics.args lbl (Some args) in
KB.collect Theory.Semantics.slot
Et voila!)
Well, it is voila for starters only, I would envision that we will need to do something with scopes and variables, but since they exist on the AST level, we should be able to handle them without extra hassle. I think you already have some code for that which most likely could be reused.
So thoughts?
Currently we write the C front end as a pattern matching over the C syntax, which directly writes programmatic semantics for each construct (expressions, sequences, calls, gotos...).
But this process is very similar to the "stardard" uplifters for various binary formats, which dispatch over opcodes, calling specific primus lisp functions by name, drawn from a
foo.lisp
file.It sounds like a viable way forward, enabling us to use all the Primus Lisp goodies, would be to rewrite the front-end in this style, and have a
c.lisp
file with the semantics of C constructs.This issue is dedicated to discussing the pros and cons, and the implementation details of this approach.