BinaryAnalysisPlatform / bap

Binary Analysis Platform
MIT License
2.07k stars 273 forks source link

adds semantics for the x86 SSE floating-point instructions #1466

Closed ivg closed 2 years ago

ivg commented 2 years ago

Introduction

This PR utilizes the recently added intrinsic Primus Lisp primitive and represents and represents floating-point operations as calls to intrinsic functions. The set of operations coincides with the set of floating-point operations in Core Theory and SMT-LIB. The rounding modes and floating-point format are encoded in the name of the intrinsic, e.g., here's how the ADDSD XMM0, qword ptr [RBP + -0x18] is lifted,

00000881: intrinsic:x0 := extract:63:0[extract:127:0[YMM0]]
00000884: intrinsic:x1 := mem[RBP - 0x18, el]:u64
00000887: call @intrinsic:fadd_rne_ieee754_binary with return %00000889

00000889:
0000088a: YMM0 := YMM0 &
          0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00000000000000000000000000000000
          | pad:256[intrinsic:y0]

The rounding mode is represented as a parameter, which could be either a constant (the default and is equal to +rne+) or could be read from the MXCSR register).

Deprecated Features

This PR sunsets two existing features.

First, is the legacy floating-point lifter, that previously could be enabled with the --x86-with-floating-points option. While we do not plan to remove it in the foreseeable future, it is not really necessary anymore (other than for comparison and debugging). To make the option name less confusing, it is renamed to --x86-with-legacy-floating-points (as floating points are now enabled by default and there's no need to enable them explicitly). The old option has the same behavior, except that it will print a big fat warning message.

Another feature is the old capability of the BIL lifter to automatically translate unknown instructions (or instructions belonging to some class) into intrinsic calls. The new intrinsic primitive supersedes this option at the cost of some extra manual work. However, unlike the automatic translation, it enables explicit data flow and well-defined intrinsic calling conventions. If you will enable this mode, you will get knowledge base conflicts with the x86 plugin since both will try to turn x86 floating-point instructions into intrinsic calls in different ways. If you still need to use this feature of the BIL lifter, then you can pass the --x86-disable-floating-point-intrinsics option to suppress the x86 lifter. We currently don't know if we will remove the automatic translation altogether or rewrite it in a such way that it will work with the modern BAP. But in any case, it is better to explicitly denotate the missing instructions as intrinsic calls instead of relying on the automatic translation.

Bug fixes, Tweaks, and some New Small Features

This PR also contains a few bug fixes and features that were necessary to make everything work together (we now have 4 different floating-point lifters!).

The Primus.Lisp.Semantics.context allows us to constrain the context of the applicability of Primus Lisp definitions, so that we can disable the translation of the floating-point instructions into the intrinsic calls.

Several bugs were also discovered during the work on this PR.

The edge contraction procedure in the IR lifter was sometimes producing blocks in an incorrect order. The fixed version is also contracting more edges. Moreover, it is now enabled by default for all lifters (before that it was only contracting edges for IR generated by the Ghidra lifter).

The stack correction and automatic ABI were preventing the intrinsic functions from working (since they are not C). To fix this, we introduced a new intrinsic attribute to the subroutine. If a subroutine is marked as an intrinsic then it won't have C type automatically inserted and the stack correction in x86 will be disabled. In addition, the already existing synthetic attribute is used to tag synthetic subroutines.

Finally, this PR fixes some bugs in the semantics of pcode wrt the floating-point operations, so now it is fully functional and tested (using the testsuite/sse/float program).

Summary of the Available Floating-point Lifters

Basically, we now have 4 options to lift and run the floating-point programs (all passing the test for our simple toy program),

1) the default lifter that translates instructions into intrinsic calls:

bap sse/floats --run

2) the Ghidra lifter that translates pcode floating-point operations into intrinsic calls:

bap sse/floats --run --x86-backend=ghidra

3) the BIL ability to translate the unknown instructions into intrinsics

bap sse/floats --run \
    --bil-enable-intrinsic=:unknown \
    --primus-lisp-load=posix,llvm-x86-64-floats \
    --x86-disable-floating-point-intrinsics

4) the legacy floating-point lifter together with BIL floating-point emulation mode.

bap sse/floats --run \
    --x86-with-legacy-floating-points \
    --bil-enable-fp-emulation

Future Work

In the future we plan to utilize the BIL floating-point emulation mode (translation of the basic floating-point operations, such as +,-,*,/,sqrt into bitvector operations) to give concrete semantics to the corresponding intrinsic functions so that they will not have the empty denotation. Right now, the first two examples above are executable, because we defined the dynamic Primus Lisp semantics for these intrinsics. They work, however, only for ieee756 64-bit floats and use OCaml floats underneath the hood.