GaloisInc / dismantle

A library of assemblers and disassemblers derived from LLVM TableGen data

This is a collection of libraries implementing assemblers and disassemblers for several architectures based on LLVM TableGen data.

Setup

This package is supported on Ubuntu Xenial (16.04 LTS) and depends on the following packages:

To configure your system to build this package, run setup.sh.

Library Concept

The high-level idea of this library is to generate assemblers and disassemblers from the data provided by LLVM TableGen. Among other things, this data lists all of the instructions for the Instruction Set Architectures (ISAs) that we care about. It also includes the encodings of those instructions, their operands, and the types of those operands.
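As a rough illustration of what "encoding plus operands" gives a disassembler, here is a minimal, hand-written Haskell sketch. It is not dismantle's generated code or API. The ``InstrDesc`` and ``Field`` types and the ``decode`` function are hypothetical, and the single descriptor is modeled on the PowerPC ``add rT, rA, rB`` instruction (primary opcode 31, extended opcode 266). Each descriptor records which bits the encoding fixes and where each operand field sits. Decoding picks the first descriptor whose fixed bits match, then slices out the operands.

.. code-block:: haskell

   import Data.Bits (shiftL, shiftR, (.&.))
   import Data.List (find)
   import Data.Word (Word32)

   -- Hypothetical, much-simplified descriptors. Real TableGen records carry
   -- far more (operand types, assembly strings, predicates, ...).
   data Field = Field
     { fieldName  :: String
     , fieldLow   :: Int  -- least significant bit of the field
     , fieldWidth :: Int  -- number of bits in the field
     }

   data InstrDesc = InstrDesc
     { mnemonic  :: String
     , fixedMask :: Word32  -- 1 bits mark positions fixed by the encoding
     , fixedBits :: Word32  -- required values at those positions
     , operands  :: [Field]
     }

   -- Modeled on PowerPC "add rT, rA, rB" (XO-form): primary opcode 31,
   -- extended opcode 266, with the OE and Rc bits both 0.
   addDesc :: InstrDesc
   addDesc = InstrDesc
     { mnemonic  = "add"
     , fixedMask = 0xFC0007FF
     , fixedBits = 0x7C000214
     , operands  = [Field "rT" 21 5, Field "rA" 16 5, Field "rB" 11 5]
     }

   -- Slice one unsigned operand field out of an instruction word.
   extractField :: Word32 -> Field -> (String, Word32)
   extractField w f =
     (fieldName f, (w `shiftR` fieldLow f) .&. ((1 `shiftL` fieldWidth f) - 1))

   -- Decode with the first descriptor whose fixed bits match the word.
   decode :: [InstrDesc] -> Word32 -> Maybe (String, [(String, Word32)])
   decode table w = do
     d <- find (\desc -> w .&. fixedMask desc == fixedBits desc) table
     pure (mnemonic d, map (extractField w) (operands d))

In GHCi, ``decode [addDesc] 0x7C642A14`` evaluates to ``Just ("add",[("rT",3),("rA",4),("rB",5)])``. A generated decoder would build its table from every instruction in the TableGen data rather than from one hand-written entry. It would probably also use a faster lookup than this linear scan.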

Repository Layout

This repository contains a core dismantle-tablegen Haskell package and various architecture-specific packages named dismantle-$ARCH. The dismantle-tablegen package provides a TableGen parser, Template Haskell functions that generate assemblers, disassemblers, and pretty printers from LLVM TableGen data, and common testing functionality. The architecture-specific packages use that functionality to produce implementations from the LLVM TableGen files for each architecture. Each package includes its own test suite, which exercises the generated (dis)assembler and pretty printer on various input binaries (found in tests/bin in each package). Each suite also compares the pretty printer output against what objdump produces for the same binaries.
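The comparison against objdump can be pictured with the small sketch below. It is hypothetical and not the test suite's actual code. It assumes ``objdump -d`` lines made of an address, a tab, the hex bytes, a tab, and the assembly text. It also ignores whitespace differences such as objdump's column padding. ``normalize``, ``objdumpText``, and ``matchesObjdump`` are illustrative names.

.. code-block:: haskell

   -- Collapse whitespace runs and trim the ends, so column padding
   -- (objdump pads mnemonics with spaces) does not cause mismatches.
   normalize :: String -> String
   normalize = unwords . words

   -- Split a string on a single separator character.
   splitOn :: Char -> String -> [String]
   splitOn c s = case break (== c) s of
     (chunk, [])       -> [chunk]
     (chunk, _ : rest) -> chunk : splitOn c rest

   -- Assumed line shape: "<addr>:\t<hex bytes>\t<assembly text>".
   -- Lines without that shape (headers, labels) yield Nothing.
   objdumpText :: String -> Maybe String
   objdumpText line = case splitOn '\t' line of
     (_addr : _bytes : asm : _) -> Just (normalize asm)
     _                          -> Nothing

   -- Does our pretty-printed instruction match objdump's rendering?
   matchesObjdump :: String -> String -> Bool
   matchesObjdump ours objdumpLine =
     objdumpText objdumpLine == Just (normalize ours)

A real comparison may need more normalization. For example, objdump appends symbolic annotations such as ``<symbol+0x10>`` after branch targets. Which adjustments the dismantle tests actually make is not covered here.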

Stability Notes

Please note that the various architecture-specific ISA packages (such as dismantle-arm) may be incomplete or incorrect for some instructions, operands, or pretty-printed representations. This is because the degree to which any of these backend packages has been completed is a function of the particular binaries we used to test them (i.e. the binaries found in tests/bin in each package). As a result, when dismantle is used with new binaries, some instructions may not be supported by the disassembler, may reassemble to incorrect bit sequences, or may have erroneous pretty-printed representations compared to the output of objdump. As new binaries are used with this software, code coverage of the ISA in question may increase, revealing as-yet-unimplemented or incorrectly implemented operand types.
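One way to expose the "reassembles to incorrect bit sequences" case is a round-trip check. Every word that disassembles must reassemble to exactly the same bits. The sketch below works with any disassemble/assemble pair. The toy one-byte ISA and its deliberately buggy assembler are hypothetical and stand in for an incorrectly implemented operand type.

.. code-block:: haskell

   import Data.Bits (shiftR, (.&.), (.|.))
   import Data.Maybe (isNothing)
   import Data.Word (Word8)

   -- Words that disassemble but do not reassemble to the same bits.
   roundTripFailures :: Eq w => (w -> Maybe i) -> (i -> w) -> [w] -> [w]
   roundTripFailures dis asm ws =
     [ w | w <- ws, Just i <- [dis w], asm i /= w ]

   -- Words the disassembler rejects (unsupported instructions).
   undecodable :: (w -> Maybe i) -> [w] -> [w]
   undecodable dis ws = [ w | w <- ws, isNothing (dis w) ]

   -- Hypothetical toy ISA: opcode 0x1 in the high nibble, a register
   -- number in the low nibble.
   toyDis :: Word8 -> Maybe Int
   toyDis w
     | w `shiftR` 4 == 0x1 = Just (fromIntegral (w .&. 0x0F))
     | otherwise           = Nothing

   -- Deliberately buggy assembler: it keeps only 3 bits of the register.
   toyAsmBuggy :: Int -> Word8
   toyAsmBuggy r = 0x10 .|. (fromIntegral r .&. 0x07)

Here ``roundTripFailures toyDis toyAsmBuggy [0x13, 0x1A, 0x20]`` returns ``[26]``, which is 0x1A. Its register number 10 needs four bits, but the buggy assembler keeps only three. ``undecodable toyDis [0x13, 0x1A, 0x20]`` returns ``[32]``, which is 0x20, the analogue of an unsupported instruction.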

If any of the above cases are encountered, the operand type(s) and instruction(s) in question will need to be supported. Fixing this involves one or more of the following tasks:

Generating TableGen Files

The files we take as input to this suite of tools are not actually in the TableGen format (extension .td). Instead, we consume the output of the llvm-tblgen tool, which reads the real TableGen files and preprocesses them. Our data files were generated from the LLVM 3.9 sources.

The real TableGen files are included in the LLVM source distribution.

.. code-block:: shell

   # Assuming that the LLVM source has been unpacked to ${LLVM_ROOT}
   cd ${LLVM_ROOT}/lib/Target

   # Choose the architecture you want to process, assume PowerPC
   cd PowerPC

   # Run tablegen
   llvm-tblgen -I${LLVM_ROOT}/include PPC.td > PPC.tgen

The .tgen extension is an invention of this project, not an LLVM convention. By default, llvm-tblgen outputs a fully expanded version of the input TableGen files. This output is reasonably easy to parse, and it is the format that the dismantle-tablegen library consumes to produce assemblers and disassemblers.
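To show why this output is easy to consume, here is a deliberately tiny Haskell sketch that splits it into ``def`` records. It is not the dismantle-tablegen parser, which also interprets field types and values. It assumes each record starts at column 0 with ``def NAME {`` and ends with a line containing only ``}``. The sample input is hand-abbreviated rather than verbatim llvm-tblgen output, and ``Record``, ``parseDefs``, and ``sampleTgen`` are hypothetical names.

.. code-block:: haskell

   import Data.List (isPrefixOf)

   -- A record reduced to its name and the raw lines of its body.
   data Record = Record
     { recName  :: String
     , recLines :: [String]
     } deriving (Eq, Show)

   -- Collect "def NAME {" ... "}" blocks; everything else is skipped.
   parseDefs :: String -> [Record]
   parseDefs = go . lines
     where
       go [] = []
       go (l : ls)
         | "def " `isPrefixOf` l =
             let name         = takeWhile (`notElem` " \t{") (drop 4 l)
                 (body, rest) = break (== "}") ls
             in Record name body : go (drop 1 rest)
         | otherwise = go ls

   -- Abbreviated, hand-written sample in the style of the dump format.
   sampleTgen :: String
   sampleTgen = unlines
     [ "------------- Defs -----------------"
     , "def ADD4 {\t// Instruction I XOForm_1"
     , "  string AsmString = \"add $rT, $rA, $rB\";"
     , "  bits<6> opcode = { 0, 1, 1, 1, 1, 1 };"
     , "}"
     , "def ADD4o {\t// Instruction I XOForm_1"
     , "  string AsmString = \"add. $rT, $rA, $rB\";"
     , "}"
     ]

With this sample, ``map recName (parseDefs sampleTgen)`` gives ``["ADD4","ADD4o"]``.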

Repairing TableGen Entries

In rare cases, LLVM's TableGen data can be broken in a variety of ways:

In these cases a variety of failure modes can manifest:

Ideally these problems would not exist, and if they did, we would file bug reports with LLVM and wait for them to be addressed. In the meantime, however, we may need to continue development and fix the problems ourselves.

To resolve this, we provide a TableGen entry override feature. To use it, create a new file with a .tgen suffix containing repaired versions of the appropriate defs or classes and place it in a directory. Then add that directory's path (relative to the architecture-specific package root) to the list of override paths passed to the Template Haskell functions genISA and genISARandomHelpers. For example, the dismantle-aarch64 package keeps some .tgen files in dismantle-aarch64/data/override/ and then uses::

   $(genISA isa "data/AArch64.tgen" ["data/override"])
   $(genISARandomHelpers isa "data/AArch64.tgen" ["data/override"])

It's important to pass the same override paths to each of the above Template Haskell functions to ensure that the same overrides are applied to both code generation steps.
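To make the override idea concrete, here is a minimal sketch that replaces base entries by name. It is not dismantle-tablegen's implementation. The replace-by-name behavior, the "first override wins" precedence, and the reporting of unmatched names are all assumptions made for illustration. ``applyOverrides`` is a hypothetical name, and record bodies are left abstract.

.. code-block:: haskell

   import Data.Maybe (fromMaybe)

   -- ASSUMPTION: an override replaces the base entry with the same name.
   -- If several overrides share a name, the first in the list wins here.
   -- Returns the patched entries and the names of overrides that matched
   -- no base entry, which may be worth reporting.
   applyOverrides :: [(String, b)] -> [(String, b)] -> ([(String, b)], [String])
   applyOverrides base overrides = (map patch base, unmatched)
     where
       patch (name, body) = (name, fromMaybe body (lookup name overrides))
       baseNames = map fst base
       unmatched = [ name | (name, _) <- overrides, name `notElem` baseNames ]

For example, ``applyOverrides [("A",1),("B",2)] [("B",20),("C",30)]`` returns ``([("A",1),("B",20)],["C"])``.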

The overrides are processed as follows:

Developing in Template Haskell

Development of Template Haskell code can be frustrating, especially when things do not type check as expected. Some tips:
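One general technique, not specific to dismantle, is to look at the code that a quotation or generator actually produces. The sketch below uses ``runQ`` from the template-haskell package, which can run ``Q`` computations in ``IO`` and therefore directly from GHCi. It pairs ``runQ`` with ``pprint`` and with the derived ``Show`` instance of the syntax tree. For splices in a real module, GHC's ``-ddump-splices`` flag prints the expanded code at compile time. ``renderQuoted`` and ``showQuotedAst`` are illustrative names.

.. code-block:: haskell

   {-# LANGUAGE TemplateHaskell #-}
   import Language.Haskell.TH

   -- Render a quoted expression back to Haskell source. runQ lets Q
   -- computations run in IO, so this works directly from GHCi.
   renderQuoted :: IO String
   renderQuoted = pprint <$> runQ [| \x -> x + (1 :: Int) |]

   -- Show the raw syntax tree instead. Comparing it with the tree your
   -- generator builds by hand can pinpoint a misplaced constructor.
   showQuotedAst :: IO String
   showQuotedAst = show <$> runQ [| not True |]

In GHCi, ``showQuotedAst >>= putStrLn`` prints something like ``AppE (VarE GHC.Classes.not) (ConE GHC.Types.True)``. Operations that need the compiler's environment, such as ``reify``, fail when run in ``IO`` this way.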