asmjit / asmdb

Instructions database and utilities for X86/X64 and ARM (THUMB/A32/A64) architectures.
The Unlicense
328 stars 46 forks source link
aarch64 arm database x86 x86-64

Important

This project is discontinued.

AsmDB was a separate project to provide instruction database for AsmJit and other tools, but primarily for AsmJit. Unfortunately the database used by AsmJit was often out of sync and it's almost impossible to know which AsmDB was used by AsmJit back in time. To solve this problem, AsmDB has been merged with AsmJit and resides now in asmjit/db directory of AsmJit project. This ensures that AsmJit's instruction tables are always in sync with AsmDB.

AsmDB

This is a public domain instruction-set database that contains the following architectures:

NOTE: There is currently work-in-progress to more standardize the database between various architectures, expect some data changes.

Data Files

Data files use .js suffix and are require()d and interepreted as JavaScript, however, these files are also parseable as JSON after locating JSON-BEGIN and JSON-END marks and stripping content outside of them. The database is meant to be readable and editable, thus it should stay small and simple.

The database provides the following concepts:

X86 Data Files

X86 data provides the following information about each X86/X64 instruction:

Base API

The database itself provides a lot of information about each instruction, but since the DB is meant to be human readable and editable, the information presented is not in the best form to be processed as is. AsmDB solves this issue by providing API that can be used to index and query information stored in these data-files.

The API provides the following concepts:

AsmDB API probides base interfaces for these concepts, and each architecture then provides ISA-dependent versions of these.

X86 API

AsmDB's asmdb.x86.ISA is the interface used to index and access the ISA. The following snippet shows a basic usage of it:

// Create the ISA instance populated by default x86 data.
const asmdb = require("asmdb");
const isa = new asmdb.x86.ISA();

// Returns an array of instruction names stored in the database:
console.log(isa.instructionNames);

// Iterates over all instructions in the database. Please note that instructions
// that have different operands but the same name (or different encodings) will
// appear multiple times as specified in the x86/x64 manuals. The `inst` is an
// `asmdb.x86.Instruction` instance.
isa.instructions.forEach(function(inst) {
  console.log(`Instruction '{inst.name}' [${inst.encoding}] ${inst.opcodeString}`);
}, this);

// Iterates over all instructions in the database, but groups instructions having
// the same name. It's similar to `instructions.forEach()`, but instead of providing
// a single instruction each time it provides an array of instructions sharing the
// same name.
isa.forEachGroup(function(name, insts) {
  console.log(`Instruction ${name}`:);
  for (var i = 0; i < insts.length; i++) {
    const inst = insts[i];
    console.log(`  [${inst.encoding}] ${inst.opcodeString}`);
  }
}, this);

// If iterators are not what you want, it's possible to get a list of instructions
// of the same name by using `query()`.
var insts = isa.query("mov");
for (var i = 0; i < insts.length; i++) {
  const inst = insts[i];
  console.log(`  ${inst.name} [${inst.encoding}] ${inst.opcodeString}`);
}

// You can implement your own iterator by using `instruction`, `instructionNames`,
// `instructionMap`, or `query()`:
const names = isa.instructionNames;
for (var i = 0; i < names.length; i++) {
  const name = names[i];
  const insts = x86.query(name);
  // ...
}

The snippet above just shown how to get instructions and list basic properties. What is more interesting is accessing asmdb.x86.Instruction and asmdb.x86.Operand data.

const asmdb = require("asmdb");
const isa = new asmdb.x86.ISA();

// Get some instruction (the first in the group):
const inst = isa.query("vpunpckhbw")[0];
console.log(JSON.stringify(inst, null, 2));

// Iterate over its operands:
const operands = inst.operands;
for (var i = 0; i < operands.length; i++) {
  const operand = operands[i];
  // ...
}

The stringified instruction would print something like this (with added comments that describe the meaning of individual properties):

{
  "name": "vpunpckhbw",            // Instruction name.
  "arch": "ANY",                   // Architecture - ANY, X86, X64.
  "encoding": "RVM",               // Instruction encoding.
  "prefix": "VEX",                 // Prefix - "", "3DNOW", "EVEX", "VEX", "XOP".
  "opcode": "68",                  // A single opcode byte as a hex string, "00-FF".
  "opcodeInt": 104,                // A single opcode byte as an integer (0..255).
  "opcodeString":                  // The whole opcode string, as specified in manual.
    "VEX.NDS.128.66.0F.WIG 68 /r",
  "l": "128",                      // Opcode L field (nothing, 128, 256, 512).
  "w": "WIG",                      // Opcode W field.
  "pp": "66",                      // Opcode PP part.
  "mm": "0F",                      // Opcode MM[MMM] part.
  "vvvv": "NDS",                   // Opcode VVVV part.
  "_67h": false,                   // Instruction requires a size override prefix.
  "rm": "r",                       // Instruction specific payload "/0..7".
  "rmInt": -1,                     // Instruction specific payload as integer (0-7).
  "ri": false,                     // Instruction opcode is combined with register, "XX+r" or "XX+i".
  "rel": 0,                        // Displacement (cb cw cd parts).
  "implicit": false,               // Uses implicit operands (registers / memory).
  "privilege": "L3",               // Privilege level required to execute the instruction.
  "fpu": false,                    // True if this is an FPU instruction.
  "fpuTop": 0,                     // FPU top index manipulation [-1, 0, 1, 2].
  "vsibReg": "",                   // AVX VSIB register type (xmm/ymm/zmm).
  "vsibSize": -1,                  // AVX VSIB register size (32/64).
  "broadcast": false,              // AVX-512 broadcast support.
  "bcstSize": -1,                  // AVX-512 broadcast size.
  "kmask": false,                  // AVX-512 merging {k}.
  "zmask": false,                  // AVX-512 zeroing {kz}, implies {k}.
  "sae": false,                    // AVX-512 suppress all exceptions {sae} support.
  "rnd": false,                    // AVX-512 embedded rounding {er}, implies {sae}.
  "tupleType": "",                 // AVX-512 tuple-type.
  "elementSize": -1,               // Instruction element size (used by broadcast).

  // Extensions required to execute the instruction:
  "extensions": {
    "AVX": true                    // Instruction is an "AVX" instruction.
  },

  // Instruction attributes
  "attributes": {
  },

  // Special registers accessed by the instruction.
  "specialRegisters": {
  },

  // Instruction operands:
  "operands": [{
    "data": "xmm",                 // The operand's data (processed).
    "reg": "xmm",                  // Register operand's definition.
    "regType": "xmm",              // Register operand's type (would differ if reg is "eax" for example).
    "mem": "",                     // Memory operand's definition.
    "memSize": -1,                 // Memory operand's size.
    "memOff": false,               // Memory operand is an absolute offset (only a specific version of MOV).
    "memSeg": "",                  // Segment specified with register that is used to perform a memory IO.
    "vsibReg": "",                 // AVX VSIB register type (xmm/ymm/zmm).
    "vsibSize": -1,                // AVX VSIB register size (32/64).
    "bcstSize": -1,                // AVX-512 broadcast size.
    "imm": 0,                      // Immediate operand's size.
    "immValue": null,              // Immediate value - `null` or `1` (only used by shift/rotate instructions).
    "rel": 0,                      // Relative displacement operand's size.
    "implicit": false,             // True if the operand is an implicit register (not encoded in binary).
    "read": false,                 // True if the operand is a read-op (R or X) from reg/mem.
    "write": true,                 // True if the operand is a write-op (W or X) to reg/mem.
    "rwxIndex": null,              // Read/Write (RWX) index.
    "rwxWidth": null               // Read/Write (RWX) width.
  }, {
    "data": "xmm",                 // ...
    "reg": "xmm",
    "regType": "xmm",
    "mem": "",
    "memSize": -1,
    "memOff": false,
    "memSeg": "",
    "vsibReg": "",
    "vsibSize": -1,
    "bcstSize": -1,
    "imm": 0,
    "immValue": null,
    "rel": 0,
    "implicit": false,
    "read": true,
    "write": false,
    "rwxIndex": -1,
    "rwxWidth": -1
  }, {
    "data": "xmm/m128",
    "reg": "xmm",
    "regType": "xmm",
    "mem": "m128",
    "memSize": 128,
    "memOff": false,
    "memSeg": "",
    "vsibReg": "",
    "vsibSize": -1,
    "bcstSize": -1,
    "imm": 0,
    "immValue": null,
    "rel": 0,
    "implicit": false,
    "read": true,
    "write": false,
    "rwxIndex": -1,
    "rwxWidth": -1
  }]
}

ARM Database

TO BE DOCUMENTED...