asmjit / cult

CPU Ultimate Latency Test.
Other
103 stars 15 forks source link
asmjit benchmark cpu jit latency performance x86 x86-64

CULT

CPU Ultimate Latency Test.

Online Access

Introduction

CULT (CPU Ultimate Latency Test) is a tool that runs series of tests that help to estimate how many cycles an X86 processor (both 32-bit or 64-bit modes supported) takes to execute available instructions. The tool should help people that generate code for X86/X64 hardware by allowing them to run tests on their machines themselves instead of relying on information from CPU vendors or third parties that may be incomplete or that may not provide information for all targeted hardware.

The purpose of CULT is to benchmark as many CPUs as possible, to index the results, and to make them searchable, comparable, and accessible online. This information can be then used for various purposes, like statistics about average latencies of certain instructions (like addition, multiplication, and division) of modern CPUs compared to their predecessors, or as a comparison between various CPU generations for people that still write hand-written assembly to optimize certain operations. The output of CULT is JSON for making the results easier to process by third party tools.

Features

TODOs

Building

CULT requires only AsmJit as a dependency, which it expects by default at the same directory level as cult itself. A custom AsmJit directory can be specified with -DASMJIT_DIR=... when invoking cmake. The simplest way to compile cult is by using cmake:

# Clone CULT and AsmJit
$ git clone --depth=1 https://github.com/asmjit/asmjit
$ git clone --depth=1 https://github.com/asmjit/cult

# Create Build Directory
mkdir cult/build
cd cult/build

# Configure and Make
cmake .. -DCMAKE_BUILD_TYPE=Release
make

# Run CULT!
./cult

Command Line Arguments

$ cult [parameters]

CULT Output

CULT outputs information in two formats:

The JSON document has the following structure:

{
  "cult": {
    "version": "X.Y.Z"          // CULT 'major.minor.micro' version.
  },

  // CPU data retrieved by CPUID instruction.
  "cpuData": [
    {
      "level"     : "HEX",      // CPUID:EAX input (main leaf).
      "subleaf"   : "HEX",      // CPUID:ECX input (sub leaf).
      "eax"       : "HEX",      // CPUID:EAX output.
      "ebx"       : "HEX",      // CPUID:EBX output.
      "ecx"       : "HEX",      // CPUID:ECX output.
      "edx"       : "HEX"       // CPUID:EDX output.
    }
    ...
  ],

  // CPU information
  "cpuInfo": {
    "vendorName"  : "String",   // CPU vendor name.
    "vendorString": "String",   // CPU vendor string.
    "brandString" : "String",   // CPU brand string.
    "codename"    : "String",   // CPU code name.
    "modelId"     : "HEX",      // Model ID + Extended Model ID.
    "familyId"    : "HEX",      // Family ID + Extended Family ID.
    "steppingId"  : "HEX"       // Stepping.
  },

  // Array of instructions measured.
  "instructions": [
    {
      "inst"   : "inst x, y"    // Measured instruction and its operands (unique).
      "lat"    : X.YY           // Latency in CPU cycles, including fractions.
      "rcp"    : X.YY           // Reciprocal throughput, including fractions.
    }
    ...
  ]
}

Implementation Notes

Authors & Maintainers