apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[RFC] [VTA] [TSIM] Enabling Cycle-Accurate Hardware Simulation for VTA #3009

Closed: vegaluisjose closed 5 years ago

vegaluisjose commented 5 years ago

This RFC proposes TSIM, a new simulation environment that improves hardware-software integration and simulation accuracy compared to functional simulation. One goal is to integrate the hardware development process into the software stack from the beginning, so that features can be implemented and evaluated incrementally as workloads evolve. In this environment, the hardware description is the actual specification. This reduces the burden of keeping a specification, usually written in a higher-level language such as C/C++, consistent with the actual hardware design, described in a language such as Verilog. Moving to TSIM will give us a more fluid hardware-software specification and invite more contributions to different layers of the stack.

Moreover, this integration provides more accurate performance feedback, namely clock cycles, than the traditional functional model of a hardware accelerator. This is because TSIM is based on Verilator, an open-source hardware simulator that compiles Verilog designs down to C++ classes for cycle-accurate simulation.
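To make the difference between functional and cycle-accurate feedback concrete, here is a minimal toy sketch in plain Python. It is not TSIM, Verilator, or VTA code: the `AddOneUnit` class, its three states, and the `read_latency` parameter are all assumptions made up for illustration. The functional model only produces values, while the clocked model is advanced one tick at a time and so also reports how many cycles the work took.

```python
# Toy illustration only: a hypothetical clocked add-by-one unit,
# not TSIM, Verilator, or VTA code.
from typing import List


def functional_add_one(data: List[int]) -> List[int]:
    """Functional model: correct values, but no notion of time."""
    return [x + 1 for x in data]


class AddOneUnit:
    """Clocked model with assumed states READ -> ADD -> WRITE per element."""

    def __init__(self, data: List[int], read_latency: int = 2) -> None:
        self.data = list(data)
        self.out: List[int] = []
        self.read_latency = read_latency  # assumed memory latency, in cycles (>= 1)
        self.state = "READ"
        self.wait = read_latency
        self.idx = 0
        self.reg = 0

    def done(self) -> bool:
        return self.idx == len(self.data)

    def tick(self) -> None:
        """Advance the unit by exactly one clock cycle."""
        if self.done():
            return
        if self.state == "READ":
            self.wait -= 1
            if self.wait == 0:  # the read data arrives on this cycle
                self.reg = self.data[self.idx]
                self.state = "ADD"
        elif self.state == "ADD":
            self.reg += 1
            self.state = "WRITE"
        else:  # WRITE
            self.out.append(self.reg)
            self.idx += 1
            self.wait = self.read_latency
            self.state = "READ"


def simulate(unit: AddOneUnit) -> int:
    """Clock the unit until it finishes and return the cycle count."""
    cycles = 0
    while not unit.done():
        unit.tick()
        cycles += 1
    return cycles


data = [1, 2, 3, 4]
unit = AddOneUnit(data, read_latency=2)
cycles = simulate(unit)
assert unit.out == functional_add_one(data)  # same values as the functional model
print(cycles)  # 4 elements * (2 read + 1 add + 1 write) = 16 cycles
```

Both models agree on the output, but only the clocked one shows that, for example, a longer read latency would increase the cycle count. That is the kind of feedback a purely functional model cannot give.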

Lastly, Verilator is already available in many Linux distributions, e.g. Ubuntu, and on OSX via Homebrew.

Proposed design

TSIM uses Verilator to integrate VTA designs into TVM and provides flexibility in the hardware language used to implement these designs. For example, one could use OpenCL, C/C++ or Chisel3 to describe a VTA design that is eventually compiled down to Verilog, the standard input language for FPGA/ASIC tools. Additionally, Verilator supports the Direct Programming Interface (DPI), a mechanism defined in the SystemVerilog standard for interfacing with foreign programming languages.

We leverage these features available in Verilator to interface hardware designs from upper layers in the TVM stack such as drivers, runtime, etc. In fact, we have developed all the glue layers to make this happen, including:

Finally, the following snippet shows how a VTA design simulation, based on the add-by-one example, is invoked on TVM:

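import tvm

ctx = tvm.cpu(0)
a = tvm.nd.array(...)  # input
b = tvm.nd.array(...)  # output
# Load the TSIM shared library as a TVM module
tsim = tvm.module.load("libtsim.so", "vta-tsim")
# Look up the driver registered as a global function and run the simulation
f = tvm.get_global_func("tvm.vta.driver")
f(tsim, a, b)

tmoreau89 commented 5 years ago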

This will facilitate continuous testing of novel VTA features and invite more contributions to modify the VTA spec in the future. Cycle-accurate testing from high-level test scripts in Python is definitely the way forward for all of TVM's backends.

@kazum @ktabata it would be great to get your take on this Verilator-based simulation flow, since you provided the SDAccel and AOCL support, respectively, in TVM.

vegaluisjose commented 5 years ago

@jroesch

There are two reasons why we have the two DPI modules in Verilog (Host and Memory):

1) To support either handwritten Verilog accelerators or Verilog generated from languages other than Chisel3

2) Chisel3 does not support DPI, which is the "CFFI" of Verilog. However, Chisel3 does support Verilog inlining, which is what we use here so that we don't duplicate code, see here
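The idea behind having a separate Host and Memory interface can be sketched conceptually in plain Python. This is only an analogy under stated assumptions: the `Memory` and `AddOneDesign` classes, the register names (`launch`, `done`, `length`, `in_addr`, `out_addr`), and `run_driver` are hypothetical and are not TSIM, VTA, or DPI APIs. The point is that the driver depends only on the two narrow interfaces, so the design behind them could be handwritten or generated from another language.

```python
# Conceptual sketch only: plain Python standing in for the Host and Memory
# DPI modules. All names below are hypothetical, not TSIM or VTA APIs.
from typing import Dict, List


class Memory:
    """Stands in for the Memory interface: word-addressed reads and writes."""

    def __init__(self, size: int) -> None:
        self.words: List[int] = [0] * size

    def read(self, addr: int) -> int:
        return self.words[addr]

    def write(self, addr: int, value: int) -> None:
        self.words[addr] = value


class AddOneDesign:
    """Stands in for an accelerator; it could be handwritten or generated."""

    def __init__(self, mem: Memory) -> None:
        self.mem = mem
        self.regs: Dict[str, int] = {
            "launch": 0, "done": 0, "length": 0, "in_addr": 0, "out_addr": 0,
        }

    # Host interface: the only way the driver touches the design.
    def host_write(self, reg: str, value: int) -> None:
        self.regs[reg] = value

    def host_read(self, reg: str) -> int:
        return self.regs[reg]

    def step(self) -> None:
        """Advance the design; in this toy the whole job completes in one step."""
        if self.regs["launch"] and not self.regs["done"]:
            for i in range(self.regs["length"]):
                value = self.mem.read(self.regs["in_addr"] + i)
                self.mem.write(self.regs["out_addr"] + i, value + 1)
            self.regs["done"] = 1


def run_driver(design: AddOneDesign, mem: Memory, data: List[int]) -> List[int]:
    """Driver logic that relies only on the host and memory interfaces."""
    n = len(data)
    for i, x in enumerate(data):
        mem.write(i, x)                # place the input at address 0
    design.host_write("in_addr", 0)
    design.host_write("out_addr", n)   # the output follows the input
    design.host_write("length", n)
    design.host_write("launch", 1)
    while not design.host_read("done"):
        design.step()
    return [mem.read(n + i) for i in range(n)]


mem = Memory(size=16)
result = run_driver(AddOneDesign(mem), mem, [1, 2, 3])
print(result)  # [2, 3, 4]
```

Any other design object exposing the same `host_write`, `host_read`, and `step` methods could be passed to `run_driver` unchanged, which mirrors reason 1 above.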

tqchen commented 5 years ago

https://github.com/dmlc/tvm/pull/3010