EPFL-LAP / dynamatic

DHLS (Dynamic High-Level Synthesis) compiler based on MLIR

[handshake-simulator] Design proposal and tracking issue #94

Open lucas-rami opened 2 months ago

lucas-rami commented 2 months ago

This is a design proposal and tracking issue for the implementation of the Handshake-level dataflow circuit simulator, or Handshake simulator for short. While we already have an experimental version of this simulator in the repository, the design/implementation effort required to get it to a correct and usable state is almost equivalent to a full rewrite, so this issue assumes that we start from scratch.

Goal & Requirements

The goal is to give us the ability to simulate dataflow circuits from the Handshake-level IR alone, without going to RTL through the backend and then simulating the resulting circuit with a tool like ModelSim. The former offers several benefits over the latter (in no particular order).

In order to ensure the simulator's usefulness to the community, we establish a few key design requirements.

For completeness, we also list some of the features that we do not care to have in the simulator, at least at the moment.

Implementation roadmap

In this section we try to break down the implementation of the simulator from scratch into multiple manageable steps. This will most likely be edited a lot throughout the simulator's implementation. Issues, PRs, and commits that relate to each point will be referenced here. To keep individual contributions manageable, each pull request must cover at most the content of one of the subsections below. Smaller pull requests for sub-tasks in each subsection are allowed and encouraged if they make sense from a development perspective.

Execution models

This is likely the step that requires the most careful thought and the most design effort, as execution models are what Dynamatic users are most likely to interact with.

I really think it is key that we get this right and make the life of the future user as easy as possible. I have some syntax in mind that I think will be very nice to work with; it is inspired by the way MLIR handles rewrite patterns. Below is some sample C++-like pseudocode for what I envision.

/// Abstract parent class for execution models, templated
/// by the operation type which it simulates and the type
/// of a data-structure the component uses to maintain its
/// internal state (and which may be `void`).
template <typename Op, typename State>
class ExecutionModel {
public:
  virtual ~ExecutionModel() = default;

  /// May be called by concrete execution models to
  /// determine whether we are on a rising edge of the clock.
  /// Non-virtual: concrete models are not meant to override it.
  bool isClockRisingEdge() { ... }

  /// The execution function that simulates the operation.
  /// Pure virtual: every concrete model must implement it.
  virtual void exec(
    Op op /* the simulated MLIR operation */,
    State &state /* the operation's current internal state */,
    InputReader &reader /* a way to query for the state of the operation's operands */,
    OutputWriter &writer /* a way to modify the state of the operation's outputs */
  ) = 0;
};

/// Example state and model for a multiplexer.
class MuxState { ... };
class MuxModel : public ExecutionModel<handshake::MuxOp, MuxState> {
public:
  void exec(handshake::MuxOp op, MuxState &state, InputReader &reader, OutputWriter &writer) override { ... }
};
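To make the idea concrete outside of MLIR, here is a minimal compilable sketch of the same pattern. Everything in it is an assumption for illustration: `ChannelState`, the `get`/`set` methods on `InputReader`/`OutputWriter`, and the plain `MuxOp` struct are hypothetical stand-ins, not the actual Dynamatic types. It also models only the forward (valid/data) path of the mux and ignores the ready signals.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the forward state of one handshake channel.
struct ChannelState {
  bool valid = false;
  std::uint64_t data = 0;
};

// Hypothetical minimal versions of the proposal's InputReader/OutputWriter.
struct InputReader {
  const std::vector<ChannelState> &operands;
  const ChannelState &get(std::size_t idx) const { return operands.at(idx); }
};
struct OutputWriter {
  std::vector<ChannelState> &results;
  void set(std::size_t idx, const ChannelState &s) { results.at(idx) = s; }
};

template <typename Op, typename State>
class ExecutionModel {
public:
  virtual ~ExecutionModel() = default;
  virtual void exec(Op op, State &state, InputReader &reader,
                    OutputWriter &writer) = 0;
};

// Placeholder for handshake::MuxOp; only records the number of data inputs.
struct MuxOp {
  unsigned numDataInputs;
};
struct MuxState {}; // empty in this forward-path-only sketch

class MuxModel : public ExecutionModel<MuxOp, MuxState> {
public:
  void exec(MuxOp op, MuxState &, InputReader &reader,
            OutputWriter &writer) override {
    const ChannelState &select = reader.get(0); // operand 0: select
    ChannelState out;                           // invalid by default
    if (select.valid && select.data < op.numDataInputs) {
      // Operands 1..N are the data inputs; forward the selected one.
      const ChannelState &chosen =
          reader.get(1 + static_cast<std::size_t>(select.data));
      out.valid = chosen.valid;
      out.data = chosen.data;
    }
    writer.set(0, out);
  }
};
```

A real model would also have to drive the ready signals back to the select and data operands, which is where the reader/writer split becomes more interesting.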

This time, some pseudocode for associating execution models with operations, inspired by how one creates operations in MLIR.

// The Handshake function to simulate
handshake::FuncOp funcOp = ...;

HandshakeSimulator simulator(funcOp);
for (Operation& op : funcOp.getOps()) {
  // In reality we would check for all supported operation types
  // with llvm::TypeSwitch most likely
  if (auto muxOp = dyn_cast<handshake::MuxOp>(op)) {
    // Constructs an instance of MuxModel which will be associated
    // to muxOp during simulation of funcOp
    simulator.registerModel<MuxModel>(muxOp);
  }
}
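One possible way to back a `registerModel<Model>(op)` call is a map from operation to a type-erased model, sketched below. This is only an assumption about how it could work: `Operation` here is an empty placeholder rather than `mlir::Operation`, and `ModelBase` and `hasModel` are hypothetical names that do not appear in the proposal.

```cpp
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for mlir::Operation; identity is the object address.
struct Operation {};

// Type-erased base so that models of different kinds share one container.
class ModelBase {
public:
  virtual ~ModelBase() = default;
};

// Placeholder concrete model.
class MuxModel : public ModelBase {};

class HandshakeSimulator {
public:
  // Constructs a Model in place and associates it with the operation.
  template <typename Model, typename... Args>
  Model &registerModel(const Operation &op, Args &&...args) {
    auto model = std::make_unique<Model>(std::forward<Args>(args)...);
    Model &ref = *model;
    models[&op] = std::move(model); // replaces any previously registered model
    return ref;
  }

  // Hypothetical helper: reports whether an operation has a model.
  bool hasModel(const Operation &op) const { return models.count(&op) != 0; }

private:
  std::unordered_map<const Operation *, std::unique_ptr<ModelBase>> models;
};
```

Returning a reference to the freshly built model lets the caller configure it further, much like MLIR builders hand back the operation they create.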

Event-driven simulation loop

Once we have execution models, we can implement the simulator's core, in charge of invoking them as needed throughout the circuit's execution to simulate all of the combinational and sequential logic.
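As a rough illustration of what "invoking them as needed" could mean for the combinational part, here is a worklist-based sketch that re-evaluates components until no signal changes. It is a toy under stated assumptions, not the proposed design: signals are plain integers, `Component` and `settle` are hypothetical names, and the circuit is assumed to have no combinational cycles (otherwise the loop may not terminate).

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical miniature component: reads some signals, drives one signal.
struct Component {
  std::vector<std::size_t> inputs; // indices of the signals it reads
  std::size_t output = 0;          // index of the signal it drives
  std::function<int(const std::vector<int> &)> eval;
};

// Processes the worklist until it is empty, i.e. until the combinational
// logic has settled. Returns the number of component evaluations performed.
std::size_t settle(std::vector<int> &signals,
                   const std::vector<Component> &comps,
                   std::deque<std::size_t> worklist) {
  std::size_t evals = 0;
  while (!worklist.empty()) {
    std::size_t idx = worklist.front();
    worklist.pop_front();
    ++evals;
    const Component &c = comps[idx];
    int newValue = c.eval(signals);
    if (newValue == signals[c.output])
      continue; // no change, so nothing downstream needs re-evaluation
    signals[c.output] = newValue;
    // Schedule every component that reads the signal that just changed.
    for (std::size_t j = 0; j < comps.size(); ++j) {
      for (std::size_t in : comps[j].inputs) {
        if (in == c.output) {
          worklist.push_back(j);
          break;
        }
      }
    }
  }
  return evals;
}
```

Sequential logic would then be handled by updating internal state on each clock edge and calling something like `settle` again with the affected components on the worklist.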

We should now be able to simulate using a simple API like the following.

// As initialized in the subsection above
HandshakeSimulator simulator = ...; 

// <register simulation listeners here> (see subsection below)

SimulationArguments simArgs = ...;
SimulationResults simResults = ...;
simulator.simulate(simArgs... /*function arguments*/, &simResults /*function results */);

Instrumentation capabilities

As mentioned, one of the Handshake simulator's key benefits is the ability to extract information related to the circuit's state throughout simulation.

Here are a couple attempts at defining hooks for things that a user may care about.

// Trigger the callback on each rising edge of the clock cycle
simulator.onClockRisingEdge([&](const CircuitState& state) { ... });

// ----

// A value in the Handshake function being simulated
Value someValue = ...;
// Trigger the callback whenever the state associated to the SSA value changes
simulator.onStateChange(someValue, [&](ValueState oldState, ValueState newState) { ... });

// ----

// A mux in the Handshake function being simulated
handshake::MuxOp muxOp = ...;
// Trigger the callback whenever the state of any of the operation's results changes
using OperandValues = DenseMap<OpOperand*, ValueState>;
using ResultValues = DenseMap<OpResult, ValueState>;
simulator.onStateChange(muxOp, [&](ResultValues oldOutputs, OperandValues newInputs, ResultValues newOutputs) { ... });
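For reference, a per-value `onStateChange` hook could be backed by a small listener registry along the following lines. This is a sketch under assumptions: `ValueId` is an integer stand-in for `mlir::Value`, `ValueState` is reduced to two booleans, and the `Listeners` class with its `update` method is a hypothetical name for whatever part of the simulator writes value states.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using ValueId = int; // hypothetical stand-in for mlir::Value

// Hypothetical reduced channel state.
struct ValueState {
  bool valid = false;
  bool ready = false;
};

class Listeners {
public:
  using StateCallback =
      std::function<void(const ValueState &, const ValueState &)>;

  // Registers a callback fired whenever the state of `value` changes.
  void onStateChange(ValueId value, StateCallback cb) {
    callbacks[value].push_back(std::move(cb));
  }

  // Called by the simulator each time it writes the state of a value.
  void update(ValueId value, const ValueState &next) {
    ValueState &current = states[value];
    if (current.valid == next.valid && current.ready == next.ready)
      return; // unchanged, so listeners are not notified
    ValueState old = current;
    current = next;
    auto it = callbacks.find(value);
    if (it == callbacks.end())
      return;
    for (auto &cb : it->second)
      cb(old, current);
  }

private:
  std::unordered_map<ValueId, ValueState> states;
  std::unordered_map<ValueId, std::vector<StateCallback>> callbacks;
};
```

Filtering out writes that do not change the state keeps callbacks from firing repeatedly while the combinational logic is still settling on the same values.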

Writing all execution models

Finally, we will need to go through the cumbersome task of implementing execution models for all our dataflow components.

paolo-ienne commented 1 month ago

I confess that I have not managed to read the whole of this. Yet I would like to add a couple of comments (not necessarily in contradiction with the above): (1) I am not convinced of the usefulness or convenience for users of modelling the behaviour of components at levels higher than RTL--because this will always be in addition to RTL modelling, and keeping the two models consistent will be tricky. (2) RTL-level components do not seem exaggeratedly complex compared to other modelling, except for math operations--in fact, they are pretty straightforward. Maybe some fast RTL simulators are able to leverage a high-level description of math operations without gate-level implementations. (3) Fast RTL-level simulators have goals similar to this project's (zero-delay, binary simulation) and Verilator seems pretty strong in this arena; maybe we should think of this project more along the lines of "how can we make the best use of Verilator for high-speed simulation of our dataflow circuits".

Jiahui17 commented 1 month ago

I think this sort of simulation does more or less the same thing as SystemC (Andrea mentioned the concern after the student presented the first version of the simulator).

https://github.com/accellera-official/systemc

So why not create an export-sc tool and simulate everything in SystemC?