EPFL-LAP / dynamatic

DHLS (Dynamic High-Level Synthesis) compiler based on MLIR
Other
60 stars 18 forks source link

[LSQ] FPT'19 flow analysis to minimize LSQ usage #56

Closed lucas-rami closed 9 months ago

lucas-rami commented 10 months ago

This is a design proposal for the re-implementation of LSQ flow analysis in Dynamatic (initially introduced in this paper published at FPT'19). The pass aims to minimize the size of LSQs by identifying, for each memory interface, load operations that all store operations to the same memory interface depend on. In this context, "dependency" is understood as a form of "control flow dependency" i.e., a dependence relation between the data result of a load with the address or data operand of a store.

Load operations which can be shown to provide the data value necessary to determine the address or data operand of all stores to the same memory interface may be excluded from the list of accesses needing to go through an LSQ if additional easy-to-check conditions are met (outlined in section IV.D of the paper).

While legacy Dynamatic implemented this analysis to run before the elastic pass (i.e., dataflow circuit conversion, our current StandardToHandshake pass), we propose here to implement it to run after the elastic pass on Handshake-level IR, as an independent pass. The change's rational is to allow for the identification of dependencies that are not visible at higher IR levels but become visible in the corresponding dataflow circuits.

We show below some (C++-influenced) pseudo-code for the identification of GIID (global in-order dependence) relationships between stores and loads to the same memory interface, which is the core part of the flow analysis. The isGIID function considers a specific LSQ load port and the full list of store ports to the same LSQ, and determines whether the latter are all globally in-order dependent (either through their address or data operand) on the load port's data result.

/// Recursive function that determines whether the stValue's producer is
/// globally in-order dependent on the ldData value, which should be the data
/// result of a load operation. In other words, the function answers the
/// following question: is ldData always (i.e., independently of the path
/// through the DFG) involved in the determination of stValue?
/// 
/// The function achieves this by "bactracking" through the dataflow circuit
/// (from operations' results to operands) in search for the ldData value. The
/// backtracking behavior is influenced by the type of the operation traversed
/// at each step. As it backtracks, the function keeps track of the operations
/// it has crossed on its path to be able to break cycles in the DFG.  
bool originatesFromLoad(Value ldData, Value stValue, mlir::DenseSet<Operation*> path = {}) {
  if (ldData == stValue) {
    // We have successfully backtracked to the loaded data 
    return true;
  }
  Operation* defOp = stValue.getDefiningOp();
  if (!defOp) {
    // We have reached function arguments, which cannot depend on the loaded data
    return false;
  }
  if (auto [_, newOp] = path.insert(defOp) ; !newOp) {
    // If we are encountering an operation for the second time it means that we
    // went through an entire CFG cycle, which implies that there was a
    // merge-like operation on our path that is computing the conjunction of
    // this function's results on all its data inputs. In that case, the merge
    // inputs coming from outside the cycle determine whether the entire cycle
    // depends on the loaded data, so we return true to not falsify the
    // conjunction. If merge input coming from outside the cycle do not depend
    // on the loaded data, the function's top-level call will still return false 
    return true;
  }

  if (auto condBrOp = dyn_cast<handhsake::ConditionalBranchOp>(defOp)) {
    // The data operand or the condition operand must depend on the loaded data
    return originatesFromLoad(ldData, condBrOp.getDataOperand(), clone(path)) || 
           originatesFromLoad(ldData, condBrOp.getCondOperand(), clone(path));
  }
  else if (isa<arith::AddIOp, arith::SubIOp, ...>(defOp)) {
    // The LHS or RHS must depend on the loaded data
    return originatesFromLoad(ldData, defOp.getOperand(0), clone(path)) || 
           originatesFromLoad(ldData, defOp.getOperand(1), clone(path));
  }
  else if (isa<handhsake::MergeOp, handhsake::ControlMergeOp>(defOp)) {
    // All data inputs must depend on the loaded data
    for (Value mergeLikeOprd : defOp->getOperands()) {
      if (!originatesFromLoad(ldData, mergeLikeOprd, clone(path)))
        return false;
    }
  } else if (auto muxOp = dyn_cast<handhsake::MuxOp>(defOp)) {
    // If the select operand depends on the loaded data, then the mux depends on
    // the laoded data
    if (originatesFromLoad(ldData, muxOp.getSelectOperand(), path))
      return true;
    // If the select operand doesn't depend on the laoded data, all data inputs
    // must depend on the loaded data
    for (Value mergeLikeOprd : muxOp.getDataInputs()) {
      if (!originatesFromLoad(ldData, mergeLikeOprd, clone(path)))
        return false;
    }
  } else if ...
  // TODO: define the logic for all existing operations
}

/// Checks whether all store operations are globally in-order dependent on the
/// load operation.
bool isGIID(handshake::LSQLoadOp loadOp, SmallVector<handshake::LSQStoreOp> storeOps) {
  Value ldData = loadOp.getDataResult();
  return llvm::all_of(storeOps, [&](handshake::LSQStoreOp storeOp) {
    return originatesFromLoad(ldData, storeOp.getDataOperand()) || 
           originatesFromLoad(ldData, storeOp.getAddrOperand());
  });
}