Litz-Lab / scarab

MIT License
7 stars 6 forks source link

[FEATURE] Implement Multi-Level Branch Target Buffer #4

Open takhandipu opened 1 month ago

takhandipu commented 1 month ago

The goal of this feature request would be to implement a multi-level branch target buffer. The first step of the solution would be to implement a basic multi-level BTB described in https://ieeexplore.ieee.org/abstract/document/214687.

image image image image image

A good place to start the implementation would be here: https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp.h#L138. Please feel free to reach out to me, @kofyou, or @hlitz if you are interested in implementing this feature.

takhandipu commented 1 month ago

There are multiple ways to implement 2-level BTB inside scarab. Here, I am describing one such way. Please feel free to implement the way you see fit.

This line (https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp.h#L143) declares a BTB. Instead of having a single BTB cache, you can define 2 BTBs as (Cache btb1; Cache btb2;). Next, when you initiate your BTB in line (https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp_targ_mech.c#L271), you initiate both btb1 and btb2 as follows:

init_cache(&bp_data->btb1, "BTB1", BTB_ENTRIES, BTB_ASSOC, 1, sizeof(Addr),  REPL_TRUE_LRU);
init_cache(&bp_data->btb2, "BTB2", 6*BTB_ENTRIES, BTB_ASSOC, 1, sizeof(Addr),  REPL_TRUE_LRU); // assuming 4K BTB entries, we are following IBM's (http://class.ece.iastate.edu/tyagi/cpre581/papers/HPCA13BulkPreloadBranch.pdf) 6 times size.

Next, you will also have to update the prediction (bp_btb_gen_pred) and update (bp_btb_gen_update) functions for your second-level BTBs. I am providing a simple example of the prediction function as follows:

Addr* bp_btb_gen_pred(Bp_Data* bp_data, Op* op) {
  Addr line_addr;
  if (PERFECT_BTB){return op->oracle_info.target};
  Addr* btb_result =  (Addr*)cache_access(&bp_data->btb1, op->oracle_info.pred_addr, &line_addr, TRUE);
  if (btb_result==NULL) { // Not present in BTB1 so we should also look at BTB2
    btb_result = (Addr*)cache_access(&bp_data->btb2, op->oracle_info.pred_addr, &line_addr, TRUE);
  }
  return btb_result;
}

You can also use a similar mechanism for the update function. Please note that, in this simple example, you are assuming instantaneous access to BTB1 and BTB2 (i.e., there is no delay in accessing BTB1 and BTB2, also known as 0-cycle bubble in the literature). Once you have implemented a good enough version of 2-level BTB, we will make it more realistic to introduce delay.

takhandipu commented 1 month ago

Similarly, we can create a simple update function for the 2-level BTB (https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp_targ_mech.c#L294) as follows:

void bp_btb_gen_update(Bp_Data* bp_data, Op* op) {
  Addr  fetch_addr = op->oracle_info.pred_addr;
  Addr *btb_line, *btb_line, btb_line_addr, repl_line_addr;

  ASSERT(bp_data->proc_id, bp_data->proc_id == op->proc_id);
  if(BTB_OFF_PATH_WRITES || !op->off_path) {
    DEBUG_BTB(bp_data->proc_id, "Writing BTB  addr:0x%s  target:0x%s\n",
              hexstr64s(fetch_addr), hexstr64s(op->oracle_info.target));
    STAT_EVENT(op->proc_id, BTB_ON_PATH_WRITE + op->off_path);

    btb_line = (Addr*)cache_access(&bp_data->btb1, fetch_addr, &btb_line_addr,
                                   TRUE);

    //btb_line = (Addr*)cache_access(&bp_data->btb2, fetch_addr, &btb_line_addr,
                                   TRUE);
    if (btb_line == NULL) {

      btb_line = (Addr*)cache_access(&bp_data->btb2, fetch_addr, &btb_line_addr,
                                   TRUE);
      if (btb_line == NULL) {
        // Not present in either, so put it into btb2
        btb_line  = (Addr*)cache_insert(&bp_data->btb2, bp_data->proc_id, fetch_addr,
                                      &btb_line_addr, &repl_line_addr);
      } else {
        // already present in btb2, so let's also insert it into btb1
        btb_line  = (Addr*)cache_insert(&bp_data->btb1, bp_data->proc_id, fetch_addr,
                                      &btb_line_addr, &repl_line_addr);
      }

    }
    *btb_line = op->oracle_info.target;
    // FIXME: the exceptions to this assert are really about x86 vs Alpha
    ASSERT(bp_data->proc_id, (fetch_addr == btb_line_addr) || TRUE);
  }
}