Open takhandipu opened 1 month ago
There are multiple ways to implement 2-level BTB inside scarab. Here, I am describing one such way. Please feel free to implement the way you see fit.
This line (https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp.h#L143) declares a BTB. Instead of having a single BTB cache, you can define 2 BTBs as (Cache btb1; Cache btb2;
). Next, when you initiate your BTB in line (https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp_targ_mech.c#L271), you initiate both btb1
and btb2
as follows:
init_cache(&bp_data->btb1, "BTB1", BTB_ENTRIES, BTB_ASSOC, 1, sizeof(Addr), REPL_TRUE_LRU);
init_cache(&bp_data->btb2, "BTB2", 6*BTB_ENTRIES, BTB_ASSOC, 1, sizeof(Addr), REPL_TRUE_LRU); // assuming 4K BTB entries, we are following IBM's (http://class.ece.iastate.edu/tyagi/cpre581/papers/HPCA13BulkPreloadBranch.pdf) 6 times size.
Next, you will also have to update the prediction (bp_btb_gen_pred
) and update (bp_btb_gen_update
) functions for your second-level BTBs. I am providing a simple example of the prediction function as follows:
Addr* bp_btb_gen_pred(Bp_Data* bp_data, Op* op) {
Addr line_addr;
if (PERFECT_BTB){return op->oracle_info.target};
Addr* btb_result = (Addr*)cache_access(&bp_data->btb1, op->oracle_info.pred_addr, &line_addr, TRUE);
if (btb_result==NULL) { // Not present in BTB1 so we should also look at BTB2
btb_result = (Addr*)cache_access(&bp_data->btb2, op->oracle_info.pred_addr, &line_addr, TRUE);
}
return btb_result;
}
You can also use a similar mechanism for the update function. Please note that, in this simple example, you are assuming instantaneous access to BTB1 and BTB2 (i.e., there is no delay in accessing BTB1 and BTB2, also known as 0-cycle bubble in the literature). Once you have implemented a good enough version of 2-level BTB, we will make it more realistic to introduce delay.
Similarly, we can create a simple update function for the 2-level BTB (https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp_targ_mech.c#L294) as follows:
void bp_btb_gen_update(Bp_Data* bp_data, Op* op) {
Addr fetch_addr = op->oracle_info.pred_addr;
Addr *btb_line, *btb_line, btb_line_addr, repl_line_addr;
ASSERT(bp_data->proc_id, bp_data->proc_id == op->proc_id);
if(BTB_OFF_PATH_WRITES || !op->off_path) {
DEBUG_BTB(bp_data->proc_id, "Writing BTB addr:0x%s target:0x%s\n",
hexstr64s(fetch_addr), hexstr64s(op->oracle_info.target));
STAT_EVENT(op->proc_id, BTB_ON_PATH_WRITE + op->off_path);
btb_line = (Addr*)cache_access(&bp_data->btb1, fetch_addr, &btb_line_addr,
TRUE);
//btb_line = (Addr*)cache_access(&bp_data->btb2, fetch_addr, &btb_line_addr,
TRUE);
if (btb_line == NULL) {
btb_line = (Addr*)cache_access(&bp_data->btb2, fetch_addr, &btb_line_addr,
TRUE);
if (btb_line == NULL) {
// Not present in either, so put it into btb2
btb_line = (Addr*)cache_insert(&bp_data->btb2, bp_data->proc_id, fetch_addr,
&btb_line_addr, &repl_line_addr);
} else {
// already present in btb2, so let's also insert it into btb1
btb_line = (Addr*)cache_insert(&bp_data->btb1, bp_data->proc_id, fetch_addr,
&btb_line_addr, &repl_line_addr);
}
}
*btb_line = op->oracle_info.target;
// FIXME: the exceptions to this assert are really about x86 vs Alpha
ASSERT(bp_data->proc_id, (fetch_addr == btb_line_addr) || TRUE);
}
}
The goal of this feature request would be to implement a multi-level branch target buffer. The first step of the solution would be to implement a basic multi-level BTB described in https://ieeexplore.ieee.org/abstract/document/214687.
A good place to start the implementation would be here: https://github.com/Litz-Lab/scarab/blob/174cd06df64486e564ff75ede37029247ba48a37/src/bp/bp.h#L138. Please feel free to reach out to me, @kofyou, or @hlitz if you are interested in implementing this feature.