Branch predictor implementation

jiristefan commented 6 months ago

Resolves #64 Should also address the branch predictor TODOs from #1 and #2

This PR implements:

Branch Target Table (BTB) for storing instruction address / target address pairs for jump and branch instruction, along with a UI dock widgets for visualizing the table
3 simple static branch predictors (Always Taken / Not Taken, Backward Taken Forward Not Taken)
3 Smith predictors (1-bit, 2-bit, 2-bit with hysteresis), along with a Branch History Table (BHT)
Variable BTB and BHT sizes, from 0 to 8 bits (0 to 16 bits for BHT, might decrease to 8 in the future)
UI dock widget for visualizing the BHT contents, predictions, and predictor statistics
Optional Branch History Register for addressing the BHT

Predictor info widget with BHT: bht-prediction

BTB widget: btb-prediction

Configuration dialog: qtrvsim-newdialog

ppisa commented 6 months ago

@jiristefan please focus on the thesis now and then follow @jdupak suggestions. When you push to your branch the pull request should automatically update as well.

I have tested next code on your branch 233b331d5c40183a7abd2ebc17e94af35a348d12 with sequence tuned to have no aliases and results seems to be correct

   addi a0, zero, 10
l: nop
   j a
   nop
a: nop
   bne zero, zero, b
   nop
   nop
b: beq zero, zero, c
   nop
c: addi a0, a0, -1
   bne a0, zero, l
   ebreak

with Smith 1 bit BTB bits 3, BHR 0, BHT 2, initial NT and results are strange on single-cycle and pipelined and results seems to be correct.

When I add BHR 2, I get correct 35, incorrect 5 accuracy 87%.

But when I switch to pipelined I get correct 28, incorrect 12 accuracy 70%.

Adding BHR support above the assignment you have complicated things for yourselves. It is nice to have it there, but you probably need to pass through pipeline the index of the entry used at the prediction time because BHR updates when some branches are near. Even this solution would have problems depending on the variable length stalls caused by possible different cache hit miss in the sequences. Using local history would be easier there probably a little. But it can be added in future to the selection of the Smitch etc...

I do not see there problems with global BHR as blocker, it should be documented that for precise expected behavior used in the textbooks the core has to be used in the single-cycle variant. But some analysis how to resolve problem for pipelined and even for superscalar cores should be considered to update behavior in future.

jiristefan commented 6 months ago

Thank you for the review. I will fix the highlighted issues during next week, I admit the frontend code is more messy than the internal implementation, I'll try to implement all the suggestions and generally clean it up.

As for the branch history register, I will probably use the suggested approach of storing the index with the instruction address locally in the predictor, I believe it should not take too much time to implement properly.

ppisa commented 6 months ago

@jiristefan I have solution to make BHR work correctly for pipelined version. You need two copies of BHR. The one used to form index at PC computation stage is updated from outcomes of predictions and then the second one is updated in MEM stage from actual results of the branch condition evaluations. They will be in delayed sync which will be maintained as long as there is no missprediction. At the flush event MEM stage BHR will be copied to PC stage BHR. This way the predictor should work exactly the same way/with same outcome for singlecycle and pipelined version. So when you submit the text you can try to implement this approach.

jiristefan commented 6 months ago

@ppisa That sounds like the simplest implementation: BHR is already implemented as a class, so a simple solution would be just to create another instance and update and use that one in the prediction function. Then I'll only add another member function of the predictor, which will copy over the values during flush as you said. Thank you for the idea.

jiristefan commented 6 months ago

I should mention that I moved the contents of the "Core" tab in the dialog window for configuring the simulation into a scroll area widget:

The branch predictor config added a bit of height to the tab, and there are couple more settings I would like to add to the tab regarding the predictor in the future. So, to prevent the window from getting too tall, I moved everything to the scroll area. Hopefully there will be no issue with this.

jiristefan commented 6 months ago

@jdupak I think the reason the Qt6 check is failing is probably because of the predictor_types.h file which I added with the predictor enums I use in the code. To use the enums inside QVariant in the UI objects, I had to register the enum using the provided macros Q_NAMESPACE and Q_ENUM_NS.

I got a very similar error with Qt5 when I tried to compile the code after that, before noticing I did not add the predictor_types.h to the CMakeLists.txt file, then it worked fine.

I'm not sure what changed in Qt6, or if this is a good way to handle this, so please let me know how to fix this. An alternative would be to just convert the enums to integers before storing them in the QVariant, and then back when reading them.

ppisa commented 4 months ago

The branch prediction demonstration developed by Jiri Stefan in the frame of his master's thesis is welcomed step for QtRvSim to cover more computer architectures lectures topics. The visualization and correct/wrong prediction statistic counting matches teaching needs of the classes when demonstrated on single-cycle processor. For pipelined version there are more topics to discussion, see https://github.com/cvut/qtrvsim/issues/143. It is questionable if solution matching basic textbooks principles on the pipelined version without classification of branched during cache fills can be found. Probably not, but some updates would provide better insight to the problem and code should be enhanced.

cvut / qtrvsim

Branch predictor implementation #135