Closed Chadderz121 closed 2 months ago
Hi, thank you for the observation.
This ITTAGE implementation is for the GEM5 original coupled frontend. It had been implemented for evaluation purpose before the ftb predictor architecture was implemented, which decouples bp from fetch and corresponds well with the RTL version of XiangShan. Now we mainly maintain the predictor implementation under ftb_pred directory. The ITTAGE.cc you mentioned is actually deprecated.
As for the speed issue, the ITTAGE under ftb_pred does use folded history instead of folding from a complete ghr bitset every time. I believe the speed issue does not exist now.
Thank you again for your information! Feel free to open new issues if you find other problems with the new implementation :D
Oh I see, thanks for the info. I hadn't realised the other implementation was being used, how annoying! :woman_facepalming:
I'll migrate my experiment over to the new one then.
Hello. Thank you very much for sharing this project!
I've been using the ITTAGE implementation from this repository as a standalone class in order to run some experiments, and wanted to share some concerns I ran into.
CSRs backwards?
Comparing https://github.com/OpenXiangShan/XiangShan/blob/d9cc7216dbf133c064c961f2a5a6e7acd8598f1d/src/main/scala/xiangshan/frontend/ITTAGE.scala#L176-L181 with https://github.com/OpenXiangShan/GEM5/blob/e62ce212124b229ad1b4bf6e9bac47b80df4dbb6/src/cpu/pred/ITTAGE.cc#L671-L673 It looks to me like CSR1 corresponds with
tag_fh
and CSR2 corresponds withaltTag_fh
; but the implementation ofgetCSR1
andgetCSR2
is the other way around. The implementation ofgetCSR1
computes the same thing asaltTag_fh
? It's a hash, so not terribly important, but it does feel like a mistake to me. I think essentially the implementations of thegetCSR1
andgetCSR2
functions ought to be swapped, then this would be consistent with the Scala.Miss stats
https://github.com/OpenXiangShan/GEM5/blob/e62ce212124b229ad1b4bf6e9bac47b80df4dbb6/src/cpu/pred/ITTAGE.cc#L248-L250
Currently the ITTAGE implementation seems to always return
true
on lookup. Thefalse
case seems commented out. This causes statistics insrc/cpu/pred/bpred_unit.cc
to be inaccurate as it suggests the predictor has 100% hit rate, but a high mispredict rate. In truth the hit rate is low, but the mispredict rate is also low.Performance
The speed of the ITTAGE implementation is quite slow. This may not be a problem for the Gem5 use case, but I was able to get a 100x speedup by computing the folded history directly, rather than maintaining a
ghr
bitset and computing the folded histories on demand. I'd be happy to create a pull request for this if you're interested. The change does reduce code clarity, so perhaps the current approach is preferable?