OpenXiangShan / GEM5

BSD 3-Clause "New" or "Revised" License
54 stars 21 forks source link

ITTAGE correctness #55

Closed Chadderz121 closed 2 months ago

Chadderz121 commented 2 months ago

Hello. Thank you very much for sharing this project!

I've been using the ITTAGE implementation from this repository as a standalone class in order to run some experiments, and wanted to share some concerns I ran into.

CSRs backwards?

Comparing https://github.com/OpenXiangShan/XiangShan/blob/d9cc7216dbf133c064c961f2a5a6e7acd8598f1d/src/main/scala/xiangshan/frontend/ITTAGE.scala#L176-L181 with https://github.com/OpenXiangShan/GEM5/blob/e62ce212124b229ad1b4bf6e9bac47b80df4dbb6/src/cpu/pred/ITTAGE.cc#L671-L673 It looks to me like CSR1 corresponds with tag_fh and CSR2 corresponds with altTag_fh; but the implementation of getCSR1 and getCSR2 is the other way around. The implementation of getCSR1 computes the same thing as altTag_fh? It's a hash, so not terribly important, but it does feel like a mistake to me. I think essentially the implementations of the getCSR1 and getCSR2 functions ought to be swapped, then this would be consistent with the Scala.

Miss stats

https://github.com/OpenXiangShan/GEM5/blob/e62ce212124b229ad1b4bf6e9bac47b80df4dbb6/src/cpu/pred/ITTAGE.cc#L248-L250

Currently the ITTAGE implementation seems to always return true on lookup. The false case seems commented out. This causes statistics in src/cpu/pred/bpred_unit.cc to be inaccurate as it suggests the predictor has 100% hit rate, but a high mispredict rate. In truth the hit rate is low, but the mispredict rate is also low.

Performance

The speed of the ITTAGE implementation is quite slow. This may not be a problem for the Gem5 use case, but I was able to get a 100x speedup by computing the folded history directly, rather than maintaining a ghr bitset and computing the folded histories on demand. I'd be happy to create a pull request for this if you're interested. The change does reduce code clarity, so perhaps the current approach is preferable?

Lingrui98 commented 2 months ago

Hi, thank you for the observation.

This ITTAGE implementation is for the GEM5 original coupled frontend. It had been implemented for evaluation purpose before the ftb predictor architecture was implemented, which decouples bp from fetch and corresponds well with the RTL version of XiangShan. Now we mainly maintain the predictor implementation under ftb_pred directory. The ITTAGE.cc you mentioned is actually deprecated.

As for the speed issue, the ITTAGE under ftb_pred does use folded history instead of folding from a complete ghr bitset every time. I believe the speed issue does not exist now.

Thank you again for your information! Feel free to open new issues if you find other problems with the new implementation :D

Chadderz121 commented 2 months ago

Oh I see, thanks for the info. I hadn't realised the other implementation was being used, how annoying! :woman_facepalming:

I'll migrate my experiment over to the new one then.