andikleen / pmu-tools

Intel PMU profiling tools
GNU General Public License v2.0

Question on Slots_Estimated Domain #452

Open Jerry-Tianchen opened 1 year ago

Jerry-Tianchen commented 1 year ago

Hi, I am reviewing the calculation behind DSB and MITE, and I noticed that in TMA 4.2 these two metrics were moved into the Slots_Estimated domain. How should we interpret this "Slots_Estimated" domain? I really can't understand the reason for dividing by 2 here (MITE, for example):

(EV("IDQ.MITE_CYCLES_ANY", 3) - EV("IDQ.MITE_CYCLES_OK", 3)) / CORE_CLKS(self, EV, 3) / 2

I am reviewing the calculation because I am confused about why Fetch_Bandwidth.DSB + Fetch_Bandwidth.MITE doesn't add up to Fetch_Bandwidth.

I am running on sapphire_rapids. Here is a screenshot of my analysis, in case it helps: [image]

Thank you so much~

andikleen commented 1 year ago

On Sat, Apr 15, 2023 at 06:31:19PM -0700, Jerry Tianchen wrote:

> Hi, I am reviewing the calculation behind DSB and MITE, and I noticed that in TMA 4.2 these two metrics were moved into the Slots_Estimated domain. How should we interpret this "Slots_Estimated" domain? I really can't understand the reason for dividing by 2 here (MITE, for example):
>
> (EV("IDQ.MITE_CYCLES_ANY", 3) - EV("IDQ.MITE_CYCLES_OK", 3)) / CORE_CLKS(self, EV, 3) / 2
>
> I am reviewing the calculation because I am confused about why Fetch_Bandwidth.DSB + Fetch_Bandwidth.MITE doesn't add up to Fetch_Bandwidth.

I think it's just a crude heuristic that assumes half of the non-perfect MITE cycles are due to bandwidth. That's why it's "estimated". You cannot necessarily assume it adds up with other metrics, and your mileage will depend on your workload.
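To make the arithmetic of the quoted formula concrete, here is a minimal sketch. It assumes plain integers in place of the `EV(...)` and `CORE_CLKS(...)` calls, the function name `mite_fraction` is hypothetical, and the counter values are made up. Reading the `/ 2` as "half of the non-OK cycles" is only the interpretation given above, not a confirmed rationale.

```python
def mite_fraction(mite_cycles_any, mite_cycles_ok, core_clks):
    """Mirror the quoted formula: (ANY - OK) / CORE_CLKS / 2.

    All three arguments are raw counts standing in for
    IDQ.MITE_CYCLES_ANY, IDQ.MITE_CYCLES_OK and the core clock count.
    """
    if core_clks <= 0:
        raise ValueError("core_clks must be positive")
    # Cycles in which MITE delivered some uops, but fewer than the "OK" threshold.
    non_ok_cycles = mite_cycles_any - mite_cycles_ok
    # Normalize by core clocks, then attribute half of that share to MITE bandwidth.
    return non_ok_cycles / core_clks / 2


# Made-up counts, for illustration only.
print(mite_fraction(400_000, 100_000, 1_000_000))
```

With these made-up counts, 30% of the core clocks are non-OK MITE cycles, and the formula reports half of that share. Because the result is an estimate, nothing in this arithmetic forces the DSB and MITE values to sum to their parent metric.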

aayasin commented 1 year ago

There is a good statistical reason behind the /2 here. I'd like to investigate this.

aayasin commented 1 year ago

@Jerry-Tianchen I suspect something is wrong in the measurement, based on the screenshot you provided. Please try running with the latest pmu-tools (it has TMA version 4.6) to see if the issue persists. I'd also appreciate a reproducer if possible, e.g. a microbenchmark. Otherwise, please share some details on the workload itself.