dendibakh / perf-ninja

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

Question about the data packing lab #100

Open · jsjtxietian opened this issue 1 month ago

jsjtxietian commented 1 month ago

Hi, thanks for the great lab.

I know that the data packing lab is marked as broken, and I also can't reproduce the roughly 20% speedup mentioned in the video. However, I do get a 3-8% speedup when using Clang 17 on Windows. Maybe we can investigate the current state of this lab further?

dendibakh commented 1 month ago

Hi @jsjtxietian, sure, if you're interested, feel free to investigate. I'm currently very busy, so I won't be able to look into this in the next 1-2 months.

jsjtxietian commented 1 month ago

The following data was collected with N = 50000 and 10000 iterations on Windows 11, using VTune and Clang 17.0.6. (Note: I cannot get a reliable optimization effect with the original value of N.)

Hotspot analysis shows that the time savings come mainly from std::shuffle:

[Screenshot: VTune hotspot analysis]

Microarchitecture Exploration shows a slight decrease in Back-End Bound:

[Screenshot: VTune Microarchitecture Exploration summary]

Here is what I observed when comparing hardware events: