Open jsjtxietian opened 1 month ago
Hi @jsjtxietian, sure, if you're interested, feel free to investigate. I'm currently very busy, so I won't be able to look into this in the next 1-2 months.
The following data was collected with N = 50000 and 10000 iterations, on Windows 11 using VTune with clang 17.0.6. (Note: I could not get a reliable optimization effect when using the original N configuration.)
Hotspot analysis shows that the time saving mainly comes from std::shuffle:
Microarchitecture Exploration shows a slight decrease in Back-End Bound:
Some things I observed when comparing hardware events:
Hi, thanks for the great lab.
I know that the data packing lab is marked as broken, and I can't get the roughly 20% speedup mentioned in the video either. However, I do get about a 3-8% speedup when using clang 17 on Windows. Maybe we can investigate the current state of this lab further?
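For readers unfamiliar with the technique under discussion, here is a minimal sketch of data packing by field reordering. The `Loose` and `Reordered` structs and the `bytes_saved_per_element` helper are hypothetical illustrations, not the lab's actual code. The idea is that ordering members from largest to smallest alignment removes padding, so each element is smaller and operations that move whole elements (such as std::shuffle or a sort) touch less memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record, NOT the struct used in the lab.
// Members are declared in an order that forces padding between them.
struct Loose {
    bool flag;
    double value;
    short small;
    long long big;
    int count;
};

// Same members, ordered from largest to smallest alignment requirement,
// so the compiler needs little or no padding between them.
struct Reordered {
    double value;
    long long big;
    int count;
    short small;
    bool flag;
};

// Bytes saved per element by reordering (assumption: a mainstream ABI
// where the reordered layout is never larger than the loose one).
std::size_t bytes_saved_per_element() {
    assert(sizeof(Reordered) <= sizeof(Loose));
    return sizeof(Loose) - sizeof(Reordered);
}
```

On a typical x86-64 ABI the loose layout is usually 40 bytes and the reordered one 24, but the exact sizes are implementation-defined, so it is worth printing `sizeof` for the compiler being measured. Whether a smaller element translates into a measurable speedup depends on the working-set size relative to the caches, which may be why the effect varies with N.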