PaddleJitLab / CUDATutorial

A self-learning tutorial for CUDA High Performance Programming.
Apache License 2.0
86 stars, 16 forks

[Doc] Add Reduce Optimize Method: remove idle threads #15

Closed: AndSonder closed this 5 months ago

AndSonder commented 6 months ago
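| Optimization | Runtime (us) | Bandwidth (GB/s) | Speedup |
| --- | --- | --- | --- |
| Baseline | 3118.4 | 42.503 | ~ |
| Interleaved addressing | 1904.4 | 73.522 | 1.64 |
| Resolve bank conflicts | 1475.2 | 97.536 | 2.29 |
| Remove idle threads | 758.38 | 189.78 | 4.11 |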
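
For readers who have not seen the last optimization in the table, here is a minimal sketch of one common way to remove idle threads from a shared-memory reduction. It is not the code from this PR. In a plain tree reduction, half of the threads in each block do nothing during the first iteration, so this variant launches half as many blocks and has every thread add two elements while loading from global memory. The names `reduce_first_add`, `reduce_sum`, and `kBlockSize` are hypothetical, the block size of 256 is an assumption, and CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size for this sketch; it must be a power of two.
constexpr unsigned int kBlockSize = 256;

// Each block reduces 2 * blockDim.x input elements into one partial sum.
__global__ void reduce_first_add(const float* in, float* out, unsigned int n) {
    extern __shared__ float sdata[];

    unsigned int tid = threadIdx.x;
    // Each block covers two block-sized tiles of the input.
    unsigned int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x;

    // First add during load: every thread does useful work immediately.
    float sum = 0.0f;
    if (i < n) sum = in[i];
    if (i + blockDim.x < n) sum += in[i + blockDim.x];
    sdata[tid] = sum;
    __syncthreads();

    // Sequential-addressing tree reduction in shared memory.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum.
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: runs one kernel pass on the GPU, then adds the
// per-block partial sums on the CPU.
float reduce_sum(const std::vector<float>& host_in) {
    const unsigned int n = static_cast<unsigned int>(host_in.size());
    if (n == 0) return 0.0f;

    const unsigned int elems_per_block = 2 * kBlockSize;
    const unsigned int blocks = (n + elems_per_block - 1) / elems_per_block;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduce_first_add<<<blocks, kBlockSize, kBlockSize * sizeof(float)>>>(d_in, d_out, n);

    // This blocking copy on the default stream also waits for the kernel.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `reduce_sum(std::vector<float>(1000, 1.0f))` should return the sum of a thousand ones. Summing the partial results on the host keeps the sketch short; a full implementation would more likely run a second kernel pass over the partial sums.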