PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA High Performance Programming.
Apache License 2.0
[Doc] Add Reduce Optimize Method: Unroll Strategy
#16
Closed
AndSonder closed this 10 months ago
AndSonder commented 10 months ago
Add an Unroll strategy to the Reduce kernel
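| Optimization | Runtime (us) | Bandwidth (GB/s) | Speedup |
| --- | --- | --- | --- |
| Baseline | 3118.4 | 42.503 | ~ |
| Interleaved addressing | 1904.4 | 73.522 | 1.64 |
| Resolve bank conflicts | 1475.2 | 97.536 | 2.29 |
| Remove idle threads | 758.38 | 189.78 | 4.11 |
| Unroll the last warp | 484.01 | 287.25 | 6.44 |
| Complete unrolling | 477.23 | 291.77 | 6.53 |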
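
The last two optimizations listed above (unrolling the last warp and complete unrolling) can be illustrated with a minimal CUDA sketch. This is not the code from this PR: the names `warpReduceSum`, `reduceUnrolled`, and `reduceOnGpu`, the block size of 256, and the float element type are all assumptions made for illustration. The sketch also handles the last warp with `__shfl_down_sync` register shuffles rather than the classic `volatile` shared-memory variant, because the shuffle form stays correct on GPUs with independent thread scheduling.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Sum 32 values held by the threads of one warp using register shuffles.
// The loop has a fixed trip count, so the compiler can unroll it fully.
__device__ __forceinline__ float warpReduceSum(float v) {
    #pragma unroll
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;  // lane 0 holds the warp total
}

// BLOCK is a compile-time constant, so every `if (BLOCK >= ...)` below is
// resolved at compile time and the reduction loop disappears entirely.
template <unsigned int BLOCK>
__global__ void reduceUnrolled(const float* in, float* out, unsigned int n) {
    static_assert(BLOCK >= 32 && BLOCK <= 1024 && (BLOCK & (BLOCK - 1)) == 0,
                  "BLOCK must be a power of two in [32, 1024]");
    __shared__ float sdata[BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (BLOCK * 2) + tid;

    // Each thread adds two elements while loading (no idle threads).
    float v = 0.0f;
    if (i < n) v = in[i];
    if (i + BLOCK < n) v += in[i + BLOCK];
    sdata[tid] = v;
    __syncthreads();

    // Completely unrolled tree reduction in shared memory.
    if (BLOCK >= 1024) { if (tid < 512) sdata[tid] += sdata[tid + 512]; __syncthreads(); }
    if (BLOCK >= 512)  { if (tid < 256) sdata[tid] += sdata[tid + 256]; __syncthreads(); }
    if (BLOCK >= 256)  { if (tid < 128) sdata[tid] += sdata[tid + 128]; __syncthreads(); }
    if (BLOCK >= 128)  { if (tid < 64)  sdata[tid] += sdata[tid + 64];  __syncthreads(); }

    // Last warp: no block-wide barrier is needed from here on.
    if (tid < 32) {
        float w = sdata[tid];
        if (BLOCK >= 64) w += sdata[tid + 32];
        w = warpReduceSum(w);
        if (tid == 0) out[blockIdx.x] = w;  // one partial sum per block
    }
}

// Hypothetical host helper: launches the kernel and adds the per-block
// partial sums on the CPU. CUDA error checking is omitted for brevity.
float reduceOnGpu(const std::vector<float>& h) {
    constexpr unsigned int BLOCK = 256;  // assumed block size
    unsigned int n = static_cast<unsigned int>(h.size());
    if (n == 0) return 0.0f;
    unsigned int blocks = (n + BLOCK * 2 - 1) / (BLOCK * 2);

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduceUnrolled<BLOCK><<<blocks, BLOCK>>>(dIn, dOut, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float sum = 0.0f;
    for (float p : partial) sum += p;
    return sum;
}
```

Assuming a working CUDA device, calling `reduceOnGpu(std::vector<float>(1 << 20, 1.0f))` from a host `main` should return 1048576, since every partial sum is exactly representable in single precision. Making the block size a template parameter is what allows the complete unrolling: the compiler drops the branches that do not apply to the chosen `BLOCK`.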