PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA High Performance Programming.
Apache License 2.0
[Doc] Add Reduce Optimize Method: remove idle threads
#15
Closed (AndSonder closed 5 months ago)
AndSonder commented 6 months ago:
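| Optimization | Runtime (us) | Bandwidth (GB/s) | Speedup |
| --- | --- | --- | --- |
| Baseline | 3118.4 | 42.503 | ~ |
| Interleaved addressing | 1904.4 | 73.522 | 1.64 |
| Resolve bank conflicts | 1475.2 | 97.536 | 2.29 |
| Remove idle threads | 758.38 | 189.78 | 4.11 |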
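
For readers unfamiliar with the last row: "removing idle threads" in a block-level reduction is commonly done by letting each thread add two input elements while loading them into shared memory, so that half as many blocks are launched and no thread sits out the first reduction step. The following is a minimal sketch of that idea, not necessarily the code used in this tutorial. The kernel name `reduce_first_add_on_load`, the host helper `reduce_on_gpu`, and the block size of 256 are assumptions made for illustration, and CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr unsigned int kBlockSize = 256;  // assumed block size; must be a power of two

// Each block reduces up to 2 * kBlockSize inputs into one partial sum.
// Assumes the kernel is launched with blockDim.x == kBlockSize.
__global__ void reduce_first_add_on_load(const float* in, float* out, unsigned int n) {
    __shared__ float sdata[kBlockSize];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x;

    // The first addition happens during the global load, so every thread
    // performs one add before the tree reduction starts.
    float v = 0.0f;
    if (i < n) v = in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // Tree reduction in shared memory with sequential addressing.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: launches the kernel once and sums the
// per-block partial results on the CPU. No error checking.
float reduce_on_gpu(const std::vector<float>& h_in) {
    unsigned int n = static_cast<unsigned int>(h_in.size());
    if (n == 0) return 0.0f;

    // Half as many blocks as a one-element-per-thread launch would need.
    unsigned int blocks = (n + 2 * kBlockSize - 1) / (2 * kBlockSize);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduce_first_add_on_load<<<blocks, kBlockSize>>>(d_in, d_out, n);

    std::vector<float> partial(blocks);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `reduce_on_gpu` on a vector of ones returns the element count as long as the sum stays exactly representable in `float`. The runtimes in the table depend on the input size and GPU used by the author, which are not stated here, so this sketch is not expected to reproduce those numbers.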