PaddleJitLab / CUDATutorial

A self-learning tutorial for CUDA High Performance Programming.
Apache License 2.0

[Doc] Add reduce optimize method: remove bank conflict #8

Closed: AndSonder closed this 5 months ago

AndSonder commented 6 months ago

Adds a reduce optimization method: removing bank conflicts.

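| Optimization | Runtime (µs) | Bandwidth (GB/s) | Speedup (vs. baseline) |
|---|---|---|---|
| Baseline | 3118.4 | 42.503 | 1.00 |
| Interleaved addressing | 1904.4 | 73.522 | 1.64 |
| Resolving bank conflicts | 1475.2 | 97.536 | 2.29 |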
AndSonder commented 6 months ago

@Aurelius84 I've updated two more articles. Could you help review them when you have time? Thanks~