PaddleJitLab / CUDATutorial

A self-learning tutorial for high-performance CUDA programming.
Apache License 2.0

[Doc] Add Reduce Optimization Method: Interleaved Addressing #7

Status: closed by AndSonder 5 months ago

AndSonder commented 6 months ago

Add the interleaved addressing optimization for Reduce.
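The PR text does not show the kernels, so the following is only a minimal sketch under stated assumptions. It assumes the baseline is the classic shared-memory tree reduction that selects active threads with `tid % (2 * s) == 0`, and that the interleaved-addressing version uses the strided-index form (`index = 2 * s * tid`) from NVIDIA's well-known parallel-reduction walkthrough. To keep it runnable without a GPU, it simulates one thread block on the host: each inner loop over `tid` stands in for the block's threads, and each outer-loop step corresponds to one `__syncthreads()`-separated step on the device. The function names `reduceBaseline`, `reduceStrided` and `divergentWarps` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side simulation of ONE thread block's shared-memory reduction.
// Assumption: blockSize (= sdata.size()) is a power of two and a multiple of 32.
// Running the "threads" of one step sequentially is safe here because, within a
// step, threads write only multiples of 2*s and read only odd multiples of s.

// Assumed baseline: interleaved addressing with a divergent branch.
// Device-side equivalent: if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
float reduceBaseline(std::vector<float> sdata) {
    const unsigned blockSize = static_cast<unsigned>(sdata.size());
    assert(blockSize > 0 && (blockSize & (blockSize - 1)) == 0);
    for (unsigned s = 1; s < blockSize; s *= 2) {       // one __syncthreads() step
        for (unsigned tid = 0; tid < blockSize; ++tid) {  // threads of the block
            if (tid % (2 * s) == 0) {
                sdata[tid] += sdata[tid + s];
            }
        }
    }
    return sdata[0];
}

// Assumed optimized version: interleaved addressing with a strided index, so the
// working threads are the contiguous range tid < blockSize / (2 * s).
// Device-side equivalent: unsigned index = 2 * s * tid;
//                         if (index < blockDim.x) sdata[index] += sdata[index + s];
float reduceStrided(std::vector<float> sdata) {
    const unsigned blockSize = static_cast<unsigned>(sdata.size());
    assert(blockSize > 0 && (blockSize & (blockSize - 1)) == 0);
    for (unsigned s = 1; s < blockSize; s *= 2) {
        for (unsigned tid = 0; tid < blockSize; ++tid) {
            const unsigned index = 2 * s * tid;
            if (index < blockSize) {
                sdata[index] += sdata[index + s];
            }
        }
    }
    return sdata[0];
}

// Counts the 32-thread warps that mix active and idle threads at step s,
// i.e. the warps that would have to execute the branch divergently.
unsigned divergentWarps(unsigned blockSize, unsigned s, bool strided) {
    const unsigned kWarpSize = 32;
    unsigned count = 0;
    for (unsigned w = 0; w < blockSize / kWarpSize; ++w) {
        unsigned active = 0;
        for (unsigned lane = 0; lane < kWarpSize; ++lane) {
            const unsigned tid = w * kWarpSize + lane;
            const bool isActive =
                strided ? (2 * s * tid < blockSize) : (tid % (2 * s) == 0);
            if (isActive) ++active;
        }
        if (active != 0 && active != kWarpSize) ++count;
    }
    return count;
}
```

Both reductions produce the same sum (for example 32640 for the inputs 0..255); only the mapping from threads to data differs. For a 256-thread block at s = 1, `divergentWarps(256, 1, false)` returns 8 because every warp is half active, while `divergentWarps(256, 1, true)` returns 0 because warps 0-3 are fully active and warps 4-7 are fully idle. Once fewer than 32 threads remain active (for example at s = 16), the strided form diverges within a single warp as well.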

| Optimization | Runtime (µs) | Bandwidth (GB/s) | Speedup |
|---|---|---|---|
| Baseline | 3118.4 | 42.503 | 1.00 |
| Interleaved addressing | 1904.4 | 73.522 | 1.64 |
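The speedup column is the baseline runtime divided by the optimized runtime. For the interleaved-addressing row:

$$
\text{speedup} = \frac{t_{\text{baseline}}}{t_{\text{interleaved}}} = \frac{3118.4\,\mu\text{s}}{1904.4\,\mu\text{s}} \approx 1.64
$$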