dzhoshkun / cuda-learning

BSD 3-Clause "New" or "Revised" License

Compare kernel throughput to peak theoretical throughput #10

Open dzhoshkun opened 6 years ago

dzhoshkun commented 6 years ago

The CUDA performance guidelines state:

comparing the floating-point operation throughput or memory throughput - whichever makes more sense - of a particular kernel to the corresponding peak theoretical throughput of the device indicates how much room for improvement there is for the kernel.

[ ] select a number of typical kernels (for instance from typical CUDA examples)
[ ] choose a GPU to run these on
[ ] measure the corresponding throughput for each kernel
[ ] compare the throughput to the corresponding theoretical peak throughput
[ ] try to increase the throughput by changing the implementation
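As a rough sketch of the "measure" and "compare" steps above: once a kernel has been timed, its achieved throughput can be derived from the work it does and divided by the device peak. Everything below is illustrative. SAXPY is picked only as a familiar example kernel, the helper names `saxpy_throughput` and `fraction_of_peak` are made up for this sketch, and the timing and peak figures are placeholders, not measurements from any real GPU.

```python
def saxpy_throughput(n, seconds):
    """Achieved throughput of y = a*x + y over n float32 elements.

    Assumed cost model (not measured):
      FLOPs: 2 per element (one multiply, one add)
      Bytes: 12 per element (read x, read y, write y; 4 bytes each)
    Returns (GFLOP/s, GB/s).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    gflops = 2 * n / seconds / 1e9
    gbytes_per_s = 12 * n / seconds / 1e9
    return gflops, gbytes_per_s


def fraction_of_peak(achieved, peak):
    """Ratio of achieved to peak throughput (same units for both)."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


# Placeholder inputs: replace with a real kernel timing and the
# peak figures of the GPU actually used.
n = 1 << 24              # number of elements
seconds = 0.5e-3         # hypothetical kernel time
peak_gflops = 14000.0    # hypothetical FP32 peak, GFLOP/s
peak_gbps = 900.0        # hypothetical memory peak, GB/s

gflops, gbps = saxpy_throughput(n, seconds)
print(f"compute: {gflops:.1f} GFLOP/s "
      f"({100 * fraction_of_peak(gflops, peak_gflops):.1f}% of peak)")
print(f"memory:  {gbps:.1f} GB/s "
      f"({100 * fraction_of_peak(gbps, peak_gbps):.1f}% of peak)")
```

With these placeholder numbers the kernel sits far closer to the memory peak than to the compute peak, which is the situation where memory throughput is the comparison that "makes more sense".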

dzhoshkun commented 6 years ago

Also helpful, from the CUDA Best Practices Guide:

8.2.1 Theoretical Bandwidth Calculation
8.2.2 Effective Bandwidth Calculation
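
For quick reference, here is a minimal sketch of the two calculations those sections describe, as I understand them: theoretical bandwidth is the memory clock times the bus width in bytes times the data-rate multiplier, and effective bandwidth is the bytes read plus bytes written divided by the elapsed time. The function names are my own. The V100 figures (877 MHz memory clock, 4096-bit interface, double data rate) are the worked example as I recall it from the guide, and the copy time is a made-up placeholder.

```python
def theoretical_bandwidth_gbps(memory_clock_hz, bus_width_bits, data_rate=2):
    """Peak memory bandwidth in GB/s.

    data_rate=2 assumes double-data-rate memory (two transfers per clock).
    """
    return memory_clock_hz * (bus_width_bits / 8) * data_rate / 1e9


def effective_bandwidth_gbps(bytes_read, bytes_written, seconds):
    """Achieved bandwidth in GB/s from the data a kernel actually moved."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_read + bytes_written) / 1e9 / seconds


# Theoretical peak for the V100-style figures quoted in the lead-in.
peak = theoretical_bandwidth_gbps(877e6, 4096)

# Copy of a 2048 x 2048 float32 matrix: every element is read once
# and written once. The elapsed time is a placeholder, not a measurement.
matrix_bytes = 2048 * 2048 * 4
achieved = effective_bandwidth_gbps(matrix_bytes, matrix_bytes, seconds=1e-4)

print(f"theoretical: {peak:.1f} GB/s")
print(f"effective:   {achieved:.1f} GB/s ({100 * achieved / peak:.1f}% of peak)")
```

The bus width and memory clock can be read from `cudaGetDeviceProperties` (`memoryBusWidth` in bits, `memoryClockRate` in kHz) rather than hard-coded, though newer CUDA toolkits may deprecate the clock field.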