Speedup with shared memory

carpentries-incubator / lesson-gpu-programming

GPU Programming with Python and CUDA.

https://carpentries-incubator.github.io/lesson-gpu-programming/

Speedup with shared memory #63

Closed isazi closed 1 year ago

isazi commented 2 years ago

Show how using shared memory helps speed up the histogram in the lesson material.
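As a rough illustration of why shared memory is expected to help, here is a CPU-only sketch that emulates the idea in plain Python. It is not the lesson's kernel: the function names, the 256 bins, and the 4096 elements per block are assumptions made for this example. Each "block" fills a private copy of the histogram (standing in for shared memory) and merges it into the global histogram once, so the number of updates to the global histogram drops from one per input element to one per bin per block.

```python
import random


def histogram_global(data, bins):
    """Every element updates the global histogram directly."""
    hist = [0] * bins
    global_updates = 0
    for value in data:
        hist[value] += 1        # on a GPU: one atomic add to global memory
        global_updates += 1
    return hist, global_updates


def histogram_privatized(data, bins, block_size):
    """Each block fills a private copy first, then merges it once."""
    hist = [0] * bins
    global_updates = 0
    for start in range(0, len(data), block_size):
        local = [0] * bins      # stands in for the block's shared memory
        for value in data[start:start + block_size]:
            local[value] += 1   # on a GPU: atomic add to shared memory
        for b in range(bins):
            hist[b] += local[b]  # on a GPU: one global atomic add per bin
            global_updates += 1
    return hist, global_updates


random.seed(0)
bins, block_size = 256, 4096    # assumed values, chosen only for this sketch
data = [random.randrange(bins) for _ in range(2**16)]

h1, n1 = histogram_global(data, bins)
h2, n2 = histogram_privatized(data, bins, block_size)
assert h1 == h2                 # both approaches must give the same histogram
print(n1, n2)                   # prints "65536 4096"
```

The sketch only counts updates. Whether fewer global updates translate into a shorter runtime on a given GPU still has to be measured.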

isazi commented 2 years ago

It looks like, on the GPU we tested the code on, the histogram is already fast enough without using shared memory.

isazi commented 2 years ago

It would be better to refactor the code to run all the versions one after the other on the same data.
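One possible shape for such a refactoring, sketched here on the CPU with two hypothetical stand-in versions (`histogram_loop` and `histogram_count` are not from the lesson): generate the input once, run every version on it in turn, and check that all versions produce the same histogram before comparing their timings.

```python
import random
import time


def histogram_loop(data, bins):
    """Hypothetical version 1: explicit loop over the input."""
    hist = [0] * bins
    for value in data:
        hist[value] += 1
    return hist


def histogram_count(data, bins):
    """Hypothetical version 2: one pass over the input per bin."""
    return [data.count(b) for b in range(bins)]


def run_all(versions, data, bins):
    """Run each version on the same data and verify the results agree."""
    reference = None
    timings = {}
    for name, func in versions.items():
        start = time.perf_counter()
        result = func(data, bins)
        timings[name] = time.perf_counter() - start
        if reference is None:
            reference = result          # first version serves as reference
        elif result != reference:
            raise ValueError(f"{name} disagrees with the first version")
    return timings


random.seed(42)
bins = 16
data = [random.randrange(bins) for _ in range(2**14)]  # generated once

versions = {"loop": histogram_loop, "count": histogram_count}
timings = run_all(versions, data, bins)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

In the lesson the entries of `versions` would be the GPU variants with and without shared memory. The dictionary keeps the comparison in one place, so adding another variant is a one-line change.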

HannoSpreeuw commented 2 years ago

How do you define fast enough? Can you still achieve speedups from using shared memory with large enough histograms?

isazi commented 2 years ago

> How do you define fast enough?

On Google Colab, the version without shared memory seems faster than the version with it. I need to investigate whether this is due to measurement precision (the runtime is very short anyway) or to something else.
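A simple way to check the measurement-precision hypothesis is to repeat the measurement many times and compare the spread of the samples with the resolution of the timer. The sketch below does this on the CPU with a placeholder `workload` function, which is an assumption of this example and not the histogram kernel. On a GPU, kernel launches are asynchronous, so the device would also have to be synchronized before the timer is stopped.

```python
import statistics
import time


def measure(func, repeats=20):
    """Time func() several times and return all samples in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce the samples to a few robust statistics."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
    }


def workload():
    """Placeholder for the code under test (hypothetical)."""
    sum(range(1000))


resolution = time.get_clock_info("perf_counter").resolution
stats = summarize(measure(workload))
print(f"timer resolution: {resolution:.1e} s")
print(f"median: {stats['median']:.1e} s, stdev: {stats['stdev']:.1e} s")
```

If the difference between two versions is smaller than the standard deviation, or close to the timer resolution, the measurement cannot tell them apart.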

> Can you still achieve speedups from using shared memory with large enough histograms?

I am using 2^20 input arrays, large enough for the CPU to take over a minute.

HannoSpreeuw commented 2 years ago

Wow. I did not expect that. What is the GPU runtime?

isazi commented 2 years ago

A few microseconds.

HannoSpreeuw commented 2 years ago

Okay. Yeah, a longer runtime would be somewhat better, but that would require an even larger input.