Closed: isazi closed this 1 year ago
It looks like, on the GPU where we tested the code, the histogram is already fast enough without using shared memory.
It would be better to refactor the code to run all the versions one after the other on the same data.
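A rough sketch of what such a refactor could look like, assuming each version can be wrapped as a function taking the input and the number of bins. The two histogram functions here are CPU placeholders, and every name (`histogram_loop`, `histogram_counter`, `run_all_versions`, `VERSIONS`) is hypothetical, not taken from the lesson code. The point is only that all versions see the same data and their outputs are compared.

```python
from collections import Counter


def histogram_loop(data, num_bins):
    """Placeholder for one version (e.g. the kernel without shared memory)."""
    counts = [0] * num_bins
    for value in data:
        counts[value] += 1
    return counts


def histogram_counter(data, num_bins):
    """Placeholder for a second version (e.g. the shared-memory kernel)."""
    tally = Counter(data)
    return [tally.get(b, 0) for b in range(num_bins)]


def run_all_versions(versions, data, num_bins):
    """Run every version on the same data and check that they all agree."""
    results = {name: fn(data, num_bins) for name, fn in versions.items()}
    reference_name = next(iter(results))
    for name, counts in results.items():
        if counts != results[reference_name]:
            raise ValueError(f"{name} disagrees with {reference_name}")
    return results


# Hypothetical registry; the real kernels would be wrapped and listed here.
VERSIONS = {"loop": histogram_loop, "counter": histogram_counter}
```

Calling `run_all_versions(VERSIONS, data, num_bins)` returns one result per version, so any timing comparison is made on identical input.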
How do you define fast enough? Can you still achieve speedups from using shared memory with large enough histograms?
> How do you define fast enough?
Not using shared memory seems faster than using shared memory on Google Colab. I need to investigate if this is because of measurement precision (runtime is very fast anyway), or because of something else.
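One way to check the measurement-precision hypothesis is to repeat the measurement many times and look at the spread. This is a minimal sketch using only the Python standard library; `time_repeated` and `summarize` are hypothetical helper names. It assumes `fn` only returns once the work is done, so for a GPU kernel the wrapped function would have to wait for the device to finish before returning, since kernel launches are asynchronous.

```python
import statistics
import time


def time_repeated(fn, *args, repeats=20, inner_loops=10):
    """Return one per-call time in seconds for each repeat.

    Each repeat calls fn several times and divides by inner_loops, so very
    short runtimes are less affected by the resolution of the timer.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(inner_loops):
            fn(*args)
        samples.append((time.perf_counter() - start) / inner_loops)
    return samples


def summarize(samples):
    """Summarize the samples; needs at least two samples for the stdev."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
    }
```

If the difference between two versions is smaller than the spread reported by `summarize`, the comparison is probably not telling us much.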
> Can you still achieve speedups from using shared memory with large enough histograms?

I am using input arrays of 2^20 elements, large enough for the CPU to take over a minute.
Wow. I did not expect that. What is the GPU runtime?
A few microseconds.
Okay. Yeah, a longer runtime would be somewhat better, but that would require an even larger input.
Show how using shared memory helps speed up the histogram in the lesson material.
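For reference, the pattern the lesson would demonstrate can be illustrated on the CPU. This is not GPU code, only a sketch of the structure: each block of input builds a private histogram (standing in for the block's shared-memory copy) and merges it into the global histogram once. On a GPU the expected gain comes from doing most of the atomic updates in shared memory instead of global memory. The name `histogram_privatized` and the `block_size` default are assumptions made for illustration.

```python
def histogram_privatized(data, num_bins, block_size=256):
    """CPU illustration of a per-block private histogram followed by a merge."""
    global_counts = [0] * num_bins
    for start in range(0, len(data), block_size):
        # Stands in for the histogram copy held in a block's shared memory.
        local_counts = [0] * num_bins
        for value in data[start:start + block_size]:
            local_counts[value] += 1
        # Merge step: one update per bin per block into the global histogram.
        for b in range(num_bins):
            global_counts[b] += local_counts[b]
    return global_counts
```

The result matches a direct count; only the number of updates to the global histogram changes, from one per input element to one per bin per block.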