marioroy closed this issue 1 year ago.
I did some testing by adding the following lines at the top of gpu.codon. This file is located in the Codon installation path, e.g. .../install/lib/codon/stdlib/. It turns out that the fixed block size is the reason @par(gpu=True) is slower than running @gpu.kernel.
_GRID_SIZE, _BLOCK_SIZE = 0, 0

def set_grid_size(size):
    global _GRID_SIZE
    _GRID_SIZE = size

def set_block_size(size):
    global _BLOCK_SIZE
    _BLOCK_SIZE = size
In the same file, I changed two lines inside the outline template function (near the end of the file).
def _gpu_loop_outline_template(start, stop, args, instance: Static[int]):
    ...
    MAX_BLOCK = _BLOCK_SIZE if _BLOCK_SIZE else 1024
    MAX_GRID = _GRID_SIZE if _GRID_SIZE else 2147483647
    ...
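To see how the two changes interact, here is a plain-Python model of the patched logic (a sketch, not Codon code). It assumes the setters and the two edited lines behave like ordinary module globals; `resolve_limits` is a hypothetical helper standing in for the two assignments inside `_gpu_loop_outline_template`. A value of 0 means "unset", so the original 1024 / 2147483647 limits still apply when nothing is configured.

```python
# Plain-Python model of the patched gpu.codon logic (illustrative only).
# 0 means "not set by the application".
_GRID_SIZE, _BLOCK_SIZE = 0, 0


def set_grid_size(size):
    global _GRID_SIZE
    _GRID_SIZE = size


def set_block_size(size):
    global _BLOCK_SIZE
    _BLOCK_SIZE = size


def resolve_limits():
    """Hypothetical helper: return (MAX_BLOCK, MAX_GRID) the way the
    two edited lines in _gpu_loop_outline_template choose them."""
    max_block = _BLOCK_SIZE if _BLOCK_SIZE else 1024
    max_grid = _GRID_SIZE if _GRID_SIZE else 2147483647
    return max_block, max_grid
```

For example, after `set_block_size(256)` and `set_grid_size(4096)`, `resolve_limits()` returns `(256, 4096)`; setting both back to 0 restores the defaults.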
Finally, I added two lines to my application. Be sure to import gpu.
gpu.set_grid_size(gsize)
gpu.set_block_size(bsize)
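How the application computes gsize and bsize is not shown here. One common CUDA-style convention, offered only as an assumption, is to pick a block size and then use ceiling division so that grid × block covers every work item. `blocks_for` is a hypothetical helper, and whether the outlined @par loop sizes its grid this way is not stated in this thread.

```python
# Hypothetical helper: number of blocks of `block_size` threads needed so
# that all `n` work items get a thread (ceiling division, no grid cap).
def blocks_for(n: int, block_size: int) -> int:
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (n + block_size - 1) // block_size
```

For instance, `blocks_for(10**6, 256)` gives 3907, because 3906 blocks of 256 threads would leave 64 items uncovered.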
Before and after results:
$ pgpusieve 1e9
before 0.191s
after 0.054s
$ pgpusieve 1e10
before 2.644s
after 0.668s
$ pgpusieve 1e11
before 27.962s
after 11.047s
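To put the before/after numbers in proportion, the short script below divides each "before" time by the matching "after" time. The timings are copied from the results above; the `speedups` name is illustrative. The ratios come out to roughly 3.5x, 4.0x and 2.5x for 1e9, 1e10 and 1e11 respectively.

```python
# Timings (seconds) copied from the pgpusieve before/after results.
RESULTS = {
    "1e9": (0.191, 0.054),
    "1e10": (2.644, 0.668),
    "1e11": (27.962, 11.047),
}


def speedups(results):
    """Map each input size to before/after, i.e. how many times faster 'after' ran."""
    return {n: before / after for n, (before, after) in results.items()}


for n, s in speedups(RESULTS).items():
    print(f"pgpusieve {n}: {s:.2f}x faster")
```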
This is not a bug. I understand why the @par(gpu=True) syntax may run slower.
I compared @par(gpu=True) vs @gpu.kernel. The demonstrations live in the examples folder: https://github.com/marioroy/mce-sandbox
The pgpusieve.codon example configures the same step size as gpusieve.codon. The difference is that, with @par(gpu=True), there is no way to tell Codon the desired grid and block options.