CTSRD-CHERI / SIMTight

Synthesisable SIMT-style RISC-V GPGPU

Simple Blocked 2D Stencil #5

Closed: paulmetzger closed this 3 years ago

paulmetzger commented 3 years ago

Hi @mn416, this is the optimised implementation of the stencil code that uses one thread per output element. I called it "blocked" because each group of 64 warps computes one block of the output buffer. It also accesses global memory in an aligned way. I am happy to rename it or make other changes.
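For readers unfamiliar with the pattern, here is a minimal sequential sketch of how a "blocked" stencil maps threads to outputs. It is not the SIMTight sample or its API. It assumes a 5-point averaging stencil with clamped borders, and the block sizes `BLOCK_W`/`BLOCK_H` and the functions `stencil_point`/`blocked_stencil` are hypothetical names chosen for illustration. The outer loops stand in for thread groups (one per output block), and the inner loops stand in for threads, each writing exactly one output element.

```python
# Hypothetical sketch of a blocked 2D stencil (not the SIMTight code).
# Assumptions: 5-point averaging stencil, clamped borders, and
# illustrative block sizes BLOCK_W x BLOCK_H.

BLOCK_W = 8  # hypothetical block width (threads per block row)
BLOCK_H = 4  # hypothetical block height (rows per block)


def stencil_point(inp, x, y):
    """5-point average at (x, y), clamping neighbour coordinates at the borders."""
    h, w = len(inp), len(inp[0])

    def at(i, j):
        i = min(max(i, 0), h - 1)
        j = min(max(j, 0), w - 1)
        return inp[i][j]

    return (at(y, x) + at(y - 1, x) + at(y + 1, x)
            + at(y, x - 1) + at(y, x + 1)) / 5.0


def blocked_stencil(inp):
    """Compute the stencil block by block, one 'thread' per output element."""
    h, w = len(inp), len(inp[0])
    out = [[0.0] * w for _ in range(h)]
    # Outer loops: one iteration per output block (one thread group on a GPU).
    for by in range(0, h, BLOCK_H):
        for bx in range(0, w, BLOCK_W):
            # Inner loops: one iteration per thread in the block.
            for ty in range(BLOCK_H):
                for tx in range(BLOCK_W):
                    x, y = bx + tx, by + ty
                    if x < w and y < h:  # guard partial blocks at the edges
                        out[y][x] = stencil_point(inp, x, y)
    return out
```

On the alignment point: every block starts at a column `bx` that is a multiple of `BLOCK_W`. So if the row width is also a multiple of `BLOCK_W`, each block row begins at flat offset `(by + ty) * w + bx`, which is a multiple of `BLOCK_W`. Consecutive threads in that row then touch one contiguous, aligned span of memory.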

mn416 commented 3 years ago

Looks good @paulmetzger. Could you update test/test.sh to run this new sample as part of the test suite?

paulmetzger commented 3 years ago

@mn416 Great, I have implemented all the suggestions.