bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality

Jacobi #641

Closed: yodada closed this PR 2 years ago

yodada commented 2 years ago

This PR merges the Jacobi kernel code, located at software/spmd/bsg_cuda_lite_runtime/jacobi/.

Jacobi 3D takes an Nx * Ny * Nz input. This implementation unrolls along Nx and distributes Ny and Nz across tileX and tileY, respectively. As a result, the minimal valid input is 64 * 18 * 10; another valid input is 126 * 18 * 10. Note that along Nx each step reads 64 inputs and produces 62 outputs, so consecutive steps overlap by two inputs (which is why 126 = 64 + 62 is also valid).
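For illustration, here is a minimal Python sketch of that decomposition. It is not taken from the kernel. The 64-input window and 62 outputs per step come from the description above. The 16 x 8 tile group (`TILES_X`, `TILES_Y`) and the rule that each tile owns exactly one interior (y, z) point are assumptions inferred from the minimal 18 x 10 Ny x Nz. The helper names `nx_windows` and `tile_point` are hypothetical.

```python
# Hypothetical sketch of the input decomposition (not the kernel's code).
# WINDOW and OUT_PER_STEP follow the PR description; TILES_X and TILES_Y are
# ASSUMED values (a 16 x 8 tile group) inferred from the minimal 18 x 10 Ny x Nz.

WINDOW = 64                 # inputs read along Nx per step
OUT_PER_STEP = WINDOW - 2   # 62 outputs per step: one halo cell on each side
TILES_X = 16                # assumption: tiles along tileX (Ny = TILES_X + 2)
TILES_Y = 8                 # assumption: tiles along tileY (Nz = TILES_Y + 2)


def nx_windows(nx):
    """Return the [start, end) input ranges read along Nx, one per step.

    Consecutive windows start OUT_PER_STEP apart, so each overlaps the
    previous one by 2 inputs and the outputs cover the interior exactly once.
    """
    if nx < WINDOW or (nx - 2) % OUT_PER_STEP != 0:
        raise ValueError(f"Nx={nx} must be 2 + {OUT_PER_STEP}*k for some k >= 1")
    return [(s, s + WINDOW) for s in range(0, nx - 2, OUT_PER_STEP)]


def tile_point(tx, ty):
    """Interior (y, z) index owned by tile (tx, ty), assuming one point per tile."""
    if not (0 <= tx < TILES_X and 0 <= ty < TILES_Y):
        raise ValueError("tile coordinate out of range")
    return tx + 1, ty + 1  # +1 skips the halo plane at index 0


print(nx_windows(64))   # [(0, 64)]
print(nx_windows(126))  # [(0, 64), (62, 126)]
```

With Nx = 126 the two windows share inputs 62 and 63. The first step writes outputs 1 to 62 and the second writes 63 to 124, which together cover the full interior.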

drichmond commented 2 years ago

Merged kernel code into https://github.com/bespoke-silicon-group/bsg_replicant/pull/779