Closed by r-mafi 3 years ago
This change also requires changing the data structures used to optimize reads and writes, specifically `SeqT4` and `ScoreT4`, and the corresponding kernels that work with these data structures. Also, in the general case of long reads, going below a band width of 128 can introduce alignment errors. Therefore, closing this issue.
In the current implementation, the different variations of the Needleman-Wunsch kernels process the score matrix row by row. Each thread processes a fixed number of cells per row at a time, `CELLS_PER_THREAD = 4`. Making this number variable would allow adjusting the granularity of parallelism (right now, the minimum band width is `CELLS_PER_THREAD * WARP_SIZE`) and could improve performance.
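To make the relationship concrete, here is a minimal sketch of how the minimum band width could follow from a compile-time cells-per-thread parameter. It is not the project's code: `min_band_width` and `passes_per_row` are hypothetical helpers, and `WARP_SIZE = 32` is an assumption that holds for current NVIDIA GPUs.

```cpp
// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
constexpr int WARP_SIZE = 32;

// Hypothetical helper: the narrowest band a full warp covers in one pass
// when every thread handles CellsPerThread cells of a row.
template <int CellsPerThread>
constexpr int min_band_width()
{
    static_assert(CellsPerThread > 0, "each thread must process at least one cell");
    return CellsPerThread * WARP_SIZE;
}

// Hypothetical helper: number of warp-wide passes needed to cover one row
// of a band of the given width (rounded up).
template <int CellsPerThread>
constexpr int passes_per_row(int band_width)
{
    return (band_width + min_band_width<CellsPerThread>() - 1) / min_band_width<CellsPerThread>();
}

// With the current fixed value of 4, the minimum band width is 128.
static_assert(min_band_width<4>() == 128, "4 cells per thread * 32 threads");
// Halving the cells per thread would halve the minimum band width.
static_assert(min_band_width<2>() == 64, "2 cells per thread * 32 threads");
```

Turning the constant into a template parameter like this only changes the arithmetic. The kernels and any packed 4-element types they read and write would still need matching variants.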