NVIDIA-Genomics-Research / GenomeWorks

SDK for GPU accelerated genome assembly and analysis
https://clara-parabricks.github.io/GenomeWorks/
Apache License 2.0
286 stars, 76 forks

[cudapoa] customizable number of cells per thread in NW kernels #494

Closed by r-mafi 3 years ago

r-mafi commented 4 years ago

In the current implementation, the different variants of the Needleman-Wunsch kernels process the score matrix row by row. Each thread processes a fixed number of cells per row at a time, CELLS_PER_THREAD = 4. Making this number configurable would allow adjusting the granularity of parallelism (e.g. right now, the minimum band width = CELLS_PER_THREAD * WARP_SIZE) and could potentially improve performance.
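To make the relationship concrete, here is a minimal host-side sketch of how the minimum band width follows from the number of cells per thread. It assumes a warp size of 32; `min_band_width` and `first_cell_of_lane` are hypothetical helper names used only for illustration and are not taken from the cudapoa source.

```cpp
// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
constexpr int WARP_SIZE = 32;

// Hypothetical helper: the narrowest band one warp covers in a single pass
// when every lane handles CELLS_PER_THREAD consecutive cells of a row.
template <int CELLS_PER_THREAD>
constexpr int min_band_width()
{
    return CELLS_PER_THREAD * WARP_SIZE;
}

// Hypothetical helper: offset (within the band) of the first cell
// processed by a given lane of the warp.
template <int CELLS_PER_THREAD>
constexpr int first_cell_of_lane(int lane_idx)
{
    return lane_idx * CELLS_PER_THREAD;
}

// With the current fixed value of 4, the minimum band width is 128.
static_assert(min_band_width<4>() == 128, "4 cells per thread * 32 lanes");
// Smaller values would permit narrower bands.
static_assert(min_band_width<2>() == 64, "2 cells per thread * 32 lanes");
static_assert(min_band_width<1>() == 32, "1 cell per thread * 32 lanes");
```

Turning the constant into a template parameter like this is only one possible design; it keeps the value known at compile time, so per-thread loops over cells could still be unrolled.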

r-mafi commented 3 years ago

This change also requires changing the data structures used to optimize reads and writes, specifically SeqT4 and ScoreT4, and the kernels that work with them. Also, in the general case of long reads, a band width below 128 can introduce alignment errors. Therefore, closing this issue.
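As a rough illustration of why the data structures are coupled to the cell count, the sketch below contrasts a 4-wide packed struct with a width-parameterized one. The actual definitions of SeqT4 and ScoreT4 are not shown in this thread, so `Packed4` and `PackedN` are hypothetical stand-ins, not the library's types.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 4-wide packed struct such as ScoreT4:
// four named fields, so one access moves four cells at once.
template <typename T>
struct Packed4
{
    T s0, s1, s2, s3;
};

// Hypothetical generalization: N cells per access instead of exactly 4.
// Code that reads s0..s3 by name would have to index v[i] in a loop instead.
template <typename T, int N>
struct PackedN
{
    static_assert(N > 0 && (N & (N - 1)) == 0,
                  "this sketch assumes N is a power of two");
    T v[N];
};

// Hypothetical helper: bytes moved by one packed access.
template <typename T, int N>
constexpr int bytes_per_access()
{
    return static_cast<int>(sizeof(PackedN<T, N>));
}

// With 16-bit scores, 4 cells fit in one 8-byte access;
// 2 cells per thread would halve that.
static_assert(sizeof(Packed4<int16_t>) == 8, "4 x 2 bytes");
static_assert(bytes_per_access<int16_t, 4>() == 8, "4 x 2 bytes");
static_assert(bytes_per_access<int16_t, 2>() == 4, "2 x 2 bytes");
```

The point of the sketch is only that the access width and the field layout change together with the cell count, which is why every kernel touching these structs would need updating.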