Over time, numerous versions of the Needleman-Wunsch algorithm have been developed for different banding modes. This PR unifies the adaptive-banded and static-banded versions into a single device kernel.
Other changes:
Added adaptive NW with a traceback buffer.
Updated BandMode to include adaptive_band_traceback.
Because adaptive_band_traceback uses more registers, setting the register count to 64 causes large memory spills and reduces performance. The traceback NW kernels are therefore launched with different launch bounds settings that limit registers to 72.
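As a rough illustration of the idea (not the code in this PR), the per-mode register budget could be tied to BandMode at compile time, so that one templated kernel serves every mode while the traceback variant gets its own limit. This is a host-side C++ sketch. The enumerator names static_band and adaptive_band, and the helpers register_budget, keeps_traceback, KernelConfig and registers_for, are hypothetical. Only BandMode, adaptive_band_traceback and the values 64 and 72 come from the description above. In the actual CUDA code the limit would be expressed through the kernel's launch bounds rather than a plain integer.

```cpp
// Hypothetical mirror of BandMode. Only adaptive_band_traceback is named
// in this PR; the other two enumerator names are assumptions.
enum class BandMode
{
    static_band,
    adaptive_band,
    adaptive_band_traceback
};

// Per-thread register budget for a mode (64 and 72 are taken from the PR text).
constexpr int register_budget(BandMode mode)
{
    return mode == BandMode::adaptive_band_traceback ? 72 : 64;
}

// Whether the mode keeps a traceback buffer.
constexpr bool keeps_traceback(BandMode mode)
{
    return mode == BandMode::adaptive_band_traceback;
}

// Compile-time configuration that a single templated kernel could consume.
template <BandMode Mode>
struct KernelConfig
{
    static constexpr int max_registers = register_budget(Mode);
    static constexpr bool traceback    = keeps_traceback(Mode);
};

// Runtime dispatch from a BandMode value to the matching compile-time config.
inline int registers_for(BandMode mode)
{
    switch (mode)
    {
    case BandMode::static_band:
        return KernelConfig<BandMode::static_band>::max_registers;
    case BandMode::adaptive_band:
        return KernelConfig<BandMode::adaptive_band>::max_registers;
    case BandMode::adaptive_band_traceback:
        return KernelConfig<BandMode::adaptive_band_traceback>::max_registers;
    }
    return register_budget(mode); // unreachable for valid enumerators
}
```

Keeping the budget in one constexpr function means the non-traceback modes keep the 64-register setting while only the traceback path is given the larger limit.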