fix hisat2 performance: remove unused SpliceSiteDB from AlnSink

improve perf scaling on large inputs (>1M reads) by not updating an unused sorted db for every align output

~~as far as i can tell, the SpliceSiteDB on AlnSink is written but never read from.~~ a similar record keeping structure exists on AlnSinkWrap, which may be in use.

SpliceSiteDB is updated for every aligned read reported to SAM (an output which is always generated?). because SpliceSiteDB is backed by a red-black tree, every update is a log(n) tree traversal with a possible rearrangement... for large numbers of reads, maintaining the structure starts to show an obvious cost.

hisat2 version 2.2.1 from bioconda for subsets of SRR12815479 on VectorBase-68_AaegyptiLVP_AGWG reference, on 8 cores, i observed the following times:

1,250 reads finished in 3.59s (0.3kread/s) 25,000 reads finished in 4.173s (5.9kread/s) 250,000 reads finished in 12.968s (19.2kread/s) 1,000,000 reads finished in 44s (22.6kread/s) 3,000,000 reads finished in 237s (12.6kread/s) 12,000,000 reads ran for over 3480+s (<3.5kread/s) before i gave up.

after this patch, i observe: 25,000 reads in 3.26s (7.7kread/s) ( 1,000,000 reads in 28.74s (34.8kread/s) (1.5x speedup) 3,000,000 reads in 89.4s (33.5kread/s) (2.65x speedup) 12,000,000 reads in 313s (38.3kread/s) (>10x speedup)

DaehwanKimLab / hisat2

fix hisat2 performance: remove unused SpliceSiteDB from AlnSink #439