HISAT2 repeat function - GitHub issues

I tested the --repeat option, which makes hisat2 report alignments to repeat sequences directly in its alignment output.

Here is the alignment summary with the --repeat option:

HISAT2 summary stats:
    Total pairs: 29939629
        Aligned concordantly or discordantly 0 time: 1048872 (3.50%)
        Aligned concordantly 1 time: 27346370 (91.34%)
        Aligned concordantly >1 times: 1504009 (5.02%)
        Aligned discordantly 1 time: 40378 (0.13%)
    Total unpaired reads: 2097744
        Aligned 0 time: 1156845 (55.15%)
        Aligned 1 time: 868436 (41.40%)
        Aligned >1 times: 72463 (3.45%)

And here are the results without --repeat:

HISAT2 summary stats:
    Total pairs: 29939629
        Aligned concordantly or discordantly 0 time: 1060789 (3.54%)
        Aligned concordantly 1 time: 27023850 (90.26%)
        Aligned concordantly >1 times: 1822218 (6.09%)
        Aligned discordantly 1 time: 32772 (0.11%)
    Total unpaired reads: 2121578
        Aligned 0 time: 1175151 (55.39%)
        Aligned 1 time: 871441 (41.08%)
        Aligned >1 times: 74986 (3.53%)

As we can see, with the --repeat option the "Aligned concordantly >1 times" pairs decreased by about 1.07 percentage points (6.09% to 5.02%), while the "Aligned concordantly 1 time" pairs increased by about 1.08 percentage points (90.26% to 91.34%). It seems that with --repeat, many pairs from repeat regions are reported as aligning exactly one time, whereas without this option the same pairs are reported as aligning multiple times to the genome.
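For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the shift from the raw pair counts in the two summaries. The dictionary keys and function names are my own labels for illustration, not anything HISAT2 produces.

```python
# Pair counts copied from the two HISAT2 summaries in this post.
# Key names are hypothetical labels chosen for this sketch.
TOTAL_PAIRS = 29939629

with_repeat = {
    "concordant_1": 27346370,    # Aligned concordantly 1 time
    "concordant_gt1": 1504009,   # Aligned concordantly >1 times
}
without_repeat = {
    "concordant_1": 27023850,
    "concordant_gt1": 1822218,
}


def pct(count, total=TOTAL_PAIRS):
    """Return count as a percentage of the total number of pairs."""
    return 100.0 * count / total


def delta_points(key):
    """Change in percentage points when --repeat is switched on."""
    return pct(with_repeat[key]) - pct(without_repeat[key])


for key in with_repeat:
    diff_pairs = with_repeat[key] - without_repeat[key]
    print(f"{key}: {diff_pairs:+d} pairs, {delta_points(key):+.2f} percentage points")
```

Differences computed from the raw counts can differ in the last digit from differences of the rounded percentages printed in the summaries. Note that the gain in uniquely aligned pairs and the loss in multi-aligned pairs are close but not identical, so a small number of pairs also moved between the other categories.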

By the way, I built the hisat2 index without any of the repeat-related options: --repeat-ref, --repeat-info, --repeat-snp and --repeat-haplotype. I'm not sure whether the alignment of repeat reads would improve with an index that includes this information.

Best, WSZ

DaehwanKimLab / hisat2

HISAT2 repeat function #262