DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

HISAT2 repeat function #262

asilvestris84 opened this issue 4 years ago

asilvestris84 commented 4 years ago

Dear developers, I would kindly ask you to clarify the use of the --repeat option in the manual, as the current documentation seems somewhat lacking. In particular: what does it do, how does it work, and how should it be used? These points are unfortunately not clear at the moment.

Many thanks in advance

Alessandro Silvestris

VictorZheng1010 commented 4 years ago

I have tested the --repeat option, which reports alignments to repeat sequences directly in the HISAT2 alignment output.

Here is the alignment summary with the --repeat option:

HISAT2 summary stats:
  Total pairs: 29939629
    Aligned concordantly or discordantly 0 time: 1048872 (3.50%)
    Aligned concordantly 1 time: 27346370 (91.34%)
    Aligned concordantly >1 times: 1504009 (5.02%)
    Aligned discordantly 1 time: 40378 (0.13%)
  Total unpaired reads: 2097744
    Aligned 0 time: 1156845 (55.15%)
    Aligned 1 time: 868436 (41.40%)
    Aligned >1 times: 72463 (3.45%)

And here are the results without --repeat:

HISAT2 summary stats:
  Total pairs: 29939629
    Aligned concordantly or discordantly 0 time: 1060789 (3.54%)
    Aligned concordantly 1 time: 27023850 (90.26%)
    Aligned concordantly >1 times: 1822218 (6.09%)
    Aligned discordantly 1 time: 32772 (0.11%)
  Total unpaired reads: 2121578
    Aligned 0 time: 1175151 (55.39%)
    Aligned 1 time: 871441 (41.08%)
    Aligned >1 times: 74986 (3.53%)

As we can see, with the --repeat option the "Aligned concordantly >1 times" pairs decreased by about 1.07 percentage points (from 6.09% to 5.02%), while the "Aligned concordantly 1 time" pairs increased by about 1.08 percentage points (from 90.26% to 91.34%). This suggests that with --repeat, many reads from repeat regions are reported as aligned exactly once, whereas without the option the same reads are reported as aligned multiple times.
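To double-check the arithmetic, here is a small Python sketch (not part of HISAT2) that parses the two summaries exactly as quoted above and recomputes the per-category differences. The helper names `parse_summary` and `compare` are hypothetical, and the regex assumes the "label: count (pct%)" layout shown in this thread.

```python
import re

# Summaries quoted in this thread, flattened to one string each.
WITH_REPEAT = (
    "HISAT2 summary stats: Total pairs: 29939629 "
    "Aligned concordantly or discordantly 0 time: 1048872 (3.50%) "
    "Aligned concordantly 1 time: 27346370 (91.34%) "
    "Aligned concordantly >1 times: 1504009 (5.02%) "
    "Aligned discordantly 1 time: 40378 (0.13%) "
    "Total unpaired reads: 2097744 "
    "Aligned 0 time: 1156845 (55.15%) "
    "Aligned 1 time: 868436 (41.40%) "
    "Aligned >1 times: 72463 (3.45%)"
)
WITHOUT_REPEAT = (
    "HISAT2 summary stats: Total pairs: 29939629 "
    "Aligned concordantly or discordantly 0 time: 1060789 (3.54%) "
    "Aligned concordantly 1 time: 27023850 (90.26%) "
    "Aligned concordantly >1 times: 1822218 (6.09%) "
    "Aligned discordantly 1 time: 32772 (0.11%) "
    "Total unpaired reads: 2121578 "
    "Aligned 0 time: 1175151 (55.39%) "
    "Aligned 1 time: 871441 (41.08%) "
    "Aligned >1 times: 74986 (3.53%)"
)

# "label: count", optionally followed by "(pct%)". Labels may contain letters,
# digits, spaces and '>' but no colon, so the "HISAT2 summary stats:" header
# (which has no number after it) is skipped automatically.
ENTRY = re.compile(r"([A-Za-z>][A-Za-z0-9> ]*?):\s*(\d+)(?:\s*\(([\d.]+)%\))?")


def parse_summary(text):
    """Map each summary label to (count, percent), percent being None if absent."""
    stats = {}
    for label, count, pct in ENTRY.findall(text):
        stats[label.strip()] = (int(count), float(pct) if pct else None)
    return stats


def compare(with_opt, without_opt):
    """Yield (label, count change, percentage-point change) for shared labels."""
    a, b = parse_summary(with_opt), parse_summary(without_opt)
    for label, (count_a, pct_a) in a.items():
        if label not in b:
            continue
        count_b, pct_b = b[label]
        if pct_a is not None and pct_b is not None:
            delta_pp = round(pct_a - pct_b, 2)
        else:
            delta_pp = None
        yield label, count_a - count_b, delta_pp


if __name__ == "__main__":
    for label, delta_count, delta_pp in compare(WITH_REPEAT, WITHOUT_REPEAT):
        pp = f"{delta_pp:+.2f} pp" if delta_pp is not None else "n/a"
        print(f"{label:45s} {delta_count:+10d} {pp}")
```

Running it confirms the shift in absolute numbers as well: 318,209 fewer pairs aligned concordantly >1 times and 322,520 more pairs aligned concordantly exactly once with --repeat.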

By the way, I built the HISAT2 index without the repeat-related options --repeat-ref, --repeat-info, --repeat-snp and --repeat-haplotype. I'm not sure whether the alignment of repeat reads would improve with an index that includes this information.

Best, WSZ