asilvestris84 opened 4 years ago
I have tested the --repeat option, which reports alignments to repeat sequences directly in the HISAT2 alignment.
Here is the alignment summary with the --repeat option:
HISAT2 summary stats:
    Total pairs: 29939629
        Aligned concordantly or discordantly 0 time: 1048872 (3.50%)
        Aligned concordantly 1 time: 27346370 (91.34%)
        Aligned concordantly >1 times: 1504009 (5.02%)
        Aligned discordantly 1 time: 40378 (0.13%)
    Total unpaired reads: 2097744
        Aligned 0 time: 1156845 (55.15%)
        Aligned 1 time: 868436 (41.40%)
        Aligned >1 times: 72463 (3.45%)
And here are the results without --repeat:
HISAT2 summary stats:
    Total pairs: 29939629
        Aligned concordantly or discordantly 0 time: 1060789 (3.54%)
        Aligned concordantly 1 time: 27023850 (90.26%)
        Aligned concordantly >1 times: 1822218 (6.09%)
        Aligned discordantly 1 time: 32772 (0.11%)
    Total unpaired reads: 2121578
        Aligned 0 time: 1175151 (55.39%)
        Aligned 1 time: 871441 (41.08%)
        Aligned >1 times: 74986 (3.53%)
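To compare the two runs programmatically, a minimal sketch like the following could pull the counts out of a summary and sanity-check them. `parse_summary` and `check_summary` are hypothetical helper names (not part of HISAT2), and the label strings are simply copied from the two summaries in this issue. The last check (unpaired reads = 2 × pairs aligned 0 time) is an assumption based on the numbers here: it holds for both runs, consistent with both mates of every unaligned pair being counted as unpaired reads.

```python
import re

# Labels copied from the HISAT2 summaries quoted in this issue (assumption:
# other HISAT2 versions or summary formats may word them differently).
LABELS = [
    "Total pairs",
    "Aligned concordantly or discordantly 0 time",
    "Aligned concordantly 1 time",
    "Aligned concordantly >1 times",
    "Aligned discordantly 1 time",
    "Total unpaired reads",
    "Aligned 0 time",
    "Aligned 1 time",
    "Aligned >1 times",
]


def parse_summary(text):
    """Hypothetical helper: return {label: count} for each label found in the text."""
    counts = {}
    for label in LABELS:
        # Match "label: <count>" and ignore the "(xx.xx%)" that follows it.
        m = re.search(re.escape(label) + r":\s*(\d+)", text)
        if m:
            counts[label] = int(m.group(1))
    return counts


def check_summary(c):
    """Hypothetical helper: consistency checks that hold for both runs in this issue."""
    pair_classes = (
        c["Aligned concordantly or discordantly 0 time"]
        + c["Aligned concordantly 1 time"]
        + c["Aligned concordantly >1 times"]
        + c["Aligned discordantly 1 time"]
    )
    unpaired_classes = c["Aligned 0 time"] + c["Aligned 1 time"] + c["Aligned >1 times"]
    return {
        "pairs_add_up": pair_classes == c["Total pairs"],
        "unpaired_add_up": unpaired_classes == c["Total unpaired reads"],
        # Assumption: both mates of each unaligned pair are reported as unpaired reads.
        "unpaired_is_2x_unaligned_pairs": (
            c["Total unpaired reads"]
            == 2 * c["Aligned concordantly or discordantly 0 time"]
        ),
    }


if __name__ == "__main__":
    with_repeat = (
        "Total pairs: 29939629 Aligned concordantly or discordantly 0 time: 1048872 (3.50%) "
        "Aligned concordantly 1 time: 27346370 (91.34%) Aligned concordantly >1 times: 1504009 (5.02%) "
        "Aligned discordantly 1 time: 40378 (0.13%) Total unpaired reads: 2097744 "
        "Aligned 0 time: 1156845 (55.15%) Aligned 1 time: 868436 (41.40%) Aligned >1 times: 72463 (3.45%)"
    )
    print(check_summary(parse_summary(with_repeat)))
```

The regex works on both the one-line form and the indented multi-line form, since `\s*` absorbs whatever whitespace follows the colon.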
As we can see, with the --repeat option the share of pairs "Aligned concordantly >1 times" decreased by about 1.07 percentage points (6.09% → 5.02%), while the share "Aligned concordantly 1 time" increased by about 1.08 percentage points (90.26% → 91.34%). It seems that with --repeat, many reads from repeat regions were aligned exactly once to the genome, whereas without this option the same reads were aligned multiple times.
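The shift can also be expressed in raw pair counts. The following is a minimal sketch that works directly from the pair-level numbers in the two summaries; the category keys and the `shifts` function are illustrative names only. Computing from raw counts rather than from the rounded percentages gives slightly different figures (the ">1 times" drop is closer to 1.06 than 1.07 percentage points).

```python
# Pair-level counts copied from the two HISAT2 summaries in this issue.
TOTAL_PAIRS = 29939629

with_repeat = {
    "0 time": 1048872,
    "concordant 1 time": 27346370,
    "concordant >1 times": 1504009,
    "discordant 1 time": 40378,
}
without_repeat = {
    "0 time": 1060789,
    "concordant 1 time": 27023850,
    "concordant >1 times": 1822218,
    "discordant 1 time": 32772,
}


def shifts(a, b, total):
    """Per category: (change in pair count, change in percentage points), going from b to a."""
    out = {}
    for key in a:
        delta = a[key] - b[key]
        out[key] = (delta, 100.0 * delta / total)
    return out


if __name__ == "__main__":
    for key, (delta, pp) in shifts(with_repeat, without_repeat, TOTAL_PAIRS).items():
        print(f"{key:>20}: {delta:+9d} pairs ({pp:+.2f} percentage points)")
```

With these inputs, "concordant 1 time" gains 322,520 pairs while "concordant >1 times" loses 318,209, and the per-category changes sum to zero because the total number of pairs is the same in both runs.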
By the way, I built the HISAT2 index without the repeat-related options --repeat-ref, --repeat-info, --repeat-snp and --repeat-haplotype. I'm not sure whether the alignment of repeat reads would improve with an index that includes this information.
Best, WSZ
Dear all, I would kindly ask you to clarify the use of the --repeat option in the manual, as the current description seems to be lacking. In particular: what does it do, how does it work, and how should it be used? These points are unfortunately not clear at the moment.
Many thanks in advance,
Alessandro Silvestris