Closed y9c closed 2 years ago
@y9c Very good question!
This is highly related to how HISAT2 is implemented.
In your alignment result, you can see that:
demo1 maps to the reverse-complement strand of the reference (flag 16). This means the RC-read was mapped to the index.
demo2 maps to the forward strand of the reference (flag 0). This means the FW-read was mapped to the index. This is why HISAT2 can map demo2_5 but cannot map demo2_3.
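For reference, the strand information in the FLAG column can be decoded with a few lines of Python. This is only a minimal sketch: the bit values (0x4 for unmapped, 0x10 for reverse strand) come from the SAM specification, while the helper name `describe_flag` is made up for illustration and only covers the single-end case discussed here.

```python
# Bit values from the SAM specification; the helper name is illustrative only.
FLAG_UNMAPPED = 0x4   # read has no reported alignment
FLAG_REVERSE = 0x10   # read aligned as its reverse complement


def describe_flag(flag: int) -> str:
    """Summarize the mapping state of a single-end read from its SAM FLAG."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & FLAG_REVERSE:
        return "mapped, reverse-complement strand"
    return "mapped, forward strand"


for flag in (0, 4, 16):
    print(flag, describe_flag(flag))
```

A forward-strand hit on a single-end read therefore shows up as flag 0, a reverse-complement hit as flag 16, and an unaligned read as flag 4.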
If the additional base is always at the end of your read sequence, I suggest you try the -5/--trim5 and -3/--trim3 options.
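To make clear what these two options do to a read before alignment, here is a rough Python sketch of the same fixed-length trimming. The function name `trim_read` and the example read are hypothetical and are not part of HISAT2; the sketch only mirrors the idea of removing a fixed number of bases from the 5' (left) end and the 3' (right) end.

```python
from typing import Tuple


def trim_read(seq: str, qual: str, trim5: int = 0, trim3: int = 0) -> Tuple[str, str]:
    """Drop trim5 bases from the 5' (left) end and trim3 bases from the 3' (right) end."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    end = len(seq) - trim3
    return seq[trim5:end], qual[trim5:end]


# Hypothetical read with one extra G at the 5' end.
seq, qual = trim_read("GACGTACGT", "EEEEEEEEE", trim5=1)
print(seq, qual)
```

On the command line the equivalent would be something like `hisat2 -x genome_snp_rep -U test.fq -5 1`, with the number of bases adjusted to your library.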
Best, Leo
Thank you for the explanation. I realized that this might cause another problem even when the reads are long.
After trimming, the start position of read 2 might lie ahead of the end position of read 1, as in the diagram below. The top strand (red) is read 1, while the bottom strand (green) is read 2. This read pair can NOT be mapped in PE mode.
(a demo sequence from the ASM38351v1 genome)
Read1:
@demo1
CAACTAGGAAGTTGGCTTAGAAGCAGCCACCTTTTAAAGAGTGCGTAATTGCTCACTAG
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
Read2:
@demo1
TAGTGAGCAATTACGCACTCTTTAAAAGGTGGCTGCTTCTAAGCCAACTTCCTAGTTG
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
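The relationship between the two mates can be checked with a short Python sketch. The sequences are copied from the records above; the helper `revcomp` is just an illustrative name.

```python
read1 = "CAACTAGGAAGTTGGCTTAGAAGCAGCCACCTTTTAAAGAGTGCGTAATTGCTCACTAG"
read2 = "TAGTGAGCAATTACGCACTCTTTAAAAGGTGGCTGCTTCTAAGCCAACTTCCTAGTTG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


rc1 = revcomp(read1)
print(len(read1), len(read2))  # 59 58
print(rc1[1:] == read2)        # True
```

In other words, read 2 is the reverse complement of read 1 with its first base removed, so the 5' end of read 2 sits one base inside the 3' end of read 1 and the two mates overlap over all 58 bases of read 2.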
I would like to know whether HISAT2 is a suitable aligner for short reads (<30 nt). My test is as follows.
I copied two unique sequences (demo1 and demo2) from the human genome and downloaded the genome_snp_rep index of hg38 as the mapping reference. Then I added a 5' overhang or a 3' overhang to each. For example, sequence demo1_5 is exactly the sequence of demo1 but with an extra, unmappable G base at the 5' end. Such overhangs are very common in some RNA libraries, because the reverse transcriptase can add a random tail of variable length during the RT step. Although trimming 3 bp from the sequence is recommended, some reads can still have one or two overhang bases after trimming.
Running hisat2 with default settings (
hisat2 -x genome_snp_rep -U test.fq
), it seems that a 5' overhang affects the mapping of the demo1 read, while a 3' overhang does not (output1). Does this mean the alignment proceeds in the 5'->3' direction? It is odd, though, that the demo2 read shows the opposite result.
Then I adjusted the mapping parameters to
hisat2 --bowtie2-dp 2 --score-min L,0,-1 --sp 1,0 --mp 1,0 --rdg 0,2 -x genome_snp_rep -U test.fq
. The result (output2) does not show an increase in the mapping ratio. Meanwhile, some MD tags look very strange.
Could you help me figure out what causes this problem in short-read alignment? Thanks!
input:
output1:
(output2)
Some other issues might be related to this:
329 #84