DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0
473 stars 116 forks source link

Wrong sign of tlen (the 9th column) of sam output #205

Closed dbrg77 closed 5 years ago

dbrg77 commented 5 years ago

Hello,

Basically, I have this problem when a pair of reads are reverse complement to each other. This often happens when the insert size is smaller than the read length, so the the 3' end of the reads cover the adapters. After trimming, the pair of reads will become reverse-complementary to each other.

I'm using hisat2-2.1.0.

These are the reads:

$ cat r1.fastq
@ST-E00243:693:H2NF7CCX2:3:1101:10003:7110 1:N:0:TTCCATAT+TGCACGAA
GCCCTGTGCAAACCTGCTTACTGCAGCCTTCCCATGCCTGGGCGCTAAATTGGAAGCAAAAGACTC
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

$ cat r2.fastq
@ST-E00243:693:H2NF7CCX2:3:1101:10003:7110 2:N:0:TTCCATAT+TGCACGAA
GAGTCTTTTGCTTCCAATTTAGCGCCCAGGCATGGGAAGGCTGCAGTAAGCAGGTTTGCACAGGGC
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

This is the command I use to map this pair of reads:

hisat2 --no-head --no-temp-splicesite --no-spliced-alignment -p 8 -x /mnt/iGenome/Mus_musculus/UCSC/mm10/Sequence/Hisat2Index/genome -1 r1.fastq -2 r2.fastq

The output is:

ST-E00243:693:H2NF7CCX2:3:1101:10003:7110   99  chr8    117074149   60  66M =   117074149   -66 GCCCTGTGCAAACCTGCTTACTGCAGCCTTCCCATGCCTGGGCGCTAAATTGGAAGCAAAAGACTC  AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:66 YS:i:0  YT:Z:CP NH:i:1
ST-E00243:693:H2NF7CCX2:3:1101:10003:7110   147 chr8    117074149   60  66M =   117074149   -66 GCCCTGTGCAAACCTGCTTACTGCAGCCTTCCCATGCCTGGGCGCTAAATTGGAAGCAAAAGACTC  JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:66 YS:i:0  YT:Z:CP NH:i:1

Both reads have -66 at the 9th column (tlen), but the tlen of read1 should be 66.

Adding -3 1 during mapping fixed the problem though.

Regards, Xi

parkchanhee commented 5 years ago

Hi Xi,

Thank you for reporting.

If two segments are mapped to same location and they have a same length, the sign of template length is minus because hisat2 considered a read as a rigthmost read of template. I fixed hisat2 to handle correctly.

You can download hisat2 code from master branch.

Thanks Chanhee

dbrg77 commented 5 years ago

Thanks for the update, Chanhee.

I will close this now.

Regards, Xi