DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

hisat2-2.2.1: Calculated TLEN value seems wrong #276

Open mmokrejs opened 3 years ago

mmokrejs commented 3 years ago

Hi, I ran hisat2-2.2.1/hisat2-align-s --bowtie2-dp 2 --score-min L,0,-1 --no-softclip --fr -k 10 --max-seeds 20 --avoid-pseudogene --secondary -p 22 ... to also get some secondary matches. The TLEN (9th column) looks wrong to me: the CIGAR string shows a long jump, but the TLEN value is much lower.

Look for insert size 6246 below:

M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr11   107606696   60  14M63133N48M16789N14M   =   107693280   6736    GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:9T5T5A0C38A9G4 PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:6  XM:i:6  XN:i:0  XO:i:0  AS:i:-34    XS:A:-  YS:i:-36    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr11   107606696   60  14M63133N62M    =   107693280   23525   GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:9T5T5A0C38A1T0G11  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:7  XM:i:7  XN:i:0  XO:i:0  AS:i:-38    XS:A:-  YS:i:-36    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr11   107662471   60  76M =   107693280   30883   GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:9C0A0T3T3A2C41G11  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:7  XM:i:7  XN:i:0  XO:i:0  AS:i:-35    YS:i:-36    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr11   107669829   60  62M16789N14M    =   107693280   6736    GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:0A7A3A2T5A0C38A9G4 PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:8  XM:i:8  XN:i:0  XO:i:0  AS:i:-41    XS:A:-  YS:i:-36    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr11   107684600   60  76M =   107693280   8754    GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:8A6T3A2C41G6G3T0   PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:7  XM:i:7  XN:i:0  XO:i:0  AS:i:-35    YS:i:-36    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr11   107693280   60  33M2I41M    =   107606696   -6736   GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A1G1C1A5T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:2  NH:i:10 NM:i:7  XM:i:5  XN:i:0  XO:i:1  AS:i:-36    YS:i:-34    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr11   107693280   60  33M2I41M    =   107662471   -30883  GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A1G1C1A5T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:2  NH:i:10 NM:i:7  XM:i:5  XN:i:0  XO:i:1  AS:i:-36    YS:i:-35    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr11   107693280   60  33M2I41M    =   107684600   -8754   GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A1G1C1A5T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:2  NH:i:10 NM:i:7  XM:i:5  XN:i:0  XO:i:1  AS:i:-36    YS:i:-35    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr11   107693280   60  33M2I41M    =   107606696   -23525  GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A1G1C1A5T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:2  NH:i:10 NM:i:7  XM:i:5  XN:i:0  XO:i:1  AS:i:-36    YS:i:-38    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr11   107693280   60  33M2I41M    =   107669829   -6736   GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A1G1C1A5T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:2  NH:i:10 NM:i:7  XM:i:5  XN:i:0  XO:i:1  AS:i:-36    YS:i:-41    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    99  chr17   49499573    60  76M =   49499669    172 TATTTTTATTTTATTTATTTATTTATTTATTTTGAGACGGACTGTCACTCTGCCACCCAGGCTGGAGTGCAGTGGC    CCCCCGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGCFGGFGGGGGFGFGGGGGGGGGGFFGGGCGGGFGGGGFDF    MD:Z:76 PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  YS:i:-5 ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    147 chr17   49499669    60  76M =   49499573    -172    CCTCTGCCTCCTGGGTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGAGACTACGGGCGCCCGCCACCAC    FFGGGGGGFGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:60A15  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:1  XM:i:1  XN:i:0  XO:i:0  AS:i:-5 YS:i:0  ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr22   28290773    60  43M94229N33M    =   28432243    28335   GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:0A7A0T2A18T0G14A5T2C19 PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:9  XM:i:9  XN:i:0  XO:i:0  AS:i:-48    XS:A:+  YS:i:-25    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr22   28304146    60  37M80856N39M    =   28432243    28335   GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:15T6C24A5T2C19 PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:5  XM:i:5  XN:i:0  XO:i:0  AS:i:-40    XS:A:+  YS:i:-25    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr22   28328615    60  62M56387N14M    =   28432243    28335   GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:5A2A3A2T3A2C9G0A42 PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:8  XM:i:8  XN:i:0  XO:i:0  AS:i:-42    XS:A:-  YS:i:-25    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    419 chr22   28413299    60  37M12774N39M    =   28432243    6246    GTGGTGGCGGGCGCCCGTAGTCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGCAGAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGFF    MD:Z:8A6T6C9G21A6T9C2A1 PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:8  XM:i:8  XN:i:0  XO:i:0  AS:i:-53    XS:A:+  YS:i:-25    ZS:i:-5 YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr22   28432243    60  76M =   28304146    -28335  GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A5C2G1C2T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:5  XM:i:5  XN:i:0  XO:i:0  AS:i:-25    YS:i:-40    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr22   28432243    60  76M =   28328615    -28335  GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A5C2G1C2T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:5  XM:i:5  XN:i:0  XO:i:0  AS:i:-25    YS:i:-42    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr22   28432243    60  76M =   28290773    -28335  GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A5C2G1C2T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:5  XM:i:5  XN:i:0  XO:i:0  AS:i:-25    YS:i:-48    ZS:i:-25    YT:Z:CP
M05378:215:000000000-J5BH5:1:1111:27464:9692    339 chr22   28432243    60  76M =   28413299    -6246   GCCACTGCACTCCAGCCTGGGTGGCAGAGTGACAGTCCGTCTCAAAATAAATAAATAAATAAATAAAATAAAAATA    FDFGGGGFGGGCGGGFFGGGGGGGGGGFGFGGGGGFGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGCCCCC    MD:Z:23A5C2G1C2T38  PG:Z:MarkDuplicates RG:Z:J5BH5.1.BRCA5231   XG:i:0  NH:i:10 NM:i:5  XM:i:5  XN:i:0  XO:i:0  AS:i:-25    YS:i:-53    ZS:i:-25    YT:Z:CP

https://samtools.github.io/hts-specs/SAMv1.pdf page 8 states it should be

I may have read it too quickly, but it should be around 28432243-28413299+1=18945. From the CIGAR 37M12774N39M I would assume around 12774, but I have no clue where 6246 comes from.
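For reference, here is a minimal sketch of the span from the leftmost to the rightmost mapped base of a pair, which is one common reading of the TLEN definition in the SAM spec. The helper names `ref_end` and `spec_tlen` are hypothetical, and this is not how hisat2 computes the value; it only takes POS and CIGAR of the two mates from the records above.

```python
import re

# CIGAR operations that consume reference bases (per the SAM spec).
REF_CONSUMING = set("MDN=X")

def ref_end(pos, cigar):
    """Return the 1-based rightmost reference position covered by an alignment."""
    ref_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op in REF_CONSUMING)
    return pos + ref_len - 1

def spec_tlen(pos1, cigar1, pos2, cigar2):
    """Unsigned span from the leftmost to the rightmost mapped base of the pair."""
    leftmost = min(pos1, pos2)
    rightmost = max(ref_end(pos1, cigar1), ref_end(pos2, cigar2))
    return rightmost - leftmost + 1

# The chr22 pair with reported TLEN 6246: POS/CIGAR of the read and of its mate.
span = spec_tlen(28413299, "37M12774N39M", 28432243, "76M")
print(span)
```

Under these assumptions the span for this pair is 19020, because the mate's 76M alignment ends at 28432318. Subtracting the 12774 skipped (N) bases from 19020 gives 6246, which matches the reported value for this pair; whether hisat2 actually computes it that way is not confirmed here.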

Thank you,

y9c commented 1 year ago

It seems that hisat2 calculates TLEN incorrectly: the length of R2 is not considered in the calculation.

[image] (red line: hisat2 TLEN spec)

Hi, @imzhangyun, could you help us with this issue?