DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

hisat2-2.2.1: cannot detect 10nt deletion due to overly strict default thresholds #275

Open mmokrejs opened 3 years ago

mmokrejs commented 3 years ago

Hi, I had to disable soft-clipping altogether in the hisat2 aligner and lower the limits for alignment extension, because I was losing some 10 nt deletions when the alignment on either side of the deletion site was too short.

According to some comments in your Changelog, HISAT2 is aimed at reads of about 125 or 150 nt. But if you get 75 nt reads you have to push down the thresholds and disable soft-clipping too. This should be included in the documentation, Readme.md and the runtime help.

I ended up with hisat2-2.2.1/hisat2-align-s --bowtie2-dp 2 --score-min L,0,-1 --no-softclip .... The trailing -1 is probably overkill, but I could not work out how this scoring function works.
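For anyone else puzzled by the --score-min syntax, here is a minimal Python sketch of how I understand the function string to be evaluated. It assumes the form described in the HISAT2 manual (type letter C, L, S or G, then a constant and a coefficient applied to the read length), a default of L,0,-0.2, and a default read gap penalty of 5 to open plus 3 per base (--rdgap 5,3). The helper names min_score and gap_penalty are made up for illustration and are not part of HISAT2.

```python
import math


def min_score(func: str, read_len: int) -> float:
    """Evaluate a --score-min style function string for a given read length.

    Assumed form (from the HISAT2 manual): "<type>,<constant>,<coefficient>",
    with type C (constant), L (linear), S (square root) or G (natural log).
    """
    parts = func.split(",")
    kind = parts[0].upper()
    numbers = [float(p) for p in parts[1:]]
    const = numbers[0] if numbers else 0.0
    coeff = numbers[1] if len(numbers) > 1 else 0.0

    if kind == "C":
        return const
    if kind == "L":
        return const + coeff * read_len
    if kind == "S":
        return const + coeff * math.sqrt(read_len)
    if kind == "G":
        return const + coeff * math.log(read_len)
    raise ValueError(f"unknown function type: {kind!r}")


def gap_penalty(length: int, open_pen: int = 5, extend_pen: int = 3) -> int:
    """Penalty for one gap of `length` bases (assumed defaults: 5 to open, 3 per base)."""
    return open_pen + length * extend_pen


if __name__ == "__main__":
    read_len = 76
    print("assumed default threshold:", min_score("L,0,-0.2", read_len))
    print("threshold with L,0,-1:    ", min_score("L,0,-1", read_len))
    print("10 nt deletion penalty:   ", gap_penalty(10))
```

Under these assumptions a 76 nt read needs a score of about -15.2 or better by default, while a single 10 nt deletion already costs 35, which matches the AS:i:-35 in the records below. With L,0,-1 the threshold drops to -76, so a coefficient somewhere below about -0.5 would probably have been enough for these reads.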

Here is a small testcase:

read_pairs_spanning_10nt_deletion_at_chr5:112780802.R1.fastq.txt read_pairs_spanning_10nt_deletion_at_chr5:112780802.R2.fastq.txt

M05378:215:000000000-J5BH5:1:1102:20785:17971   163 chr5    112780766   60  37M10D39M   =   112780848   158 CTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGT    CCCCCGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGG    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:37^CAGATATGAC39    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1101:12563:23119   163 chr5    112780783   60  20M10D56M   =   112780897   190 GTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGAAGAACAAC    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFEGGGGGGGGGGGGGGG=    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:20^CAGATATGAC56    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1105:16562:3655    163 chr5    112780776   60  27M10D48M   =   112780828   128 ATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:27^CAGATATGAC48    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1106:12165:4867    99  chr5    112780751   60  52M10D24M   =   112780889   214 TACAAGATATTGATACTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGC    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:52^CAGATATGAC24    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1106:5118:14480    163 chr5    112780783   60  20M10D56M   =   112780889   182 GTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGAAGAACAAC    CCCCCGDFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGFGGGGGGGGGGGGGGGGGG    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:20^CAGATATGAC56    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1112:8032:10009    163 chr5    112780753   60  50M10D25M   =   112780824   146 CAAGATATTGATACTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGG AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:50^CAGATATGAC25    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1108:2921:10346    99  chr5    112780760   60  43M10D33M   =   112780891   207 TTGATACTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAAT    CCCCCDEFGGGFGFC<FFFGGGGGFGGGGGEFGGGGGGGFGGEFGFEGFGG@FFGGGGEGGGGGGGGFFGGGGDGE    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:43^CAGATATGAC33    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1106:23178:11144   99  chr5    112780763   60  40M10D34M   =   112780765   88  ATACTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATC  CCCCCGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGFGGGFGFGGGEGFGGGGGGGGCFFGGGGGG  AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:40^CAGATATGAC34    YS:i:-35    YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1106:23178:11144   147 chr5    112780765   60  38M10D38M   =   112780763   -88 ACTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAG    DEC:CFCGGGFFGFDFCGGGGGGGGFFGGGGGGGGGGGGGGFGGGGGGGGGGFFGDGGGGGGGGGGGGGGGCCCCC    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:38^CAGATATGAC38    YS:i:-35    YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1113:21116:20549   163 chr5    112780786   60  17M10D57M   =   112780850   140 TTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGAAGAACAACT  CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG  AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:17^CAGATATGAC57    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:1112:13915:12862   83  chr5    112780773   60  30M10D46M   =   112780715   -144    ATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATG    GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:30^CAGATATGAC46    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:2102:24117:3162    163 chr5    112780776   60  27M10D48M   =   112780860   160 ATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGFGGGGGGGGGGGGDDFGG AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:27^CAGATATGAC48    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:2103:10929:17029   99  chr5    112780776   60  27M10D48M   =   112780826   126 ATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:27^CAGATATGAC48    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:2107:9721:2488 99  chr5    112780753   60  50M10D25M   =   112780874   196 CAAGATATTGATACTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCA CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGFFF@FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:50^CAGATATGAC25    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:2110:11324:21624   99  chr5    112780781   60  22M10D53M   =   112780848   143 TGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGAAGAAC CCCCCEGGGGFFGGGGGEGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGEGG8FGGGFFGF AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:22^CAGATATGAC53    YS:i:-3 YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:2118:16420:10966   163 chr5    112780766   60  37M10D39M   =   112780858   167 CTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGT    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGFGGGG    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:37^CAGATATGAC39    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:2113:22922:15823   163 chr5    112780755   60  48M10D28M   =   112780951   272 AGATATTGATACTTTTTTATTATTTGTGGTTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGG    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:48^CAGATATGAC28    YS:i:0  YT:Z:CP NH:i:1
M05378:215:000000000-J5BH5:1:2113:25798:15320   99  chr5    112780784   60  19M10D57M   =   112781002   294 TTTTAGTTTTCCTTACAAACAGAAGGCAATTGGAATATGAAGCAAGGCAAATCAGAGTTGCGATGGAAGAACAACT    CCCCCF<6CEEGGCF@<FFAFDGFGG<6@<CF@FD9FGGGAFDFFF8FEFGG9,@,@CCE9CFDF7FDGGGGEFEC    AS:i:-35    XN:i:0  XM:i:0  XO:i:1  XG:i:10 NM:i:10 MD:Z:19^CAGATATGAC57    YS:i:-3 YT:Z:CP NH:i:1
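To check how much aligned sequence sits on either side of the deletion in records like these, a small Python sketch such as the following can split the CIGAR string. The function name deletion_flanks is hypothetical, and the sketch only counts M, = and X operations as aligned bases.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_flanks(cigar: str) -> list[tuple[int, int, int]]:
    """Return (left_aligned, deletion_length, right_aligned) for each D operation."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    flanks = []
    for i, (length, op) in enumerate(ops):
        if op == "D":
            # Aligned read bases before and after this deletion.
            left = sum(n for n, o in ops[:i] if o in "M=X")
            right = sum(n for n, o in ops[i + 1:] if o in "M=X")
            flanks.append((left, length, right))
    return flanks


if __name__ == "__main__":
    # A few CIGAR strings copied from the records above.
    for cigar in ["37M10D39M", "20M10D56M", "17M10D57M"]:
        print(cigar, deletion_flanks(cigar))
```

In the testcase above the shortest flank is 17 nt (17M10D57M), which gives an idea of how short an anchor the aligner has to accept to report the deletion.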