NCGG-MGC / IMSindel

IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis
https://www.nature.com/articles/s41598-018-23978-z
MIT License
15 stars 0 forks source link

missing indels? #14

Closed liuxianghui closed 4 years ago

liuxianghui commented 4 years ago

I tried but it seems not find indels.... ( error message as shown in the bottom part.) Suprising! Any mistakes with my test file? Could you kindly provide one test data here so that I can confirm that it is my error or problem with the installation? Xianghui

(new) BigMacMealUpsize:IMSindel xianghui.liu$ imsindel --bam snps.bam --chr CP009273 --outd out --indelsize 10000 --reffa ref.fa

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: snps.bam chr: CP009273 outd: out indelsize: 10000 reffa: ref.fa glsearch: glsearch36 glsearch mat: /opt/anaconda3/envs/new/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /var/folders/0m/d8qkfjp918s79vdk51n_8v_h0000gs/T thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 snps.bam CP009273

    backward_clips: 0

    forward_clips: 0

    non_clips: 0

  2. collecting indel related reads...done missing indels
holrock commented 4 years ago

Hi, Is CP009273 or soft clips in snps.bam? Could you check the results of samtools view -F 1024 -f 2 snps.bam CP009273?

liuxianghui commented 4 years ago

I work with bacteria genome. Escherichia coli BW25113. So it is the title of reference genome in contig fasta file. This one only have one contig. Escherichia coli BW25113, complete genome Escherichia coli BW25113 4,631,469 bp genomic sequence CP009273.1

There is no result of command samtools view -F 1024 -f 2 snps.bam CP009273

ref.fa is like this:

CP009273 Escherichia coli BW25113, complete genome. AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC ...

samtools view snps.bam gives result like this: M00385:151:000000000-J3BLL:1:2103:10404:6827 16 CP009273 321025 60 378M 0 0 ATAAAATAAAATCTCAATCAGATCGATATCCTGTTTGATTTGTTCACGCATAATATATCCAGAGAATAAAATCTGTCGCAGATAAGGTTGTATTAATAGTCTGTATCAGGAATGTTCGGGTTAAATATCAGCAAAAAGCCCGCATCATGAATACTGGATATGAAGCATGAGAGTTACCTCAGTGTTTATATAAGGATTCGGTCCCCCTCTCTGGAACGGTAACTCTCAATCTGATCGGTTCCTGCGTTAGTTCACATCACGACTCATTTTTTCGCTCTCACCGGCATCCCATTTGCCACAAAATATCCCGCCGTGCTCCTCGGCAGCGCTTCCTGTCCACGAATCATATCCGCTATTTTCTCGCCAATCATAATTGTC CCCCCGFGGGGGGGGGGGGGCDGGGGGGDGGGGGGGGGGGGGGGFGGGGGGGGGGGGGCEFFGFFGFFFFFGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIIJJJJJJJJIJJJIJJIIJJJJJJJJJJJJJJJJJJJJJJJJJJI=FJJJJ?JIJJJJJJJJIJIIJJJGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGCCCCC NM:i:0 MD:Z:378 AS:i:378 XS:i:22 RG:Z:acrA_S15_ffsnps M00385:151:000000000-J3BLL:1:1108:22043:5492 16 CP009273 321029 60 507M 0 0 AATAAAATCTCAATCAGATCGATATCCTGTTTGATTTGTTCACGCATAATATATCCAGAGAATAAAATCTGTCGCAGATAAGGTTGTATTAATAGTCTGTATCAGGAATGTTCGGGTTAAATATCAGCAAAAAGCCCGCATCATGAATACTGGATATGAAGCATGAGAGTTACCTCAGTGTTTATATAAGGATTCGGTCCCCCTCTCTGGAACGGTAACTCTCAATCTGATCGGTTCCTGCGTTAGTTCACATCACGACTCATTTTTTCGCTCTCACCGGCATCCCATTTGCCACAAAATATCCCGCCGTGCTCCTCGGCAGCGCTTCCTGTCCACGAATCATATCCGCGATTTTCTCGCCAATCATAATTGTCGTGGCGTTCAAATTCCCGGTGATAATCTGCGGCATAATCGACGCATCCACCACACGCAGGCCTTCTAACCCGTGTACGCGGCCTTCGCCGTCAACCACGGACATCTCGTCGTAACCCATTTTGCAGGTACCGC CCCCGGGFFDFGCFFGGFGGGGDEGGGGGGGGGGCFCGGGGGFCFGGGGDFGFGCFGDFGFGGGGGFGGGFGG@CF:EEEFGFGFCGGFFFGGGFEC<FGFFFGGBFCEF<?F@FE4DF;FAFGF9EF?EFFGF@CBCGGGGFGGCGGGGGGGGFGGGFG?EFDF,?DDFCGF9D8DFD;ECADGEFFF,@E8C=C>E?8CEAD6<8CJIJJICJIFIDJJJJJJJC6GJJJJJJBBAJJJJJJJJJJJBIICJJJJHHJJ?JJJJHJI=G>-EGH<<7<EBJJG:JJJJJJHHDJ?JI8ECE5GEF:CE8GECC>>GEEEGGGGGGCF<7GGCFAGCC<:7F:@GFECADGGGFDCFF@GGF@@,F@@CGGGGCCGEFD=9=BGGGGFBDGF<DFCFCFGFFGFFDGGGGECGGDGGF?GGGEFEFCD@FDGGGFEFGGGGEGGGGGEEFCGDEGGEFGGGGGGGGGFFGFFFF:@GCGGEFF;GFFECGGGGGEFGCFBCCCC NM:i:1 MD:Z:349T157 AS:i:502 XS:i:22 RG:Z:acrA_S15_ffsnps M00385:151:000000000-J3BLL:1:1115:26456:15074 0 CP009273 321036 60 415M 0 0 TCTCAATCAGATCGATATCCTGTTTGATTTGTTCACGCATAATATATCCAGAGAATAAAATCTGTCGCAGATAAGGTTGTATTAATAGTCTGTATCAGGAATGTTCGGGTTAAATATCAGCAAAAAGCCCGCATCATGAATACTGGATATGAAGCATGAGAGTTACCTCAGTGTTTATATAAGGATTCGGTCCCCCTCTCTGGAACGGTAACTCTCAATCTGATCGGTTCCTGCGTTAGTTCACATCACGACTCATTTTTTCGCTCTCACCGGCATCCCATTTGCCACAAAATATCCCGCCGTGCTCCTCGGCAGCGCTTCCTGTCCACGAATCATATCCGCTATTTTCTCGCCAATCATAATTGTCGTGGCGTTCAAATTCCCGGTGATAATCTGCGGCATAATCGACGCATCC CCCCCGGGGGCFGGGGGGDCFEEGGGGGCCGGFGGGGGGGGGGGGGGGAFGGGGGGGGGGGGGGGGGGCFEGGFGGGGGGGFGGGGGGGGGFGGFGGGGG,EFGFGGGGGGGGGIHJJHJI>IHHIJJJJJJJJJJI>JJJJJJJJJJIIJIJJJJJJ>@H>JJJJJJJJJJJJCJJJJJJ?JJJJJJJ?JJHJJIIJIJJJIJJJJJJIJJJJJJJJJJJJ>JJJJJIJJJJJJ>I6JJJJHJJJJJJJJJJJ=JJJJJJJJJDJEJ@JJJJJJJJJJJJJJJJJJJJJJIEIJJJJJJE=7AEF@F7F?:GGGGGFFCGGFFGEFBFFFDGF?F>GGGGEGFEF?>:@C8FC,AECGGGGGGGGGFFCFFFFEFEE<GGGECFGCFFE9C@C7GGGGGGGGGGGFCGGCCCCC NM:i:0 MD:Z:415 AS:i:415 XS:i:0 RG:Z:acrA_S15_ffsnps M00385:151:000000000-J3BLL:1:1119:16155:16213 0 CP009273 321039 60 131M 0 0 CAATCAGATCGATATCCTGTTTGATTTGTTCACGCATAATATATCCAGAGAATAAAATCTGTCGCAGATAAGGTTGTATTAATAGTCTGTATCAGGAATGTTCGGGTTAAATATCAGCAAAAAGCCCGCAT JJJJJHJJJEJJJJJJJHJJJJJJJJJJJJEJJJJJJJJJJJJIGJJJJIJJJJJJJJJJJJJJHJHJBJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJHJJJJ NM:i:0 MD:Z:131 AS:i:131 XS:i:0 RG:Z:acrA_S15_ffsnps M00385:151:000000000-J3BLL:1:1103:21662:8290 16 CP009273 321039 60 361M * 0 0 CAATCAGATCGATATCCTGTTTGATTTGTTCACGCATAATATATCCAGAGAATAAAATCTGTCGCAGATAAGGTTGTATTAATAGTCTGTATCAGGAATGTTCGGGTTAAATATCAGCAAAAAGCCCGCATCATGAATACTGGATATGAAGCATGAGAGTTACCTCAGTGTTTATATAAGGATTCGGTCCCCCTCTCTGGAACGGTAACTCTCAATCTGATCGGTTCCTGCGTTAGTTCACATCACGACTCATTTTTTCGCTCTCACCGGCATCCCATTTGCCACAAAATATCCCGCCGTGCTCCTCGGCAGCGCTTCCTGTCCACGAATCATATCCGCTATTTTCTCGCCAATCATAATT CCCCCGFGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGFGCGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?JJJJJJJJJJJJHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIIJJIJJJJJJJJJJJJJJJJJJJJJIJJJJIIJJJJJJJJJJJJJJIJJJJJJJJJJJJJJ?JJJIJJJJJJIJJJJJJJJ?GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC NM:i:0 MD:Z:361 AS:i:361XS:i:0 RG:Z:acrA_S15_ffsnps ...

holrock commented 4 years ago

IMSindel requires mapped read proper pairs. (SAM flag 2) Unfortunately, the bam could not use the IMSindel.

liuxianghui commented 4 years ago

I changed one using paired end reads other than this one using merged paired end reads.
The output seems to be different.
(new) BigMacMealUpsize:IMSindel xianghui.liu$ imsindel --bam snps.bam --chr CP009273 --outd outx --indelsize 10000 --reffa ref.fa

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: snps.bam chr: CP009273 outd: outx indelsize: 10000 reffa: ref.fa glsearch: glsearch36 glsearch mat: /opt/anaconda3/envs/new/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /var/folders/0m/d8qkfjp918s79vdk51n_8v_h0000gs/T thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 snps.bam CP009273

    backward_clips: 1239

    forward_clips: 1202

    non_clips: 3822

  2. collecting indel related reads...done
  3. collecting unmapped reads... samtools view -F 1024 -f 8 snps.bam CP009273 mate_unmapped_read_names: 1196 samtools view -F 1024 -f 4 snps.bam CP009273 Insert size Avg: 35638.921761165664 SD: 402205.8048246385

    unmapped reads: 55

  4. collecting unmapped reads...done
  5. considering support reads...

    backward clip with support reads: 1

    forward clip with support reads: 1

    non_clips with suport reads: 10

  6. considering support reads...done
  7. making consensus seqs from support reads...

    backward clip with consensus: 1 --> 1

    forward clip with consensus: 1 --> 0

    shot indel with consensus: 10 --> 0

  8. making consensus seqs from support reads...done
  9. making consensus seq from B and F.. making consensus seq for long deletion...done
  10. detection of indels...

    paired long indel candidates: 0

    unpaired long indel candidates: 1

    short indel candidates: 0

    [faidx] Truncated sequence: CP009273:4626467-4636469

    all_pos: 0

    fin_indels: 0

    empty all_pos

  11. detection of indels...done

However, there is no output file. ... The bam files seems to have some problem?? I randomly checked two reads. Which SAM flat 2 shall be pairs.? Could you kindly put some example files in the github so that I can test the program. And please kindly show the command you ran for mapping. My bam file was generated when I used another program snippy. However, that program can only detect SNPs and not large structure variants like gene deletion.

(base) [xianghui@ln-0001 acrA_S15_R1R2snps]$ samtools view snps.bam | grep 'M00385:151:000000000-J3BLL:1:1101:6633:16704' M00385:151:000000000-J3BLL:1:1101:6633:16704 64 CP009273 1 60 74S227M 0 0 CAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCAC CCCCCGGGGGGGGFGGGGGEFFFCFFGG9EFGGGFGGFGCCFGGGGGGFGGF@CF@@FGGGEGFGGGGG9F9FCFGGGGGFGGGGFGGGGGGGGGGGD@B@GEG9FFGGGGGG,4CFGAFCA?F:=FFFFFFFGGGGGGFGG?FEFFFE<FFFBGGGCGGFGG@EE@+CCFGGGGGGGGGGG=;DGGGGGG8AE?FGGGGGGE7AEFGFGFGGGGGD>5<CFFGDDGGCF4909>@+=@8;AAFFFF68BFEF=BFAFDFF6@E6AD>FDDF386=?;0;DEEFF)6:<??B?D?<, NM:i:0 MD:Z:227 MC:Z:90S211M AS:i:227 XS:i:0 RG:Z:acrA_S15_R1R2snps SA:Z:CP009273,4631396,+,74M227S,60,0; M00385:151:000000000-J3BLL:1:1101:6633:16704 355 CP009273 4631396 60 74M227S = 302 -4630885 CAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCAC CCCCCGGGGGGGGFGGGGGEFFFCFFGG9EFGGGFGGFGCCFGGGGGGFGGF@CF@@FGGGEGFGGGGG9F9FCFGGGGGFGGGGFGGGGGGGGGGGD@B@GEG9FFGGGGGG,4CFGAFCA?F:=FFFFFFFGGGGGGFGG?FEFFFE<FFFBGGGCGGFGG@EE@+CCFGGGGGGGGGGG=;DGGGGGG8AE?FGGGGGGE7AEFGFGFGGGGGD>5<CFFGDDGGCF4909>@+=@*8;AAFFFF68BFEF=BFAFDFF6@E6AD>FDDF386=?;0;DEEFF)6:<??B?D?<, NM:i:0 MD:Z:74 MC:Z:90S211M AS:i:74 XS:i:0 RG:Z:acrA_S15_R1R2snps SA:Z:CP009273,1,+,74S227M,60,0;

samtools view snps.bam |head M00385:151:000000000-J3BLL:1:1101:6633:16704 64 CP009273 1 60 74S227M 0 0 CAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCAC CCCCCGGGGGGGGFGGGGGEFFFCFFGG9EFGGGFGGFGCCFGGGGGGFGGF@CF@@FGGGEGFGGGGG9F9FCFGGGGGFGGGGFGGGGGGGGGGGD@B@GEG9FFGGGGGG,4CFGAFCA?F:=FFFFFFFGGGGGGFGG?FEFFFE<FFFBGGGCGGFGG@EE@+CCFGGGGGGGGGGG=;DGGGGGG8AE?FGGGGGGE7AEFGFGFGGGGGD>5<CFFGDDGGCF4909>@+=@8;AAFFFF68BFEF=BFAFDFF6@E6AD>FDDF386=?;0;DEEFF)6:<??B?D?<, NM:i:0 MD:Z:227 MC:Z:90S211M AS:i:227 XS:i:0 RG:Z:acrA_S15_R1R2snps SA:Z:CP009273,4631396,+,74M227S,60,0; M00385:151:000000000-J3BLL:1:1101:20895:22553 64 CP009273 1 60 139S162M 0 0 AGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGTCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAG CCCC@FGGFFFFGGGGAEGEGFGCE9CFGGGFFGG@CFFFFGGGGGGGFFDGFF<:DCCF?CGGGGCEFFGGGFFGDGECGCFGCEFA?EFGFFF<EFGGCFEBFDGGFG8EFGGGFEGFGGFBFGFFE,EFFFFEFGGCFGFGDFGGGGGG9FF9FGGGG@3@EGGFGGGGGFGA;BEFGGDFFFFFFEGGG6:AFGGGGGGGGGGGDDGG?CF1,980A+;CFGGGG?65C7:+;7C6FFFFFFFA7CCFA5+)9:=A;AEAECAE?CEF3><<:22<9>2:B?:>:;;>F>(,(,( NM:i:1 MD:Z:24G137 MC:Z:34S267M AS:i:157 XS:i:0 RG:Z:acrA_S15_R1R2snps SA:Z:CP009273,4631331,+,139M162S,60,0; M00385:151:000000000-J3BLL:1:1101:21051:1731 163 CP009273 1 60 71S225M5S = 89 389 TGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCAGAGCGCACAGACAGATAAACATTACAGAGTACACAGCATCCATGAAACGCATTAGCACCACCATTACCCCCACCATCCCCAC C@CCCGGFGGGGCCGGGGGGGGGGGFEGGGGGGGECDFFGGDGGGGGGGGCFCEEFCGGFDFGF,FFFGGGGGFGCAFFE9FFEGGFGGGGGDGGGGGGGGGGGFCEFGGGGGGGAECFGG:FGFGGGFFGFGAFFGGGGGFGFGGDGFGGDFEFFEGGFGECG>DDFDEGD@CCG8DFFFF9FGGCFGGGGGGFF6FFGGGGGGGGGG67+=<;89C*1<D?DEG40;C4797B@;87DFGGGF::=@82?CF;F+3=7FFA):DEFC6+=86:5ACD7C8))5>C4:09/2 NM:i:4 MD:Z:148T18A15A32A8 AS:i:205 XS:i:0 RG:Z:acrA_S15_R1R2snps SA:Z:CP009273,4631399,+,71M230S,60,0; MQ:i:60 MC:Z:301M ms:i:7730 M00385:151:000000000-J3BLL:1:1103:21489:4383 353 CP009273 1 60 184S117M = 1 287 TTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTT CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGEGGGGGGGGGGGGCGGGGGGFGGGGGGGGGDEGGGGGGGGGGGFGFGGGGGGGFCCFFGGGGGGGGGGGGGGGGGGGGGFGGCFFGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGFFDBCCGGGGGGGGGGGFFFGGGGG?FCAEF9CGGFFGEGGGGGFFG?FFFFFFGGFCFGFBFGFFFFGFFG2@FFF@5)4?9FA2:>FBFFFAFFA> NM:i:0 MD:Z:117 MC:Z:14S287M AS:i:117 XS:i:0 RG:Z:acrA_S15_R1R2snps SA:Z:CP009273,4631286,+,184M117S,60,0; M00385:151:000000000-J3BLL:1:1103:28652:12093 99 CP009273 1 60 77S224M = 4 222 CGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCAA CCCCCGGGCFFGGGGGCC@CGCFDFGGCDFEA,6FFAFCGGFGCEGGGGGGFFDFGGGGEGGCFFGGGGGCGFFFAACC9FEC9FGFFFGGGGC9FEFGGE@@FFFFFAFGAFFAFFEB?AEFFGFGGCC@D<FEFF,<C=FFGGGGAFGGFGGCGGGGGGGGGGFFFEGGEGAFGFGGFFFFFFGGCCDFGCGDFDEEGGEGGGFAFGGGGGFG5C8CGG7ECA:E6CDGEFGFGGGGGFGGGFFFGFFFGFCFCAFF*05;F<FF;FAFDE>BFF+/)7C3?D>B>4FA4:BBB03( NM:i:1 MD:Z:223T0 AS:i:223 XS:i:0 RG:Z:acrA_S15_R1R2snps SA:Z:CP009273,4631393,+,77M224S,60,0; MQ:i:60 MC:Z:82S219M ms:i:8195

liuxianghui commented 4 years ago

I tried to the mapping using Bowtie2 to ensure the process.

bowtie2-build ref.fa ref.fa.index bowtie2 -p 2 -x ref.fa.index -1 ./../acrA_S15_R1_001.fastq -2 ./../acrA_S15_R2_001.fastq -S acrA_S15.sam samtools view -S -b /Users/xianghui.liu/work/Oishi/acrA_S15_R1R2snps/acrA_S15.sam > acrA_S15.bam & samtools index acrA_S15.bam samtools sort acrA_S15.bam -o acrA_S15.sorted.bam samtools index acrA_S15.sorted.bam imsindel --bam acrA_S15.sorted.bam --chr CP009273 --outd outx --indelsize 10000 --reffa ref.fa

Here is the output

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: acrA_S15.sorted.bam chr: CP009273 outd: outx indelsize: 10000 reffa: ref.fa glsearch: glsearch36 glsearch mat: /opt/anaconda3/envs/new/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /var/folders/0m/d8qkfjp918s79vdk51n_8v_h0000gs/T thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 acrA_S15.sorted.bam CP009273

    backward_clips: 0

    forward_clips: 0

    non_clips: 4724

  2. collecting indel related reads...done
  3. collecting unmapped reads... samtools view -F 1024 -f 8 acrA_S15.sorted.bam CP009273 mate_unmapped_read_names: 17599 samtools view -F 1024 -f 4 acrA_S15.sorted.bam CP009273 Insert size Avg: 426.3247338247338 SD: 52.13350485485183

    unmapped reads: 7955

  4. collecting unmapped reads...done
  5. considering support reads...

    backward clip with support reads: 0

    forward clip with support reads: 0

    non_clips with suport reads: 13

  6. considering support reads...done
  7. making consensus seqs from support reads... missing indels missing indels

    backward clip with consensus: 0 --> 0

    forward clip with consensus: 0 --> 0

    shot indel with consensus: 13 --> 2

  8. making consensus seqs from support reads...done
  9. making consensus seq from B and F.. making consensus seq for long deletion...done
  10. detection of indels...

    paired long indel candidates: 0

    unpaired long indel candidates: 0

    short indel candidates: 2

    [faidx] Truncated sequence: CP009273:4626463-4636465

    all_pos: 0

    fin_indels: 0

    empty all_pos

  11. detection of indels...done

Still can't get any output. Please suggest, Here is the output of (new) BigMacMealUpsize:IMSindel xianghui.liu$ samtools view -F 1024 -f 2 acrA_S15.sorted.bam CP009273 |tail M00385:151:000000000-J3BLL:1:2106:15167:2765 147 CP009273 4631134 24 16M1D285M = 4630984 -452 TTGGCGACTGTTGTTAGGGGGGGCCGTGGGTGGGGGAGTATCCATCGTCTAATGTGTGGCAGGCGGTGATGGCATATGGCTATCAAGTACCAACATTAATACAACAGCCGGAGAAAACACATGCAACGGCAGACAAACAGTAACTGCAAGCTTGACGCGAACGAGCCATGACAGTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACGGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGGAACGAGAAACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATAT 3922029C82****:2))=:2)+*0::2***://+1++;/*95**C*==:*7>6<1,>1,44,,,4,,9DC>1,14,,;>2,=?,,71DCCB3*4*@,,;@D37,,,CEC+@EDE@,9F@,73,9,,9EF@>@D@F9CC+C7C:6++8,,599FFC,=4<<@+A<@C,@FF?F98EFFDE,9EF?<9,?:,?FB8++,B,@EE<EAFFCCF@+7EF?ECEF<F@C,F@E,9C,,:,,9E6,:F<E,EGF@CF8,6;76FEEE,F6,<EE,FGFFE@@CEGC@CCC AS:i:-110 XN:i:0 XM:i:36 XO:i:1 XG:i:1 NM:i:37 MD:Z:5T0G0A0C3C3^C0T2T1T4A0T5C3T5T3C0T0C3C2G15T0C3T8T2G16A4C5G0T0G14C4T0C12T19T36T37C7C44 YS:i:-20 YT:Z:CP M00385:151:000000000-J3BLL:1:2115:18777:6402 83 CP009273 4631141 42 301M = 4631118 -324 ACGTTCTTACTGGTGTGCCGATGGTGGCGGATTATCCTTCGCTCAATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAA -FFFDFDFBDFFFGBBBDFFFGFFDGFGFGF?FFGGDCFGGGFFGGGDGGGGFCGGGDGGFGGGGGFGGGGGGFGGGGGFFGGGGGGGGGGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:301 YS:i:0 YT:Z:CP M00385:151:000000000-J3BLL:1:2106:12000:20330 83 CP009273 4631141 42 301M = 4631028 -414 ACGTTCTTACTGGTGTGCCGATGGTGGCGGATTATCCTTCGCTCAATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAA )>FFFFFF@5752)))75)?FGFEFFFGGFGGGGGGD<D<E6AEFDD=GFFGEGGGCCFCGGC?FFCFGGGGGFFFGGFGGFAFFFFGGGGFCGGGGGGGGGGE@FGGGGGGGGFGGFEEGF:9FGGDGGGGGECGGGGGGGGF@:GEGDGGGGGFFEFGGGGFE8GGGFGGDGF<FGFFGGGGGGGFGGGGGGGGGGFF?CEGF7GGFF<AGGCGGGGEEEEFGGGGFFCGGFDCFGGFGGGGGGF@7EFCFAGGGGFGGGEGGGGFGGGFGFCGECGGGFGGGGGCGGGFGFCCCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:301 YS:i:-30 YT:Z:CP M00385:151:000000000-J3BLL:1:2118:16501:19253 83 CP009273 4631171 42 293M2I6M = 4631087 -383 ATTATCCTTCGCTCAATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAG *??>)>>53)***9*5><BC74155@3F@4FD54?*+77DB:0FCA;3<+?6?@<AGFF=F98C=5EC?5?8=FFF;C6BFE=:CC9DDCC>CFFFCCC9FAE;GFFFGFCFGFEC37FFEFD=8DADFGFFC,A,DFFGFGGF8FFCE=GFE<C,FFDCGGGGFGGGEGGEB:FGGF@DGGGFCGGGGFC?FEGEGFF9AD<DFF<CEFF@FDGGGGGGGFFF@GFCC7GGFGFF?FGGGGGFGFGGDGGGGGGGFGGFCEAGGGGGFBFGGFCGGGGGGGGGGGGGGGFFCCCCC AS:i:-26 XN:i:0 XM:i:3 XO:i:1 XG:i:2 NM:i:5 MD:Z:296T0T0C0 YS:i:-45 YT:Z:CP M00385:151:000000000-J3BLL:1:2104:17247:15174 147 CP009273 4631179 42 285M11I5M = 4631051 -418 CCGCTCAATCGGGGGAAGGGGGTGATGGTCTATTGCTATAAATTAGCAACATTAATACAACAACCGACGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCAT )52+*/1*1*4C22**)8*F9+<0**?<<*C>:++0:2+FA3+;19C::7A7C;+=67=/***/*=GGGGB=,,?FEEEEGGGGGD7?;,D5,,,?B6,DCGFFEAC:5GEGGFECEFD8,ACEAFFED;DGGDF+A>8GFEFCGFFGBEDEAA<FC,E@F<9AFGGF@EFEEFGFGGGE8F@7EGGGGFGFFA9FCCAFB6,7FDFEFB7CDGGFGGGFFGEDFDGGGFGGGFGEGGFCGGGGGGGGGGGDGFGGGGGGGGF>GEGFGGGGGFCFAAGGGGGGFGDFGGGGFDCCCCC AS:i:-63 XN:i:0 XM:i:8 XO:i:1 XG:i:11 NM:i:19 MD:Z:0T9T4C3C19C26G220T0T1 YS:i:-5 YT:Z:CP M00385:151:000000000-J3BLL:1:1119:15648:6358 83 CP009273 4631186 40 278M18I5M = 4631089 -380 ATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACT 23(1?FFB?FFF?FFFFFFFFFA4DGFABFG>AGFFFGGFFGFFBFFF<@AFDEBEFGGFFGGFFGGAGGGGFGGGGGGGGGDCGGGGGFGGGFGGGGGGGGGGGGEGGGGGFGGFGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC AS:i:-74 XN:i:0 XM:i:3 XO:i:1 XG:i:18 NM:i:21 MD:Z:279T0T0T1 YS:i:0 YT:Z:CP M00385:151:000000000-J3BLL:1:1105:11504:22141 83 CP009273 4631198 23 267M30I4M = 4631025 -444 GGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAAT 2:??F>8)F2B?49FFAF4@<84*)?AGFB@0FAFF87)5CDDFGFGFFGFC96CFC<CAC59AGGGFGGFFGFGGGCFFFCGGE7EECCGE:CGEC?CGCGGGF@CAFGGGGGFGGFGGGFGGGGGFFGGGGFCGFGF=D,FGGFFCFGGFFGEGGGGGGGFECGGGGGFFBFFEDAGFFGGDGEFGGFFGGGGFGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGFAGGFECGDFFCGGF7FFGGFGGGGGGGGGGGGFCF@GDGGGGGGGGGGFGFAGGGGGGGGGCCCCC AS:i:-113 XN:i:0 XM:i:4 XO:i:1 XG:i:30 NM:i:34 MD:Z:0C266T0T0T1 YS:i:-32 YT:Z:CP M00385:151:000000000-J3BLL:1:2115:14450:21815 83 CP009273 4631202 3 261M34I6M = 4631061 -408 GATTGTATATTGCTATCAATTAGAAACATTAATACAACAACCGGCGAAAAGGGATGCAACGGCAGAACAACATCAACGTAAAGCTTTACGCGAACGAGCCATGACATGGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGCGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGT (-51)0/0699/?@=7.:>=:0+FCCDG>5DC:35FGD?65=+4C83D<97FC<:GGGGF=8C==,,,CFA,::CCCEGGGGECD@,E9FEE89>,@ED@+GFE@F>,D,FCFGFB,FAFGGGGGGF@83@+FF7:=DA,,,5,BFEGFFF:B++4CC?GGGFC:=>++FB<FBC8GFF:,F9FEFEAGF@?C7CGGGFGGFGGFGGGGFE@EEFGFGGCF<<8FDFFC<C8FF:@EFFDCFFCGGFFFDFFEECCA8E<FGGFDF9GGFEFC,-<@7@A9F6-@CBBC AS:i:-142 XN:i:0 XM:i:12 XO:i:1 XG:i:34 NM:i:46 MD:Z:3G2C16C27T14C10T0G0C27T71A83T1T1 YS:i:-83 YT:Z:CP M00385:151:000000000-J3BLL:1:2117:25808:17593 147 CP009273 4631205 24 261M36I4M = 4631088 -382 GGTCTATTGCCAACAATTAGCAAAATTAATACAACAACCGGCGAAAAGTGATGCAAAGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGAAACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTC 64C32*81+++<0++:0+:+<+:9C15C:5ED;)++?C;++@=+7C:<FF?CFCA9CCEC?FFC?9?,7:?=**EEDEGCCCC@,869A<7@6A<,D@+6D@7@,,,<?DDCFD?FFEBGFA;FFDDBDEEGGCE9GGGFFGF@,ECEE7GFFC=GGFCAGGFFE?9F8GCGADF?,GGFCFGGFGGFCFGGGGGGGGFGGGGGGGFEFGGGGGGGGFFF<96,EE,9E@<CF7:GGDGGCGGGGGGGDGGGFCGGGGGGGFFFAGGGGGGFGGGGGDFGGGGGGGCCCC< AS:i:-133 XN:i:0 XM:i:6 XO:i:1 XG:i:36 NM:i:42 MD:Z:10T1T10C32C129C75T2 YS:i:0 YT:Z:CP M00385:151:000000000-J3BLL:1:2116:26509:8748 147 CP009273 4631216 8 250M47I4M = 4631078 -392 ATCAATTCGCAACATTAAAACACCCACCGGCGAAAAGGGATGCAACGGCAGACCAACCTCCACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGGGCAACGAGACACGGCAATGTTGAACCGTTTGCTGNATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTA +C?;/29+07977F:?E:)*DGEDC<9<+:+:+:9:C714CC<41C;/82++?/*:+FC75??:*5C*??*6;7FC:,2F>?CF>7@8CA9D8FFGGGFGGGFCGGEECCE,7E3>FC>+GEGGGA=;:AD@D:F@+B7DE@?@4,DGGFEA9,:<A+FGCFEEFEGEFA,EAFFB,4=GFFFGFFA:#GGF9GCGGF@DGF9GFGGFF8GFC,9E@@@@@,GGGFEC9GGGFGGGGGF9GGGGFFFFFDFFAGGGGGGGF@FFFA@,F9GEGGGGGGEGGGGGCCCCC AS:i:-179 XN:i:0 XM:i:12 XO:i:1 XG:i:47 NM:i:59 MD:Z:7A10T3A1A12T19A2A104A22C11C49T2C0 YS:i:-6 YT:Z:CP (new) BigMacMealUpsize:IMSindel xianghui.liu$ samtools view -F 1024 -f 2 acrA_S15.sorted.bam CP009273 |tail -10 M00385:151:000000000-J3BLL:1:2106:15167:2765 147 CP009273 4631134 24 16M1D285M = 4630984 -452 TTGGCGACTGTTGTTAGGGGGGGCCGTGGGTGGGGGAGTATCCATCGTCTAATGTGTGGCAGGCGGTGATGGCATATGGCTATCAAGTACCAACATTAATACAACAGCCGGAGAAAACACATGCAACGGCAGACAAACAGTAACTGCAAGCTTGACGCGAACGAGCCATGACAGTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACGGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGGAACGAGAAACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATAT 3922029C82****:2))=:2)+*0::2***://+1++;/*95**C*==:*7>6<1,>1,44,,,4,,9DC>1,14,,;>2,=?,,71DCCB3*4*@,,;@D37,,,CEC+@EDE@,9F@,73,9,,9EF@>@D@F9CC+C7C:6++8,,599FFC,=4<<@+A<@C,@FF?F98EFFDE,9EF?<9,?:,?FB8++,B,@EE<EAFFCCF@+7EF?ECEF<F@C,F@E,9C,,:,,9E6,:F<E,EGF@CF8,6;76FEEE,F6,<EE,FGFFE@@CEGC@CCC AS:i:-110 XN:i:0 XM:i:36 XO:i:1 XG:i:1 NM:i:37 MD:Z:5T0G0A0C3C3^C0T2T1T4A0T5C3T5T3C0T0C3C2G15T0C3T8T2G16A4C5G0T0G14C4T0C12T19T36T37C7C44 YS:i:-20 YT:Z:CP M00385:151:000000000-J3BLL:1:2115:18777:6402 83 CP009273 4631141 42 301M = 4631118 -324 ACGTTCTTACTGGTGTGCCGATGGTGGCGGATTATCCTTCGCTCAATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAA -FFFDFDFBDFFFGBBBDFFFGFFDGFGFGF?FFGGDCFGGGFFGGGDGGGGFCGGGDGGFGGGGGFGGGGGGFGGGGGFFGGGGGGGGGGGFCGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:301 YS:i:0 YT:Z:CP M00385:151:000000000-J3BLL:1:2106:12000:20330 83 CP009273 4631141 42 301M = 4631028 -414 ACGTTCTTACTGGTGTGCCGATGGTGGCGGATTATCCTTCGCTCAATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAA )>FFFFFF@5752)))75)?FGFEFFFGGFGGGGGGD<D<E6AEFDD=GFFGEGGGCCFCGGC?FFCFGGGGGFFFGGFGGFAFFFFGGGGFCGGGGGGGGGGE@FGGGGGGGGFGGFEEGF:9FGGDGGGGGECGGGGGGGGF@:GEGDGGGGGFFEFGGGGFE8GGGFGGDGF<FGFFGGGGGGGFGGGGGGGGGGFF?CEGF7GGFF<AGGCGGGGEEEEFGGGGFFCGGFDCFGGFGGGGGGF@7EFCFAGGGGFGGGEGGGGFGGGFGFCGECGGGFGGGGGCGGGFGFCCCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:301 YS:i:-30 YT:Z:CP M00385:151:000000000-J3BLL:1:2118:16501:19253 83 CP009273 4631171 42 293M2I6M = 4631087 -383 ATTATCCTTCGCTCAATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAG *??>)>>53)***9*5><BC74155@3F@4FD54?*+77DB:0FCA;3<+?6?@<AGFF=F98C=5EC?5?8=FFF;C6BFE=:CC9DDCC>CFFFCCC9FAE;GFFFGFCFGFEC37FFEFD=8DADFGFFC,A,DFFGFGGF8FFCE=GFE<C,FFDCGGGGFGGGEGGEB:FGGF@DGGGFCGGGGFC?FEGEGFF9AD<DFF<CEFF@FDGGGGGGGFFF@GFCC7GGFGFF?FGGGGGFGFGGDGGGGGGGFGGFCEAGGGGGFBFGGFCGGGGGGGGGGGGGGGFFCCCCC AS:i:-26 XN:i:0 XM:i:3 XO:i:1 XG:i:2 NM:i:5 MD:Z:296T0T0C0 YS:i:-45 YT:Z:CP M00385:151:000000000-J3BLL:1:2104:17247:15174 147 CP009273 4631179 42 285M11I5M = 4631051 -418 CCGCTCAATCGGGGGAAGGGGGTGATGGTCTATTGCTATAAATTAGCAACATTAATACAACAACCGACGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCAT )52+*/1*1*4C22**)8*F9+<0**?<<*C>:++0:2+FA3+;19C::7A7C;+=67=/***/*=GGGGB=,,?FEEEEGGGGGD7?;,D5,,,?B6,DCGFFEAC:5GEGGFECEFD8,ACEAFFED;DGGDF+A>8GFEFCGFFGBEDEAA<FC,E@F<9AFGGF@EFEEFGFGGGE8F@7EGGGGFGFFA9FCCAFB6,7FDFEFB7CDGGFGGGFFGEDFDGGGFGGGFGEGGFCGGGGGGGGGGGDGFGGGGGGGGF>GEGFGGGGGFCFAAGGGGGGFGDFGGGGFDCCCCC AS:i:-63 XN:i:0 XM:i:8 XO:i:1 XG:i:11 NM:i:19 MD:Z:0T9T4C3C19C26G220T0T1 YS:i:-5 YT:Z:CP M00385:151:000000000-J3BLL:1:1119:15648:6358 83 CP009273 4631186 40 278M18I5M = 4631089 -380 ATCTGGGGCAGGCGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACT 23(1?FFB?FFF?FFFFFFFFFA4DGFABFG>AGFFFGGFFGFFBFFF<@AFDEBEFGGFFGGFFGGAGGGGFGGGGGGGGGDCGGGGGFGGGFGGGGGGGGGGGGEGGGGGFGGFGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC AS:i:-74 XN:i:0 XM:i:3 XO:i:1 XG:i:18 NM:i:21 MD:Z:279T0T0T1 YS:i:0 YT:Z:CP M00385:151:000000000-J3BLL:1:1105:11504:22141 83 CP009273 4631198 23 267M30I4M = 4631025 -444 GGGTGATGGTCTATTGCTATCAATTAGCAACATTAATACAACAACCGGCGAAAAGTGATGCAACGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAAT 2:??F>8)F2B?49FFAF4@<84*)?AGFB@0FAFF87)5CDDFGFGFFGFC96CFC<CAC59AGGGFGGFFGFGGGCFFFCGGE7EECCGE:CGEC?CGCGGGF@CAFGGGGGFGGFGGGFGGGGGFFGGGGFCGFGF=D,FGGFFCFGGFFGEGGGGGGGFECGGGGGFFBFFEDAGFFGGDGEFGGFFGGGGFGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGFAGGFECGDFFCGGF7FFGGFGGGGGGGGGGGGFCF@GDGGGGGGGGGGFGFAGGGGGGGGGCCCCC AS:i:-113 XN:i:0 XM:i:4 XO:i:1 XG:i:30 NM:i:34 MD:Z:0C266T0T0T1 YS:i:-32 YT:Z:CP M00385:151:000000000-J3BLL:1:2115:14450:21815 83 CP009273 4631202 3 261M34I6M = 4631061 -408 GATTGTATATTGCTATCAATTAGAAACATTAATACAACAACCGGCGAAAAGGGATGCAACGGCAGAACAACATCAACGTAAAGCTTTACGCGAACGAGCCATGACATGGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGCGCAACGAGACACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGT (-51)0/0699/?@=7.:>=:0+FCCDG>5DC:35FGD?65=+4C83D<97FC<:GGGGF=8C==,,,CFA,::CCCEGGGGECD@,E9FEE89>,@ED@+GFE@F>,D,FCFGFB,FAFGGGGGGF@83@+FF7:=DA,,,5,BFEGFFF:B++4CC?GGGFC:=>++FB<FBC8GFF:,F9FEFEAGF@?C7CGGGFGGFGGFGGGGFE@EEFGFGGCF<<8FDFFC<C8FF:@EFFDCFFCGGFFFDFFEECCA8E<FGGFDF9GGFEFC,-<@7@A9F6-@CBBC AS:i:-142 XN:i:0 XM:i:12 XO:i:1 XG:i:34 NM:i:46 MD:Z:3G2C16C27T14C10T0G0C27T71A83T1T1 YS:i:-83 YT:Z:CP M00385:151:000000000-J3BLL:1:2117:25808:17593 147 CP009273 4631205 24 261M36I4M = 4631088 -382 GGTCTATTGCCAACAATTAGCAAAATTAATACAACAACCGGCGAAAAGTGATGCAAAGGCAGACCAACATCAACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGAGCAACGAGAAACGGCAATGTTGCACCGTTTGCTGCATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTC 64C32*81+++<0++:0+:+<+:9C15C:5ED;)++?C;++@=+7C:<FF?CFCA9CCEC?FFC?9?,7:?=**EEDEGCCCC@,869A<7@6A<,D@+6D@7@,,,<?DDCFD?FFEBGFA;FFDDBDEEGGCE9GGGFFGF@,ECEE7GFFC=GGFCAGGFFE?9F8GCGADF?,GGFCFGGFGGFCFGGGGGGGGFGGGGGGGFEFGGGGGGGGFFF<96,EE,9E@<CF7:GGDGGCGGGGGGGDGGGFCGGGGGGGFFFAGGGGGGFGGGGGDFGGGGGGGCCCC< AS:i:-133 XN:i:0 XM:i:6 XO:i:1 XG:i:36 NM:i:42 MD:Z:10T1T10C32C129C75T2 YS:i:0 YT:Z:CP M00385:151:000000000-J3BLL:1:2116:26509:8748 147 CP009273 4631216 8 250M47I4M = 4631078 -392 ATCAATTCGCAACATTAAAACACCCACCGGCGAAAAGGGATGCAACGGCAGACCAACCTCCACTGCAAGCTTTACGCGAACGAGCCATGACATTGCTGACGACTCTGGCAGTGGCAGATGACATAAAACTGGTCGACTGGTTACAACAACGCCTGGGGCTTTTAGGGCAACGAGACACGGCAATGTTGAACCGTTTGCTGNATGATATTGAAAAAAATATCACCAAATAAAAAACGCCTTAGTAAGTATTTTTCAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTA +C?;/29+07977F:?E:)*DGEDC<9<+:+:+:9:C714CC<41C;/82++?/*:+FC75??:*5C*??*6;7FC:,2F>?CF>7@8CA9D8FFGGGFGGGFCGGEECCE,7E3>FC>+GEGGGA=;:AD@D:F@+B7DE@?@4,DGGFEA9,:<A+FGCFEEFEGEFA,EAFFB,4=GFFFGFFA:#GGF9GCGGF@DGF9GFGGFF8GFC,9E@@@@@,GGGFEC9GGGFGGGGGF9GGGGFFFFFDFFAGGGGGGGF@FFFA@,F9GEGGGGGGEGGGGGCCCCC AS:i:-179 XN:i:0 XM:i:12 XO:i:1 XG:i:47 NM:i:59 MD:Z:7A10T3A1A12T19A2A104A22C11C49T2C0 YS:i:-6 YT:Z:CP

holrock commented 4 years ago

You can use this following bam file: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00154/alignment/HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam

You should use bwa-mem for read mapping with the command line bwa mem ref.fa fastq1 fastq2.

As IMSindel has not been applied for bacterial genomes, we could not recommend to use our software for the genomes.

liuxianghui commented 4 years ago

Thanks, where is the reference file?

holrock commented 4 years ago

For example, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/

liuxianghui commented 4 years ago

error message as this:

(new) BigMacMealUpsize:IMSindel xianghui.liu$ imsindel --bam HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam --chr 20 --outd outy --indelsize 10000 --reffa hs37d5.fa

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam chr: 20 outd: outy indelsize: 10000 reffa: hs37d5.fa glsearch: glsearch36 glsearch mat: /opt/anaconda3/envs/new/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /var/folders/0m/d8qkfjp918s79vdk51n_8v_h0000gs/T thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20

    backward_clips: 985

    forward_clips: 907

    non_clips: 19635

  2. collecting indel related reads...done
  3. collecting unmapped reads... samtools view -F 1024 -f 8 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20 mate_unmapped_read_names: 33839 samtools view -F 1024 -f 4 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20 Insert size Avg: 480.6245175519206 SD: 16.16832684969753

    unmapped reads: 6859

  4. collecting unmapped reads...done
  5. considering support reads...

    backward clip with support reads: 31

    forward clip with support reads: 30

    non_clips with suport reads: 2298

  6. considering support reads...done
  7. making consensus seqs from support reads...

    backward clip with consensus: 31 --> 23

    forward clip with consensus: 30 --> 24

    shot indel with consensus: 2298 --> 580

  8. making consensus seqs from support reads...done
  9. making consensus seq from B and F.. making consensus seq for long deletion...done
  10. detection of indels...

    paired long indel candidates: 1

    unpaired long indel candidates: 9

    short indel candidates: 580

    all_pos: 523

    fin_indels: 523

    samtools view -F 1024 -f 2 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20

    all_pos_depth 523

    Traceback (most recent call last): 7: from /opt/anaconda3/envs/new/bin/imsindel:44:in <main>' 6: from /opt/anaconda3/envs/new/lib/ims_indel.rb:34:inrun' 5: from /opt/anaconda3/envs/new/lib/ims_indel.rb:108:in run_detect_indels' 4: from /opt/anaconda3/envs/new/lib/ims_indel/indel_detector.rb:29:inindel_call' 3: from /opt/anaconda3/envs/new/lib/ims_indel/indel_detector.rb:321:in final_indel_call' 2: from /opt/anaconda3/envs/new/lib/ruby/2.6.0/open-uri.rb:37:inopen' 1: from /opt/anaconda3/envs/new/lib/ruby/2.6.0/open-uri.rb:37:in open' /opt/anaconda3/envs/new/lib/ruby/2.6.0/open-uri.rb:37:ininitialize': No such file or directory @ rb_sysopen - outy/20.out (Errno::ENOENT)

holrock commented 4 years ago

Could you find anything in the output directory?

liuxianghui commented 4 years ago

can not. No output directory

holrock commented 4 years ago

Does outout directory exist?

liuxianghui commented 4 years ago

No... no new directory and file created under current directory

I tried to run it on my HPC cluster. The error is similar.

(im) [xianghui@ln-0001 IMSindel]$ imsindel --bam HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam --chr 20 --outd outy --indelsize 10000 --reffa hs37d5.fa

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam chr: 20 outd: outy indelsize: 10000 reffa: hs37d5.fa glsearch: glsearch36 glsearch mat: /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /tmp thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20

    backward_clips: 985

    forward_clips: 907

    non_clips: 19635

  2. collecting indel related reads...done
  3. collecting unmapped reads... samtools view -F 1024 -f 8 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20 mate_unmapped_read_names: 33839 samtools view -F 1024 -f 4 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20 Insert size Avg: 480.6245175519206 SD: 16.16832684969753

    unmapped reads: 6859

  4. collecting unmapped reads...done
  5. considering support reads...

    backward clip with support reads: 31

    forward clip with support reads: 30

    non_clips with suport reads: 2298

  6. considering support reads...done
  7. making consensus seqs from support reads...

    backward clip with consensus: 31 --> 23

    forward clip with consensus: 30 --> 24

    shot indel with consensus: 2298 --> 580

  8. making consensus seqs from support reads...done
  9. making consensus seq from B and F.. making consensus seq for long deletion...done
  10. detection of indels...

    paired long indel candidates: 1

    unpaired long indel candidates: 9

    short indel candidates: 580

    all_pos: 523

    fin_indels: 523

    samtools view -F 1024 -f 2 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20

    all_pos_depth 523

    Traceback (most recent call last): 7: from /gpfs0/scratch/xianghui/software/anaconda3/envs/im/bin/imsindel:44:in <main>' 6: from /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/ims_indel.rb:34:inrun' 5: from /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/ims_indel.rb:108:in run_detect_indels' 4: from /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/ims_indel/indel_detector.rb:29:inindel_call' 3: from /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/ims_indel/indel_detector.rb:321:in final_indel_call' 2: from /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/ruby/2.6.0/open-uri.rb:37:inopen' 1: from /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/ruby/2.6.0/open-uri.rb:37:in open' /gpfs0/scratch/xianghui/software/anaconda3/envs/im/lib/ruby/2.6.0/open-uri.rb:37:ininitialize': No such file or directory @ rb_sysopen - outy/20.out (Errno::ENOENT) (im) [xianghui@ln-0001 IMSindel]$

Is it problem with installation of ruby?

liuxianghui commented 4 years ago

What the output shall look like?

holrock commented 4 years ago

You must create output directory beforehand.

liuxianghui commented 4 years ago

thank you. After create directory, for sample dataset running is ok. (new) BigMacMealUpsize:IMSindel xianghui.liu$ mkdir outy (new) BigMacMealUpsize:IMSindel xianghui.liu$ imsindel --bam HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam --chr 20 --outd outy --indelsize 10000 --reffa hs37d5.fa

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam chr: 20 outd: outy indelsize: 10000 reffa: hs37d5.fa glsearch: glsearch36 glsearch mat: /opt/anaconda3/envs/new/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /var/folders/0m/d8qkfjp918s79vdk51n_8v_h0000gs/T thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20

    backward_clips: 985

    forward_clips: 907

    non_clips: 19635

  2. collecting indel related reads...done
  3. collecting unmapped reads... samtools view -F 1024 -f 8 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20 mate_unmapped_read_names: 33839 samtools view -F 1024 -f 4 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20 Insert size Avg: 480.6245175519206 SD: 16.16832684969753

    unmapped reads: 6859

  4. collecting unmapped reads...done
  5. considering support reads...

    backward clip with support reads: 31

    forward clip with support reads: 30

    non_clips with suport reads: 2298

  6. considering support reads...done
  7. making consensus seqs from support reads...

    backward clip with consensus: 31 --> 23

    forward clip with consensus: 30 --> 24

    shot indel with consensus: 2298 --> 580

  8. making consensus seqs from support reads...done
  9. making consensus seq from B and F.. making consensus seq for long deletion...done
  10. detection of indels...

    paired long indel candidates: 1

    unpaired long indel candidates: 9

    short indel candidates: 580

    all_pos: 523

    fin_indels: 523

    samtools view -F 1024 -f 2 HG00154.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam 20

    all_pos_depth 523

  11. detection of indels...done
liuxianghui commented 4 years ago

However, for mine, still no output. (new) BigMacMealUpsize:IMSindel xianghui.liu$ mkdir outx (new) BigMacMealUpsize:IMSindel xianghui.liu$ imsindel --bam acrA_S15.sorted.bam --chr CP009273 --outd outx --indelsize 10000 --reffa ref.fa

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: acrA_S15.sorted.bam chr: CP009273 outd: outx indelsize: 10000 reffa: ref.fa glsearch: glsearch36 glsearch mat: /opt/anaconda3/envs/new/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /var/folders/0m/d8qkfjp918s79vdk51n_8v_h0000gs/T thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 acrA_S15.sorted.bam CP009273

    backward_clips: 0

    forward_clips: 0

    non_clips: 4724

  2. collecting indel related reads...done
  3. collecting unmapped reads... samtools view -F 1024 -f 8 acrA_S15.sorted.bam CP009273 mate_unmapped_read_names: 17599 samtools view -F 1024 -f 4 acrA_S15.sorted.bam CP009273 Insert size Avg: 426.3247338247338 SD: 52.13350485485183

    unmapped reads: 7955

  4. collecting unmapped reads...done
  5. considering support reads...

    backward clip with support reads: 0

    forward clip with support reads: 0

    non_clips with suport reads: 13

  6. considering support reads...done
  7. making consensus seqs from support reads... missing indels missing indels

    backward clip with consensus: 0 --> 0

    forward clip with consensus: 0 --> 0

    shot indel with consensus: 13 --> 2

  8. making consensus seqs from support reads...done
  9. making consensus seq from B and F.. making consensus seq for long deletion...done
  10. detection of indels...

    paired long indel candidates: 0

    unpaired long indel candidates: 0

    short indel candidates: 2

    [faidx] Truncated sequence: CP009273:4626463-4636465

    all_pos: 0

    fin_indels: 0

    empty all_pos

holrock commented 4 years ago

uhmm.., there is no soft clip in that file. IMSindel uses soft clips to detect intermediate-size indels. I'm sorry, I can not recommend to use the IMSindel for bacterial genomes.

liuxianghui commented 4 years ago

what is soft clip? Is this depended on the assembler used. Mine is from bowtie2 but you recommend bwa. Maybe I shall give a try.

Besides, could you kindly recommend a popular tool for detecting large structure variants. I am not quite familiar with this area. snippy only detect SNPs, I am more interested in larger variants that affect e.g, gene deletion. gatk seems to a bit old.

Moreover, what are usual tools for you to visualize those results and directly link to the gene affected?. Seems that results are in vcf files. Is Unipro UGene good for that?

On Sun, Aug 30, 2020 at 5:38 PM holrock notifications@github.com wrote:

uhmm.., your bam doesn't have soft clips. IMSSindel using soft clips for detect indels. I'm sorry, I not recommend use imsidel for bacterial genome.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/NCGG-MGC/IMSindel/issues/14#issuecomment-683398778, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFTHY5N6NNVLF6BQFAF3BLSDIM2DANCNFSM4QLYP4SQ .

liuxianghui commented 4 years ago

I tried deepSV but can't make it work....

Thank you very much for your help. Please kindly suggest a good tool you prefered, ... Bacteria are much simpler than Human.... Should be much easier....

On Sun, Aug 30, 2020 at 7:42 PM Xianghui Liu yosemity@gmail.com wrote:

what is soft clip? Is this depended on the assembler used. Mine is from bowtie2 but you recommend bwa. Maybe I shall give a try.

Besides, could you kindly recommend a popular tool for detecting large structure variants. I am not quite familiar with this area. snippy only detect SNPs, I am more interested in larger variants that affect e.g, gene deletion. gatk seems to a bit old.

Moreover, what are usual tools for you to visualize those results and directly link to the gene affected?. Seems that results are in vcf files. Is Unipro UGene good for that?

On Sun, Aug 30, 2020 at 5:38 PM holrock notifications@github.com wrote:

uhmm.., your bam doesn't have soft clips. IMSSindel using soft clips for detect indels. I'm sorry, I not recommend use imsidel for bacterial genome.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/NCGG-MGC/IMSindel/issues/14#issuecomment-683398778, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFTHY5N6NNVLF6BQFAF3BLSDIM2DANCNFSM4QLYP4SQ .

holrock commented 4 years ago

Please check this link: https://drive5.com/usearch/manual/cigar.html, where soft clips was mentioned.

Scanindel can also detect intemediate-size indels.

liuxianghui commented 4 years ago

Ran with bwa for mapping, result is different but still no output result.... How do you find out the info about soft clip? (new) BigMacMalUpsize:IMSindel xianghui.liu$ imsindel --bam aln-pe.sorted.bam --chr CP009273 --outd outx --indelsize 10000 --reffa ref.fa

Parameters: Avg. base quality: 20 Maping quality: 20 Read group: within 3bp paired B and F: within 5bp Support reads for making consensus sequence: 3 mimimum clipping fragment base: 5bp support clip length: 5bp bam: aln-pe.sorted.bam chr: CP009273 outd: outx indelsize: 10000 reffa: ref.fa glsearch: glsearch36 glsearch mat: /opt/anaconda3/envs/new/lib/../data/mydna.mat mafft: mafft samtools: samtools temp: /var/folders/0m/d8qkfjp918s79vdk51n_8v_h0000gs/T thread: 1 exclude-region:

  1. collecting indel related reads... samtools view -F 1024 -f 2 aln-pe.sorted.bam CP009273

    backward_clips: 32222

    forward_clips: 32237

    non_clips: 4924

  2. collecting indel related reads...done
  3. collecting unmapped reads... samtools view -F 1024 -f 8 aln-pe.sorted.bam CP009273 mate_unmapped_read_names: 2016 samtools view -F 1024 -f 4 aln-pe.sorted.bam CP009273 Insert size Avg: 248.07170235453472 SD: 103.21198413514173

    unmapped reads: 110

  4. collecting unmapped reads...done
  5. considering support reads...

    backward clip with support reads: 34

    forward clip with support reads: 29

    non_clips with suport reads: 64

  6. considering support reads...done
  7. making consensus seqs from support reads...

    backward clip with consensus: 34 --> 9

    forward clip with consensus: 29 --> 10

    shot indel with consensus: 64 --> 0

  8. making consensus seqs from support reads...done
  9. making consensus seq from B and F.. making consensus seq for long deletion...done
  10. detection of indels...

    paired long indel candidates: 0

    unpaired long indel candidates: 6

    short indel candidates: 0

    all_pos: 0

    fin_indels: 0

    empty all_pos

  11. detection of indels...done
holrock commented 4 years ago

It seems to have finished with no problems and IMSindel was unable to detect the indels. We can't do this anymore.