JialiUMassWengLab / TEMP

TEMP is a software package for detecting transposable element (TE) insertions and excisions from pooled high-throughput sequencing data
GNU General Public License v2.0

Invalid record in bam.unproper.uniq.interval.bed #12

Open rimjhimroy opened 7 years ago

rimjhimroy commented 7 years ago

Hi,

I get this error when running "TEMP_Absence.sh":

bedtools intersect -a /cluster/project/gdc/people/crimjhim/TEPID_final.bed.sort -b merged.Ma99.bam.unproper.uniq.interval.bed -f 1.0 -wo
Error: Invalid record in file merged.Ma99.bam.unproper.uniq.interval.bed. Record is
chr1    2287009 2287008 HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784/2

Any idea why the start and end coordinates are inverted in the BED file, and how I should fix this? I am working with paired-end Illumina sequencing data, and the average insert size is 250 bp.

Thanks, Rimjhim
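
To see how many records are affected, one option is to scan the BED file for lines whose start is greater than their end, since that is the condition bedtools rejects in the record above. The sketch below is only a diagnostic and is not part of TEMP; the function name `split_bed_records` and the second example record are made up for illustration, and it assumes whitespace-separated columns with start and end in columns 2 and 3.

```python
def split_bed_records(lines):
    """Separate BED lines into valid records and records with start > end."""
    valid, inverted = [], []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        if start > end:
            inverted.append(line)
        else:
            valid.append(line)
    return valid, inverted


example = [
    # The record from the error message above.
    "chr1\t2287009\t2287008\tHWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784/2",
    # A hypothetical well-formed record, for contrast.
    "chr1\t100\t200\thypothetical_read/1",
]
valid, inverted = split_bed_records(example)
print(len(valid), "valid;", len(inverted), "inverted")
```

Dropping the inverted records would let `bedtools intersect` run, but it hides the cause rather than fixing it, so it is better used to count and inspect the offending reads.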

JialiUMassWengLab commented 7 years ago

Hi Rimjhim,

It's hard to know exactly what happened without knowing anything about "merged.Ma99.bam". Would you mind posting a few entries (including read "HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784/2") from the BAM file?

Jiali

rimjhimroy commented 7 years ago

Hi Jiali,

Thank you very much for your reply, and I am sorry; I should have added more details.

Here is a snippet from the merged.Ma99.bam from chr1:2286800-2287050

  1 HWI-700523F:21:C6KJ9ANXX:4:1203:13109:25753     163     chr1    2286830 22      77S49M  =       2286868 203     TTTGAAGCAAACAGATATGTCACCGAAAGGGCTATTAAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATC    AATTTATTAACCAAGAAAGAGATCTGAATCGTAACATG  BBBBBGGGFD>FEBDGDGGGGFGG/<>CDGGGGGGGGEBFCBGGGGGGGGEDGGGGGGGGGGGGGGGGGEB@GGGGGGGGGGGGEFFGGFGGGGGGGGG@FGGGBFGGGGFGGGGGGCBFEGGGGF  AS:i:56 XN:i:0  XM:i:6  XO:i:0      XG:i:0  NM:i:6  MD:Z:16C2C1T14A0G6C4    YS:i:104        YT:Z:CP
  2 HWI-700523F:21:C6KJ9ANXX:4:2205:5836:73918      145     chr1    2286830 36      45S53M28S       =       2286577 -351    TATTAAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATCAATTTATTAACCAAGAAAGAGATC    TGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTT  5DFGGGGGGG=40GGGGC@GEGGGDGGBGGGGGGGGGGBFGGF@FGGFF@GGGGD@GGFGGGGGFGCEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGA<ABB  AS:i:64 XN:i:0  XM:i:6      XO:i:0  XG:i:0  NM:i:6  MD:Z:16C2C1T14A0G6C8    YS:i:252        YT:Z:DP
  3 HWI-D00418:56:C6KLUANXX:8:2102:8126:85711       83      chr1    2286830 22      37S64M1I23M1S   =       2286699 -255    GGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATCAATTTATTAACCAAGAAAGAGATCTGAATCGT    AACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAA  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB  AS:i:103        XN:i:0      XM:i:9  XO:i:1  XG:i:1  NM:i:10 MD:Z:16C2C1T14A0G6C8A1G23C7     YS:i:68 YT:Z:CP
  4 HWI-D00418:56:C6KLUANXX:8:2107:16542:25469      145     chr1    2286832 41      62M1I42M21S     =       2286623 -313    ACTTGAATCAATTTATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTA    GATTCAACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTA  FBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB  AS:i:123        XN:i:0      XM:i:11 XO:i:1  XG:i:1  NM:i:12 MD:Z:14C2C1T14A0G6C8A1G23C7G14A3        YS:i:220        YT:Z:DP
  5 HWI-700523F:21:C6KJ9ANXX:4:2205:11059:54702     69      chr1    2286852 0       *       =       2286852 0       TTTGTAAGATGATCAAAAACAGGAATATCTGAGAAGCTTGTAAACATATGAACAGTGAACTTTGAAGCAAACAGATATGTCACCAAAA    GGGCTATTAAAAGGCTCAAAAGCAGAGATAACAAACAC  CCCCCGGFGGGGGGEGGGFGGGGGGGGGGGG1><DGCEGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBFG@GGGGCGGGGEGGGGGGGGGGGGGGGGGGGFGG0  YT:Z:UP
  6 HWI-D00418:56:C6KLUANXX:8:1101:13471:25839      161     chr1    2286852 36      7S42M1I50M1D8M18S       =       2287239 509     TATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTC    AACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGGT  BBBBBFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFF<  AS:i:121            XN:i:0  XM:i:9  XO:i:2  XG:i:2  NM:i:11 MD:Z:14A0G6C8A1G23C7G14A3G7^T8  YS:i:205        YT:Z:DP
  7 HWI-700523F:21:C6KJ9ANXX:4:2205:11059:54702     153     chr1    2286852 24      8S42M1I50M1D8M17S       =       2286852 0       TTATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATT    CAACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGG  @GGGCFEGGGCGGGGGGGGGGEC@FDGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGBGGGGGGGGGGFGBBBBB  AS:i:121            XN:i:0  XM:i:9  XO:i:2  XG:i:2  NM:i:11 MD:Z:14A0G6C8A1G23C7G14A3G7^T8  YT:Z:UP
  8 HWI-700523F:21:C6KJ9ANXX:4:1203:13109:25753     83      chr1    2286868 22      1S26M1I50M1D8M40S       =       2286830 -203    AATCGTAACATGAAAGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAAC    CCTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAAC  =GD.F@@C0B;>F<000F>FGCGGDB0C>D@FCGFFDB:0DCGGGFCGEGGGGGGGFF>GGGE<=11<DF>F>GF1@BC1CGGGGGGGGGGGCGGGF@CGEF>E@>GGGGGGGGGGGCGGGBBBBA  AS:i:104            XN:i:0  XM:i:8  XO:i:2  XG:i:2  NM:i:10 MD:Z:6C6T1A1G23C7G14A3G7^T8     YS:i:56 YT:Z:CP
  9 HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784     161     chr1    2286876 36      18M1I50M1D7M2D3M1D1M3D1M5D17M2D28M      =       2287003 252     ATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAA    GTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAACCTAATAGAA  BBA=?FGG>GD@F=BDFFEGG1CGGGGFGCGCEC1FGGG>1EGGGGGGGGGGGDFGGGGGGGCBFGGDD0FFC00FGGCFGGGGDGD000=FFG@0:0FB@007CF@>@FC@F?CFG>:F4BA@C=      AS:i:99 XN:i:0  XM:i:11 XO:i:7  XG:i:15 NM:i:26 MD:Z:7A1G23C7G14A3G7^T7^AG3^A1^AAA1^CAAAC2T1T12^CA2A0C0A23      YS:i:201        YT:Z:DP
 10 HWI-700523F:21:C6KJ9ANXX:4:2211:13774:52164     97      chr1    2286886 28      2S8M1I50M1D7M2D3M1D1M3D1M5D17M2D36M     =       2287159 393     ACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAACC    CTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAACCTAATAGAATATCCTCA  @BBCC@D=EB>BCDF;ED:11EGGDG>DD1:<FEFB1EDGFGBGGCB1FFGG@FFGGDFGFEGDBDGGG@EFB1FEGG1BC>D0FG00FGGCDE0;F0E@G>CFGG0<F>@FGCD@C0CFFFGGG8      AS:i:99 XN:i:0  XM:i:10 XO:i:7  XG:i:15 NM:i:25 MD:Z:23C7G14A3G7^T7^AG3^A1^AAA1^CAAAC2T1T12^CA2A0C0A27A3        YS:i:194        YT:Z:DP
 11 HWI-D00418:56:C6KLUANXX:8:2109:9866:47127       69      chr1    2286992 0       *       =       2286992 0       ATGATCAAAAACAGGAATATCTGAGAAGCTTGTAAACATATGAACAGTGAACTTTGAAGCAAACAGATATGTCACCAAAAGGGCTATT    AAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTC  BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  YT:Z:UP
 12 HWI-D00418:56:C6KLUANXX:8:2109:9866:47127       153     chr1    2286992 42      87M1I4M1I10M1D22M       =       2286992 0       AAATCAGATCTAACCTAATAGAATATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAG    AAACAGATCGATACGAAAAGAGAGGATGAAAAGAAACTCACATCTGCCAAGCG   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB   AS:i:194            XN:i:0  XM:i:4  XO:i:3  XG:i:3  NM:i:7  MD:Z:27A60G7G4^C1A20    YT:Z:UP
 13 HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784     81      chr1    2287003 36      76M1I4M1I10M1D34M       =       2286876 -252    AACCTAATAGAATATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGA    TACGAAAAGAGAGGATGAAAAGAAACTCACATCTGCCAAGCGGAGAGGATGAAT  6:000090@0700000;0000=0800808<00C=0/=E:000=0/>E/EDC0F1@DGGGC@CF=1DE00CF@F:<GGF1:BF>G@GF1F>F11F1EGGG>CGGGGCGBGGGGGGGGGGF@CBBCB@  AS:i:201            XN:i:0  XM:i:4  XO:i:3  XG:i:3  NM:i:7  MD:Z:16A60G7G4^C1A32    YS:i:99 YT:Z:DP
 14 HWI-D00418:56:C6KLUANXX:8:2302:14361:100328     69      chr1    2287016 0       *       =       2287016 0       ACACAATGTGTCCTTAAACTTGAATCAATTTATTAACCAAGAAAGAGATCTGAATCGTAACATGAATGCACAAAGTACTAAAAAAATC    AAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAACCC  BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  YT:Z:UP
 15 HWI-D00418:56:C6KLUANXX:8:2302:14361:100328     153     chr1    2287016 37      63M1I4M1I10M1D46M       =       2287016 0       ATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAG    GATGAAAAGAAACTCACATCTGCCAAGCGGAGAGGATGAATAGAGAAGCGAAG   FFFFBFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB   AS:i:187            XS:i:61 XN:i:0  XM:i:5  XO:i:3  XG:i:3  NM:i:8  MD:Z:3A60G7G4^C1A40A3   YT:Z:UP
 16 HWI-700523F:21:C6KJ9ANXX:4:2206:19732:85187     133     chr1    2287024 0       *       =       2287024 0       AAACAGATATGTCACCAAAAGGGCTATTAAAAGGCTCAAAAGCAGAGATAACAAACACAATGTGTCCTTAAACTTGAATCAATTTATT    AACCAAGAAAGAGATCTGAATCGTAACATGAATGCACA  ?AA@BBGGGGGG>GGGGGG>1FDGGGGG1FGGGGGGG1FGGGFGGGGGGGGGGGBGGGGBC@FGGGGGGGGGGGGGGGCGGGGGGGGGGGGEDGBGGGGGEGF0FFGCF@FFFGGGCG0CGGGEGF  YT:Z:UP
 17 HWI-700523F:21:C6KJ9ANXX:4:2206:19732:85187     89      chr1    2287024 25      55M1I4M1I10M1D55M       =       2287024 0       AGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAGGATGAAAA    GAAACTCACATCTGCCAAGCGGAGAGGATGAATAGAGAAGCGAAGAGAACTCTT  GFD@9C<<008CC0.8C>FC;0DE9/F=00CGGGGDF0D@GGGFE@GEGGFGGGGGGGFEEGDFFGGGGGGGGGFEF@GGEF:>>GBGGFBGGGGGGEGGGCGGFFEF;GGDGFGFC1CGDA=CCB  AS:i:196            XS:i:80 XN:i:0  XM:i:4  XO:i:3  XG:i:3  NM:i:7  MD:Z:56G7G4^C1A40A12    YT:Z:UP
 18 HWI-D00418:56:C6KLUANXX:8:1308:13410:70628      97      chr1    2287031 21      48M1I4M1I10M1D62M       =       2287182 0       ATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAGGATGAAAAGAAACTC    ACATCTGCCAAGCGGAGAGGATGAATAGAGAAGCGAAGAGAACTCTTCCAAGAA  BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  AS:i:189            XS:i:93 XN:i:0  XM:i:5  XO:i:3  XG:i:3  NM:i:8  MD:Z:49G7G4^C1A40A14G4  YT:Z:UP

Where the read HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784 is:

HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784 161 chr1    2286876 36  18M1I50M1D7M2D3M1D1M3D1M5D17M2D28M  =   2287003 252 ATGAATGCACAAAGTACTAAAAAAATCAAGCTTTTAGATTCAACAAAAGGAATCAAGTCAAACCCTAGATTGATTTACCCTAGATATGCTAAGGTTCTAATTCAAATCAGATCTAACCTAATAGAA  BBA=?FGG>GD@F=BDFFEGG1CGGGGFGCGCEC1FGGG>1EGGGGGGGGGGGDFGGGGGGGCBFGGDD0FFC00FGGCFGGGGDGD000=FFG@0:0FB@007CF@>@FC@F?CFG>:F4BA@C=  AS:i:99 XN:i:0  XM:i:11 XO:i:7  XG:i:15 NM:i:26 MD:Z:7A1G23C7G14A3G7^T7^AG3^A1^AAA1^CAAAC2T1T12^CA2A0C0A23  YS:i:201    YT:Z:DP
HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784 81  chr1    2287003 36  76M1I4M1I10M1D34M   =   2286876 -252    AACCTAATAGAATATCCTCAAAGAAGAGATCTAAACGAAACCCTAGTCCGTGAAAACAGAGAAACAGATCGATACGAAAAGAGAGGATGAAAAGAAACTCACATCTGCCAAGCGGAGAGGATGAAT  6:000090@0700000;0000=0800808<00C=0/=E:000=0/>E/EDC0F1@DGGGC@CF=1DE00CF@F:<GGF1:BF>G@GF1F>F11F1EGGG>CGGGGCGBGGGGGGGGGGF@CBBCB@  AS:i:201    XN:i:0  XM:i:4  XO:i:3  XG:i:3  NM:i:7  MD:Z:16A60G7G4^C1A32    YS:i:99 YT:Z:DP

Please let me know if you need more lines.

Thanks,

Rimjhim

JialiUMassWengLab commented 7 years ago

Rimjhim,

This happens because the reads are long relative to the insert size, so the two mates of the pair actually overlap.

I've modified the code and it should be taken care of. Let me know if it still doesn't work.

Jiali
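
The overlap can be checked directly from the two SAM records posted for read HWI-700523F:21:C6KJ9ANXX:4:2301:14610:90784. The sketch below sums the reference-consuming CIGAR operations (M, D, N, =, X) of the upstream mate to find where it ends on the reference, then compares that with the start of its mate. It is an illustration only and not the code used in TEMP; the helper name `reference_end` is an assumption.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def reference_end(pos, cigar):
    """Return the 1-based inclusive reference end of an alignment."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


# POS and CIGAR of the upstream mate (flag 161) from the snippet above.
mate1_end = reference_end(2286876, "18M1I50M1D7M2D3M1D1M3D1M5D17M2D28M")
# POS of the downstream mate (flag 81).
mate2_start = 2287003

print(mate1_end)                    # 2287014
print(mate1_end >= mate2_start)     # True: the mates overlap
print(mate1_end - mate2_start + 1)  # 12 overlapping reference bases
```

The upstream mate covers 139 reference bases (125 matched plus 14 deleted), so it ends at 2287014, which is 12 bases past the start of its mate. Any interval defined as the gap between the two mates would therefore have negative length for this pair, which is consistent with the inverted coordinates bedtools rejected.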