GoogleCodeExporter closed this issue 9 years ago
Sorry my reply has taken so long. Could you please provide the full untrimmed
read of the one read that gets trimmed to GTC? I would also need the full
command line in order to reproduce the problem.
Original comment by marcel.m...@tu-dortmund.de
on 9 Mar 2012 at 10:22
Marcel - see also the email I sent this morning.
Here is the example I posted on the cutadapt issues page:
Running cutadapt within a perl script, within Sun Grid Engine.
$fastq is the input fastq.gz and the sample name is also in the output file.
I have cut/pasted the $adapters file below.
regards and thanks for help, david
perl script:
system("/data_n2/hmw208/software/cutadapt-1.0/cutadapt --overlap=15 --times=1 --quality-base=64 $adapters $fastq1 -o ./interim/${sample}.fastq1.cutadapt.gz");
input fastq.gz
@MISEQ:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAGT
CCAGCACAAGGTTCCCACTGTACCCCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTCCAGTTTCTTAGACACCATGTC
+
11:=DDFFHHHFHIJJJJJIJJIJJJJJIJGHHCFCHI<EAGCFC>DBHII@DHGBHII;=8==A;@;@CEH?BDDDBBCC355:@CCCD@>AC?88@::>
after cutadapt:
@MISEQ:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAG
GTC
+
::>
and the $adapters file:
-g AAAAAACTCACAAAGTCAGGTAATTCT -g AAAAAAGCTCTCAGGGTTTTGCC -g
AAAAGACAGGAGGCAGAAGGTGA -g AAAAGCCTTGGTAGGTCCGATTG -g AAACAGGCTTGAGAATCAGGGTA
-g AAACAGTCAAAAAACAAAGAGATGGA -g AAACTCCAGGTCCTCTGGTTGG -g
AAACTCTGTTCCAGATCCCTTCC -g AAAGCACCTCAAGGCCCAAG -g AAAGCAGTGGATCACAGGAACAT -g
AAATAAGAAAATCATAGATACTGCAAAACTATTC -g AAATGCTGCAAATGGGTGTCAAT -g
AAATGTGAGCATCCTGGTGAGTT -g AACCCGGGGCCAAGTTC -g AAGACGTCAAACCACCCTTCC -g
AAGAGCACTGGTTGTTAGCACTT -g AAGCTTTGGCTGAATCTGTTTTAT -g AAGGATTTTGATGCTTTCGTATCC
-g AAGGGGATTAGTGCTTGGTTGTC -g AAGGTACTGCTCAAACACCAAGC -g
AATAAAAACAGAGACCAGCCCAC -g AATAGAGAACCTCTAACATGGTGAATAAGAT -g
AATCAGGCAACTCAGCACACATA -g AATCATTGAGCGTAATATCATCTTGG -g
AATGACCGAGGGGTAGTCATTCT -g AATGCACTCATGTCAAGAATAAGC -g AATGCCAAGTGCAACGGCTA -g
AATGCTTTCAGCCTTATGCCTTG -g AATGGAGAGGGGAAAGCTTCTTG -g AATGTTCTGCTCAGCACCCAA -g
AATTTAAAGGATCTTGAGAAAAACAGGT -g ACAAAGTTTTCCACCTTCTCACA -g
ACAAATTGCCTGAAGGAACACAC -g ACAATAGGTAGCAAACCATACATTCA -g
ACAGCAGAAGGGTGCTAAATCTT -g ACAGGCAGTTCTTTACAAGTCTCA -g ACATAAGCAGGCTTCTAAAATGGC
-g ACATATGATGCTTTTGTGTCTTACCT -g ACATTCTATAGCTTCTACTTGGGCTT -g
ACCAAGTCATTCACATAATTTTTCAGC -g ACCACCACATAGATGAATAAGCA -g
ACCAGAGTCCTGACTAGAAATGG -g ACCCATTGTCTGAATTGTTTTATACCT -g
ACCCCTATGATTAATGTAGCACTGTC -g ACCGAATCGTAGTGGATGAAGTTT -g
ACCTGAAATTCATCCTTGAGATGTG -g ACCTGAGGCAAATCCACGTC -g ACCTGGTATTTACAAAGCTGAAGAA
-g ACGATCAAAGTCACGTCACAGAG -g ACGTCCTCTTCAATGGAAAGATCC -g
ACGTTGATATTGCTGATTAAGTCCCT -g ACTCAATTCAGACAACTAGTATCTAAGG -g
ACTCCATGAAGCATTGGTGGAAA -g ACTGAGTATTCATAAAATTTGACTTCAGC -g
ACTGGAGAACAATAAAAACCCAACG -g ACTGGGAGGGTTTGATAACTTGAC -g
ACTGTACAAACAAAAACAGGAGCA -g ACTGTCAGCATCTCTGTATCGGT -g
ACTGTTTCTTTTTGTGTTTGACAGC -g ACTTACATTAATTCCATTCAAAATCATCTGT -g
ACTTACCTATGGCCTTGTTTAGTAGAAT -g ACTTCACCTGCAAAATGGGCATA -g
ACTTGCCCACTTACTGGATTTCAT -g ACTTGGCAAACAGATGGTGAGAT -g AGAACTGGAATGATGAATGGGACA
-g AGAATCCTGCCAACTTCCACAAT -g AGACATGGCTTTGACATTAGTTCCA -g
AGACCCAAGATGATTAAGTATGATCCA -g AGACTGGCATGAACTTTTCCCTA -g
AGAGGTCCTACATCTTCTGCCAA -g AGATAAAGGGCCTAGAGAGTGGG -g AGATCAACACACCATACACTTCCA
-g AGCAATTAAAGGCCAAAACTAAGGA -g AGCATCTCTTGGTCATCTGTTGG -g
AGCCAGTAAAGGTATTGGAGAGT -g AGCCTGCTGTGGTACTGG -g AGCCTGTTTAACTTTACTGCCAA -g
AGCTAAGATTCTTGATGCCTGGT -g AGCTCCAGTTTCTTAGACACCAT -g AGGAAGGAGACATTTAGGTAACGG
-g AGGAATCTTCCATTCAACAATCTCC -g AGGACCTCTCTGACTCAAGGTTT -g
AGGACCTTCCTAATTCTTTTTAGATTGT -g AGGACGCAGAACAAGGAAAAGTG -g
AGGAGTCGTATTAAAGTCAGGCTA -g AGGCTCTGCTTTGTTTTCTGTTG -g AGGCTGACATTTGTATAAGGTGG
-g AGGGCTGAAAATGGCTATGGAAT -g AGGGCTGGGTGATTGGGAT -g
AGGTATGTATTTCATAAAGTTTTTCTGTGG -g AGGTGAAGCATGCAGAAATAACTA -g
AGGTGACTCTTTCTAGCATGTGA -g AGGTGTTTCAAAAAACAGTAACATCA -g
AGTACAGGGAGTACAGGATACAT -g AGTCATTTCCTGGCAGAAGTCC -g AGTGGAATGGATATGAAAGCATACCT
-g AGTGGTAAGACAACTCAATTTTCCC -g AGTTAAGCCCCAGACAAATGC -g
AGTTACCTTGTCATTTTGGTTTTTGTTT -g AGTTTCAAAAATCAGAACTTTAGTTGCC -g
AGTTTTACAGTGAACTGAATTAGGGT -g ATAAGCTAGAATTCCATTTTCTAATGTGT -g
ATAAGGCAAGTGTGGAAAGGACA -g ATAAGTACATAAAAGCAATCCATAGCC -g ATCCACCGCCAGCTCCTAA
-g ATCCATCGTCTCCTCCTCTTCAT -g ATCCGGTTGATCACATCACAGTC -g ATCCTGGGCAAGCTGGG -g
ATCTCATGTTCCCCCTTTCAGTT -g ATGCATGCCACTTCTCAGTACAT -g ATGCATGCTTTCAGTTGATTCC -g
ATGTCTGTGGTTGATAGCCCAAG -g ATGTGACGTTGGCAACATTGAAC -g ATGTGGGTGCAGGGTAGG -g
ATGTTTGGCTTGCTGTTCCTCA -g ATTACTGTGGTTGAAGGGGAACC -g ATTCTAAGGCCCCTCTTTCAACC -g
ATTGCCATGAGCCTGTGTCC -g ATTTTTATAAGCCAAACCCTCCTTTT -g CAAAAGCCGATGGTGTGGAAG -g
CAAAATGCGCTACCACATGCC -g CAACACCCTCTCTTTCAGCCAT -g CAAGATACGAGTGGAACCTGGAA -g
CAAGCATGAATGGATGGGTGAAG -g CAAGGAAACAGACACACGCAACT -g CAAGTGCCTGTCTGCATTCTACT
-g CACACACAAAGCGGTACACGTAG -g CACACACACACACACTCAAAAGG -g
CACACAGCGACCTGACCTTTAAC -g CACACCCATTTTCCTGCACGATT -g CACAGCCCAACTATGGAAACCAG
-g CACAGCCGCAGCAATGG -g CACAGCGATACAAATCCTGGTCA -g CACCAACCCCCACGTGTTT -g
CACCAGCGTTCCAAGTCAGAT -g CACCCACAGAAACGTGCAGATG -g CACCCCTCCATACAACAAGGTTT -g
CACCGATACACACTGGAAATGTT -g CACCTCTACAAAACCTCCTTCCA -g CACCTGCCCTTCATGGGTAGTAA
-g CACGAAGCGAAGGTCGTTGA -g CACGGCGCAGGCGAAG -g CACGGTTGCTCCTTTCTTTCTCT -g
CACTTCTCTGCCCACAGGTC -g CACTTTCAATGCATGTGGCATTTT -g CAGAGACCGGGGAAGATTTGAAG -g
CAGAGGAAGTTGGGGCTGTC -g CAGCACGTCCACCATCGAG -g CAGCAGTGAATCTTGCCTTGGAT -g
CAGCATCAGAGTTTATTCACTGC -g CAGCCATTCTTATTTCATTTGTGCT -g CAGCCTCCCAGCCACC -g
CAGCGCCTGCCAAAAATATTCAC -g CAGCTAGGGACCCAATATGTGTT -g CAGCTGCACGAGGAAGTGG -g
CAGGATACAGGAAGCACAAGGAG -g CAGGCGTATAACAGCCAATTCAT -g CAGGGAAGACCTCCTCTGGAAAT
-g CAGGGAGTGGCTCTTTGCTG -g CAGTCAGGGCCTCCATTACATC -g CAGTGAGACCACTGCCATGAAG -g
CATACCTGCCCGTTGAACATGAC -g CATACTTGGGCATCTTGGGATGT -g CATCCATCCAGCCATGATCCC -g
CATCCTTCCTGGCAAGTGGTC -g CATCTCCCAATGCCTGTCCTG -g CATGGCCCCTTCTGTCTCC -g
CATGGCCTGTCTTGAGTCTGTG -g CATGTTGGGTCCAATGCAGATG -g CATTGACTGCATGGGTTTCTGTC -g
CATTGCCTTCCAAGACCCAAAC -g CATTTTGTACCCTTGGTGACCCT -g CCAAAACCTTAGGAAGGGTGTTCT
-g CCAAAATGACTGACATAAACCCCAT -g CCAAGGTCTCCAACAAGCTCAAG -g
CCACAGAAATCGACTCTCACCAA -g CCACAGGAGACTCAGGGGAAG -g CCACCATGGAGAACTGGTAGGAG -g
CCACTCACCTCCCCCAACTA -g CCAGAAGGGAACGGCGAAA -g CCAGAAGTCTTTTACAGTGTTTGGC -g
CCAGATCCGATTTTGGAGACCT -g CCAGCACAAGGTTCCCACT -g CCAGCCAACCACAATGACGAG -g
CCAGGAGAGCAAATAAAGTAATGCC -g CCAGGTGAAAGTTATTCCTCCG -g CCATCACAATGCAAGCAGATTGTT
-g CCATCACTGGTCTTGAAGGTTGTTA -g CCATCTCCACTCCAAAGTTAGACA -g
CCATGCTAGGGGCAAAGAGG -g CCATGGACAGCCCCCTG -g CCCAGAATAAGAATGCACCGAGG -g
CCCAGAATCCCATGGAACCTT -g CCCAGGAAACAAGATTTTGGTGA -g CCCCAGGAGTATGCCCTTTTC -g
CCCCAGTCGTTGCTGTTCTTTAG -g CCCCCCATTATCCTAGAGTGAAA -g CCCCCTCAAACTGAAAGCAGTAA
-g CCCGAGTATTTTGCATGTCCAAC -g CCCTAGGCTCACCTAAGGAAGT -g CCCTTCCAGGAGTTTTCTCTCAT
-g CCCTTTCCCCTCTTCATTTATCCC -g CCGCCTGCTGACTTCACTC -g CCGGGGAGGGAGGTGT -g
CCGGGTCTCGGAGGAGG -g CCTATTGCTGCCATTTTCCAATGT -g CCTCAACTCCCCACAGTTTGC -g
CCTCAGTTCCCCTCAGATGC -g CCTCCTGGACCACCTCAGTTT -g CCTCCTTCCTTACCTCATCTGGG -g
CCTCTCAAGGGTAACGAACAGAG -g CCTCTCGGGGAGAAGCC -g CCTGCACATTTTGTTCTGGTGAT -g
CCTGGAAAATTCATAGGCAGTTC -g CCTGGTAGCATTCCATGGCTC -g CCTGGTTCCTGATGTTCTGTGG -g
CCTGGTTGTGTTGGGTTTCATTT -g CCTTAAAGACCACAGCGAGGAG -g CCTTGACCTTGTTTCTGCTAACAT
-g CCTTTCCAGGGTGAATGTCCTAA -g CGAAATGCAATGGGGTTGAGAAT -g
CGAAGGAAACAAACCCAATTCCC -g CGCAGGAGGAGGGTTCTTATAGT -g CGCCGTCGGACATCATGAATAA -g
CGCCTTGTCGATTTTGGAATGTC -g CGCGAGCTGGAGCACTA -g CGGGAACCCTGGAGGGAC -g
CGTGAGTAACTAAGAAGCAAATCTGG -g CTAGAAGCTCACAGACAAGCAGT -g CTATTGAGTCCCCACCACCTTC
-g CTCACACACTCCAGGTGAGAAAA -g CTCACCCTGCTTTATGATGAGGT -g CTCAGCCCGAAGCCAATCTCTA
-g CTCATGCTCCCCACCGTCAT -g CTCCAAACTGGGATCACTAACCA -g CTCCCGCAGCTCAGACG -g
CTCCCTGAGGCTGAGTGAACA -g CTCCTAGGCCCTGAATTTCAACA -g CTCCTCCCCTCTTGCTTTCAG -g
CTCCTTCTTGTGACAAAGGCAAA -g CTCTGCCAGCATATGGAGTTGAT -g CTCTGTCCGTCCCTCTCTTCT -g
CTGACCCCACCTTCATCTTCTTC -g CTGACTGAAGAACGCCTGGATG -g CTGAGACATGTACTGGCTTCACT -g
CTGAGGCACATCTAGGCAGTTTC -g CTGATAGGAGCACCTTTGGCTTG -g CTGATGGCATCCCCTCCAAG -g
CTGCCTGTCTCAAGCTGCAC -g CTGGAGGTGCTTTTATGCCCAA -g CTGGCAAAGGAATGAAGTTATTGGA -g
CTGGCTACAGGTGTGTGTGTATG -g CTGGCTCAATAGATGGACAGGTT -g CTGGCTGGCTTGTGAAGACTAT -g
CTGGTCCGAGAGAGAATCCTGA -g CTGGTTCTGGTGGAACTTGGAG -g CTTACACAGCGCCGTAGCC -g
CTTAGCTGGTGTCTCCCTGCTT -g CTTCACTCTGCTGAAGGCATCTC -g CTTCATAAGAGGGTGGGCTTCC -g
CTTCCTTCTCTGCTACGGGGATG -g CTTGTATCTGGGGTACTGCGTTA -g CTTTGCCCCTACCTCCTTGG -g
CTTTTGGTCTGGGGCAGATTGG -g GAAAACAACCGAGAGCCTGAAT -g GAACAGAGACCTTAAGCAGGAGG -g
GAAGCTGGTCGAGCCCATT -g GAAGGTCTCAGCAGGGTTTCC -g GAATGCAAGCCACAATGAGAGAA -g
GAATGCAGCCCTCATCACATTTC -g GACACACCACTCTCTGGGATTTT -g GACACAGAAGGAGACAGAAAGCA
-g GACACGACTCACTGTCTGCTTC -g GACCGGCAGATGAACACCAA -g GACCTAGCTGCTGTCCTTCC -g
GACCTTGGCCCACAAGTTCTAC -g GACGGTCCTCAGCGGGAG -g GACGTGGTCCCTGGTAAAGTC -g
GAGCCCTATTCCAGTACAAACCC -g GAGCCTTCCTAATCCCTACGAAA -g GAGCGAGGGCAGACTTTGATT -g
GAGCTGCTATTAGTCCCATCTGT -g GAGCTGGGTCTGCAGGAAG -g GAGGACAGGGCAATACCGATG -g
GATACTGCTCCACAGAGACTGC -g GATAGTCCTCTGAGTCGAAGCTG -g GATCCAGTGACAACCTCTCCTTT -g
GATCCTGGGAGCAGTCTAGGG -g GATGAAGCTAGAAGGCAGAAGGG -g GATTCACACACCACCTTTGCATC -g
GCAATATGCGGAAAGCTGTGAAG -g GCACACCTCATCTGAACTTCATTAC -g
GCACTACTCACCTGTTAAGGAAAA -g GCAGCACAACGGATTTTGTGAA -g GCAGGTTATCGTGTGAAGGAGG -g
GCATCATCAAAACAGAACTTGGC -g GCATGACAACTGCTTTGGTCTTC -g GCATGTCCTCTGAGACAAATGAG
-g GCCAGCTCGAGAAAGCAGTC -g GCCAGGTGACTGAAGTCTGTG -g GCCATTCTCAGTTATCCAACAACA -g
GCCCAATGCCTGCACTAC -g GCCTAGTGACTGTGTGTGTCATTT -g GCCTCATCTTTCAGCCTGTAGTTA -g
GCCTCTTCATTTTCAGGAAAGCC -g GCCTTACCTGGAGCCATAACTTG -g GCGCCCAGAACAAATTGTAGTAA
-g GCGTCTTGAGTTGTCCAAGGTC -g GCTCAAGTGGAGTGCAGTTATTT -g GCTGAAGAGAGCTGGAGACAG
-g GCTTACATCTATTAATGCTGACCAATTCT -g GGAAAATCTTAAAGTAATCACATTTTTCTTGT -g
GGAACTGCTCCAGGACAGAGAA -g GGAACTTCCTGAGTTGCCAT -g GGAAGTGGCTCAGTGGAAATGTA -g
GGACTCCAGGCGTTCGT -g GGAGAAATCAATGGTTCTGCCAA -g GGAGAGGAACTCAGGAAGCAG -g
GGCAAATACTCTGCGATCTCACT -g GGCAATGACCGATGACTTTGATG -g GGCACCTCCAGTGGAAATCAAG -g
GGCATCTCGGGTGGTAGATAATG -g GGCCACTATCCCTCTGGGT -g GGCCCTGTGCCTTAGTAGTATTT -g
GGCCTGAACATGGTCTTGGT -g GGCTCCTGCACAGTGACTAC -g GGCTGCTTCTGGGTGGG -g
GGCTGGACTCCTGGCAAAG -g GGCTGGGTGGGCTTTCTTC -g GGCTTCTGATGGCTTTTCTGC -g
GGCTTGTGGTGCTGAGTGA -g GGGCAACACGACCCTCAAC -g GGGGAAATGTTTTTAGCCCATCT -g
GGGGCTGGATCTTTTCCCAC -g GGGGTGACGACTTCTTGTTTGAT -g GGGTCCTGACAACACCTCTTTTA -g
GGGTGCCAGGTGCATTATCAA -g GGTAAATATCTTTTGGCCCTTGCC -g GGTACGTGGTTCCAGGGTG -g
GGTCAGGCCATAGAGGCAGA -g GGTCCCGCACATAGTCCTTGA -g GGTGAAAGACTGATGTCTGCTG -g
GGTGAAGCAGTCGCCCAT -g GGTGCAAGAGGCCAGCAT -g GGTGGGAGGCACACTGG -g
GGTTAGCGAGCCTCACTTTAGTT -g GGTTGAGAAACGGACACAGCTA -g GGTTTGTCACCCGGCTCTC -g
GTACCCCTCTCAGCCCCTC -g GTACTTACAGGCACCGTTTTCCT -g GTAGGAATAGGAAGAGCAGGAGC -g
GTCAGCGGAGAGGTCCC -g GTCAGGCTCATCTCCAGACAG -g GTCATAGGCCGTTGGGAACTC -g
GTCATCGGCGCTCAGAATAGG -g GTCCAGGTTCCAGAACACCATTC -g GTCCCGGACATTCTCAAGTACA -g
GTCTCCTCCCAGAGGTACAATTC -g GTGAAGTTGGACACCTCCTTCTG -g GTGACGAAGGCCGGAGC -g
GTGAGCAAGTCCCACCTACAG -g GTGATTCTCCAGGATGTGCCATT -g GTGCCAGGATTCAGTCCTTTCTG -g
GTGCTCCAGGGAGTCATCA -g GTGCTCCTGTACATCCTGCTTG -g GTGGCGCTGTAGGGGAAG -g
GTGGCGGGAGGTAGGTAT -g GTGGCTGCAGTTGCTCACTATTT -g GTGGTTTCCTTCAGGTGCTTCTA -g
GTGTGGCAGAGCATGATGAACAA -g GTTATTTAAAATGCCTGAGGGCCAA -g GTTCCCCTATGACTCTGTCCCTT
-g GTTGACCAGCAGGGACACG -g GTTGTAACCCAAGGATTCCCAGT -g
GTTTCCAGAAAAATTCTATAGGAGAAAACA -g TAAATGCTCCAGTTGTAGCTGTGT -g
TACAGGACTCCAGAAGGCAAATG -g TACAGGCCAGACCTGAATTTCCC -g TACCCTTCACTTTTTGACCCCAG
-g TACTCACAGGCAGTGGATAATCGT -g TAGAGGAAAGTCCTGTACATCGG -g
TAGCCAACATCATCTTCAGAGGC -g TAGCGAGGTACGTCTAGAAGGC -g TATGCAAAGTCAGGGTGGATGTT -g
TATGCCATGAGTGCTCAGAGAGG -g TCAAGGTGATAACAAGATCACAAGG -g
TCAATATAGTTGGCATTAATATAATCTGAGGA -g TCAATCCAAAGGGGAAGAATGTGT -g
TCACAGAGTTCTGAAGTGGAAGG -g TCACATCTGATTCCCTATGTGTG -g TCACATGGATAGAGCATCAGACA
-g TCACCAAGAATTGAATGGGAACT -g TCACGAAGGTTTCGTTTTCTCCT -g
TCACTGAGATAGGGGTCTTTTCCT -g TCACTGTCACCGCAGATAATTGA -g
TCAGAAATGGAAATTAAATGTCATCAGAA -g TCAGCCAGTTTTCTGTAAGTCAAT -g TCAGGGCCGTGGGTGA
-g TCAGTAAATGGAACTGATAAAATCTGAC -g TCAGTCTAGCCACAGACCTGAAC -g
TCAGTGACATTCTACCTCCAGAA -g TCAGTGGAGGGACGTGGT -g TCAGTTGGAGACATTCTGAAGCA -g
TCATAAAGAAGCTCAAGGCAACAC -g TCATATTCAAGCGGCCCACAG -g
TCATGCTAATGTCTTTCTTATTGGATTT -g TCATTTTCAGACATTTGCGTGGT -g
TCCACCTCTCTCTGTTTCCACAT -g TCCACTTGGAATACAAAGAAATGACT -g
TCCAGAATCACAACTTTGTTGCC -g TCCAGGATGTTACCAGGACATTT -g TCCATTGGGCCAACAGATGC -g
TCCCCCTCTTGCTTTATCTCCTC -g TCCCTCTGGATTGGTTAGGATCA -g TCCTCCCAGATGTAGGACAAAGT
-g TCCTCTCTCCCCGCAGTC -g TCCTCTTTGAACCCATAGCTGTC -g TCCTGCCTCCTAAAACAAACC -g
TCCTTCCTGCAGCCTTGTTG -g TCCTTTTGGTCTTCAGGTTGGAT -g TCCTTTTTCCCTTTCTTTTCTAGCAG
-g TCGTGTAGAATTCAAAGTGAAATGAGT -g TCTATACTCATTACTTACGACATTCACT -g
TCTCAGTTATGTGTACTCTGCTCT -g TCTCCCTAAGAAATGTGAGCAATAGT -g
TCTCCCTTTATACATTTTGCTCCT -g TCTCGATGAGAAGCCCATCCT -g TCTCTGACCCTTTTCCCAAGAAT -g
TCTGCTTTGCTGAATGAAGAGGA -g TCTGGGAAATTCTCAAAACCCCAT -g TCTGTCCATTGCACCTTTGTCAT
-g TCTTCCTTTTAGTCCTTCAATCCAC -g TGAAAATGAGAAAATAAGCACCAAGG -g
TGAAGCAGGGAACAAAGTCCTTC -g TGAATAGTTGTGCCCACAGTCTTA -g TGACAACTAGGAGAAGGAGGATGA
-g TGACCTCTCGGGCATCCTG -g TGAGACCTGAAGAGTCAGAAAGAT -g TGAGAGACATCTACTCTTTGTTTGC
-g TGAGCGGCCCTCCCAT -g TGAGGCTGTTGCTAGAGTTATCTT -g TGAGTGTCTTACATGTTGCTTCT -g
TGATAACAGCTGAAGAAACCAGT -g TGATGCTTTGACAAAAGGTAATCCA -g TGATGGACAACACTGGTAAGATT
-g TGATTTTCAGGTTGTTCCACAAAAA -g TGCAGCCAATGTTGAAGCAAATC -g
TGCATCCTGAGAATGAGTGTGTC -g TGCATGTGCTTATCAAACTTCAAA -g TGCATTATGTAGCTTTCCACAACA
-g TGCATTTTCAGCTGTTGTTGATGA -g TGCCATCAGAGGCTAGAATTATGA -g
TGCTAAATTCCACTGAGTATCATAGTTAAAA -g TGCTAACTGCAAAAGAGACTGAC -g
TGCTAAGAAGTTTGGAATCAGGT -g TGCTACTTTCCATTCTTTTTTCCTTT -g
TGCTCCAATATGAGGTTATTTTCCA -g TGCTCTACTTCCTGAAGACCTGA -g TGCTGGTCTGTAGGAGATGGTAT
-g TGCTGTGGAAGTCTGTGTTGTA -g TGCTTAAGAGAAAAAGAAAGAAGAAAACCA -g
TGCTTTTGTTCTAATAAAGCCATGC -g TGGAACCCTAGCCCATCGTC -g TGGAGGTGTTCTTGTTTTCCACA -g
TGGCAATAAAACAAATCGAATAGCA -g TGGCTGAGTGACACTGAAGTTTT -g TGGCTTTGGAGCTAGTCTGAAAT
-g TGGGAATGTCAGGATATAGGTACT -g TGGTCCATGAAGGTGATGGAAAG -g
TGGTGCATAGTTATATGGGGACA -g TGGTTACTAAAGGGAAGATGGGTG -g
TGTAAAATCTTGTGTGTGATTTTGTGT -g TGTAATTCAAAACGAAATTAGAAAACTTACTTG -g
TGTACTGTATAATACGACTTCACATCTT -g TGTAGCCTGTTGTGTCAAGATGATT -g
TGTAGTTACTGCTCTCCCCATCA -g TGTATCAACCATTTTTAGTATTGCCTT -g
TGTATCACCTTCTGCCAAGTTCC -g TGTATTACACATATAGTGAGAACTGATGAC -g
TGTCACTTTTGCTTTGTTTTCTTTCT -g TGTCAGCATCCCAACCAGAG -g
TGTGAAAACAGTTACTATCAAACACTGG -g TGTGCTACCCTCACTCTTGGA -g
TGTGGCTGTGCACCTAAATCTTC -g TGTGTCTTATGCTAGATTCCTTCAC -g TGTTGTGTATCCTCAAAGGGGAA
-g TGTTTCCTCTACTGAGGGGTGTT -g TTAAAATCAATTTAGTCAGGATCCCAAA -g
TTAAACTGCATGTGCCTTGCTTC -g TTACACTTTTATTCCATAGAAATTATACTCAGAAA -g
TTACATTAGTCCTGGTCACTTCAG -g TTATTTCTTTCACATTTTTCTCTGATGTTC -g
TTCAGTCGTCTCTGACTGATGC -g TTCCCTCTAGAATGAGAGTTGCC -g TTCCTGCAGTGTCAAAACTTCCT -g
TTCCTTGTTCTCCCAATACCTCA -g TTCTCTCCGAGTGCTTTCCAAAT -g TTCTGAAAACCTTTGCTGGGTCT
-g TTCTGATGAGAGGTGTGGAAACA -g TTGAGCCCCAACTATGTCAATGG -g
TTGGCATATTGGAATTGGTCAGC -g TTGGGGAATACTCGTTTTTCAGC -g TTGGTCTTGGGAGTTTTGTGGAA
-g TTGTAGTCTCCCTGTTGTCTGAA -g TTGTGCTCAAAAGCCTTGTCTTG -g
TTTAATTATCTCCCCACCCACCC -g TTTCAGGGAAACACCTTTGTCCA -g TTTCCCAAACCCCATAATGCTCA
-g TTTCTCTCAACACCGCATACACA -g TTTGTGGAAGGTGGTTTCCTCTT -g
TTTTCCAAGTGAGGCCACTTCAT -g AATGACTACCCCTCGGTCATTCT -g CCTCCCAGTGGTTTCTGTTCTAC
-g CAGGTGGACAGCCTCTTTCAG -g TCAGATTGTTGTGGAAGAACTGC -g TGCCTTTGTCACAAGAAGGAGAA
-g GTGAAGAAACCTCTTCCCTTCCC -g TCACTTGGCAGGAAATTGGGAAA -g
CTGCCTCTCCACTCTGAGGTTAT -g GGCATTGCCAAAATGATCACAGG -g AAAGGACAAGAGGGCTGTGAAAT
-g CTCCTAGGTGGCTGTACTTTTGA -g TGGAGTTCTGGACAAAGCAAACA -g
ACACAGCTACAACTGGAGCATTTA -g CGAGAGCCGGGTGACAAAC -g CTCTCAACTCTCTGCCCCTTCT -g
GTGGGGAAAGACTCGATAGGTTT -g ACTTTGACATTCTTCCTCAATTACTCA -g
CATGTTGGACTTACCTCCTCCTG -g ATGAGCTGCGCGTCCTC -g CCATGTCTGGACCCTTCTCTC -g
CGGAAGGTGGAGCGTCAT -g TTTTTCTCTCCTAGGTAAAACCGATAACC -g CTCACACCCTCCTGTCCTTGAT
-g TTAGGAAAACTCCTCCCCTCTCA -g TCCTCTCTTTACTTGCAAACCCT -g
ATTGACACTCTCTGCTGTTTGTC -g GAGCTTATCTTTCCTCTCCTGCT -g AGGCAAGTAAATTCCACCCACTT
-g CGTGGTCCCGGAAATAAGAGAAT -g TCAGAGAGGAATCAACTGAAAGCAT -g
GCAGACAGACAGAGTGGTGAAAA -g TGCCCCTCTCACATTCATTCATT -g ACATGATGCGGAGATATGAAAGC
-g ATATGCTGATGAGACAGCAACCA -g TGAATGAAGAATGGCTGGAAGGG -g TCTCCGAGTCCGTGCAGAG -g
AGCTGAGTCATATAGAAAAATATTGGTG -g TGAATATTTTTCTATCTGGCATTCCCT -g
ACTCTTTTCTTTCGGAATGCCTCT -g GACCAAGGTCTAGCTCTACTGTT -g ATCTTAACAGGTGGTTGTCTTCA
-g TTCCATTGAAGAGGACGTGTTGG -g CTTTGGTTGCTGTTGGAGATTGG -g
TGAGAATGAACGAAAAAGAAAAAGGTG -g GGAAAAACTTACTAAGCCAAAAGAAGA -g
AAATTTTCTTTTTCATTCTTTTGGTAGGAA -g CGGCCGCTAGTGCTCC -g TGAAAACCAGTAGTTTCCTCAAAGA
-g GTGGTGACTCAGCCTTTCTTTCT -g TGGTGAATGTTCTGGCCTGTG -g
ATTCTGACCTCTATGTAAACTGAGCTG -g CCATGGATGATACAAGAGAGGATATT -g
TGACTGTTTTTGCAGGCATTAAGT -g CCCCGTTCACTGTGGGT -g TACGTCTGTTTCTTCCCAATTCT -g
TTGTGTTTAATGGCTTTATTTTGGC -g CTGCATTGTGCTGGGGATCTG -g AACACTTCTGGCAGTATCCTTCA
-g AGAATTGCCTCAGCTCTTGGAAA -g CACAATGTACAGGATGCAACTCC -g
GGACGGGGATTTCTATCACCATTA -g GACTTCGGGGTGAACTTGTCTC -g AAGGCTGCACATCCTATGGGTAA
-g CAGCAAGAACTCTGAGGTGAACA -g GATCCGTGATCTGCAGGCAT -g ACAACCTCTCTGAAACCTGGACA
-g TCTTCTTGGTTTTACAGGAATGTTTCA -g CGGAATACTCACAGGATGCATTT -g
CAGGTGAAGACACTGGTGGAAT -g ATTGTCGTACGCTACAAGCATGG -g AGATGGAATGTACAACCAAGTGC -g
GGCTGTTGCTAAAGGATAAAATGCC -g CCTAACACATGCTGCCATTTCTG -g
AGTGACTTCTTAGTTTTGGGTACTTGA -g GAAGCCACAAGTTTCTGTGAGCA -g
GCAGATGGAATCATCTAGGAAGGT -g TTGTGGAAACTTTGTTGCTGCTC -g GCTAACTGGTTTCTTCAGCTGTT
-g TGTTGAGAGAAAGCCTCAGACAC -g AGAGTGCTGCGGTCCA -g CATAAAGCAGGGTGAGCAAATCA -g
CAGTGTTTGATCATCCAAGGCAA -g ACATTTTACAAAGTTGTTTTCTGTTGTG -g
TGATTTTGCAACCCGTAGAGGAT -g TGATGAATGATCCCATTAGTAGGTTT -g
TGGCCAGTTGATACCACTAAAGG -g GCTGAAATACAGGCAAGACGGAT -g TGGTGTTTCCATCATCAAGGCTA
-g ATGGTCTAGAAGCACACTGTTGG -g CTGTCCACGAACTCACAGTATCC -g
CATTGTTGTGCTTTCTTTCACACC -g AGTGGAAAAACTATCTTTGATTTTCATTTT -g
GGGATAAGAGTCCTTTTCCATAAGAGT -g TTCTGAGGATGTTGCAAAGGACA -g CCTGCTGCTTGGAGCC -g
CTGTGCGCAGTATTTATCCCTTG -g CGTGTTTCCAAAGAGAAAAACCT -g
TGAAGAAGATACCATAAAAACAATGCTC -g TCCACTGCTTTTGTACCAGCTTT -g
ACACATACATACATAATTCTCATTTCTTTCT -g GTGGATAGCATCCTGTTCCTCTG -g
TCAGGGACTTGTTTTGGTGAGAG -g CCCTCCACTGTAGAAATTCCTTGAAA -g TCTTCACAGAGCTACAGGCAAC
-g CCGTGGCGCCTCATCG -g GAGGTCATCCGCTGTGCTG -g AGTAATGAATTGCTAAAATCTGCTTTATACT
-g CGCGGGTCCTGGAAGGTA -g ATGCTTCATGGAGTCCACTTTCA -g GAGTATCATAGCAAAGCCCCGTT -g
GCCTGACCTAGTCATTCTTGAGT -g TTTCATCTGTGTGTGGCAGGG -g GCATCCTCTGTCCTGGGTAGT -g
CTCAGGATTCTCTCTCGGACCA -g GGTCACTTGGTGCTGCTATCTAA -g CAACTGGCCAAAACTCCAAAACA -g
ATGGCAGGCAATGACGAGAACTA -g ACCATGTACACCAGAAACACCAG -g GTCATCTTCTTGGGGACACTGG -g
GTCAACTCCAGCCGCTACAAG -g CTGTCCAGCAACATCTCTATCCC -g GCTGTGCCTTGCAGTTTGTTAAAT -g
AACAATAACTGGAAAAGCCCCAG -g CATGGGAACATTCCTGGGTCTG -g AGTCCATCAGATGCTACCAGAGG -g
GTCGTCTAGAAGTGAATGAGTGGT -g GGCCTGTCTTGTAGATTTTTGCC -g CGTCTAGGGGATGTAGAGGTGAA
-g CCTCACACCCACCCCTTCT -g CCCAGCCGAGAAATTAAAAAGCA -g CTCACCTTGTAATACTGCCCCAT -g
CTGCGAGCAGACGGAGAG -g CAGATATTGCTTGCTAAGATGCTTT -g CCATACTGGCACATTCCTTCTGC -g
AGTGTGCAAAAGCCCTTGTTTTC -g GACAGCTTCGACTCAGAGGACTA -g CTCCGGCCTGACCCTCTAA -g
CCAGGCAAAAGAAACACAACAGA -g GTTTCATTCCCCTCATGACAAGC -g
TGGATCATACTTAATCATCTTGGGTCT -g GGTCACCCTCTTCCTTTCTAAGC -g CCCCAGGTGAAAGGACGTG
-g TGATAATTTACTTGTGACCTGTGTCATT -g ATACATGCTCTGCTGGATGTAGG -g ATCCCCGAGGGCGTCAT
-g TGACACAGGAGAGAGCTGAACAT -g CGCTTCACACCTCCCTCC -g GTTATGCGTATTCCCGTAGACCC -g
CCCAGAACCTAGAATTGGGAGGA -g AGTGAGAGAATCTGGCTCCTTGA -g TTTAGCTGAACCCAGAAACATTCA
-g TCCTTAGGCAGACTTGAGTTTCT -g CCCCCATGTATGTGTATGAGTCC -g
AGGCTGAAATTGCTTTTCACATTCT -g GGCACTTCAGAGACACCTTGAT -g CAACGCCTTCTCTCACACAGG -g
TAGGTCTGCTGGGGCATCC -g CACAGCCCGATAGGAGATACTCA -g TGGAAGCCCACCCTCTTATGAA -g
CAACTTACATCCAGCGAAGGCTC -g GCAGATCCCATTAAATGTCCTGGT -g GGGCTGGATCTTGTGTCTATCCT
-g CTCATTAATCACTCATCAGCTCCA -g CCACTAGACTGCTGTATTTTCTCCC -g
GAGCGTGATCTTCTCCTCCAG -g CAAGAGTGTCTGCCCTCGG -g GGAAGGGTGGTTTGACGTCTT -g
CTCTGGGATGTGTTGGTTACTCC -g AGCTTGCTACACTGAAGACCATT -g TTGAAGTGCACACTCATCTCTCC
-g CCTTCGCGACACACCAGAT -g AACCAAGCAAGTCACAGAACAAT -g TCCAACTGAGAAGGCAATTGGAG -g
TCTGTGTGCACTTCCTTTCACTT -g CAGGAAGCCGTACCCTACAC -g CTGGGGTCAAAAAGTGAAGGGTA -g
AAGGGAATGCTACGTTTTCTGTG -g GGGCACGAGCATCAGCAG -g CAGAGTGATCCCGGTGGAGTTA -g
CGTACGTCCCCCACTCCT -g AGCTATTGCATCTTACTGGTCCT -g GGATTTTAACTGTATGTTTCTTATCTCTCT
-g CCTACACACTCAAGGTGCACTAC -g AGGGGACCTCCTTCATCCC -g TAGGCCACAGAACTGAAACATCT -g
ACAAACTGTAGGTCTTTGTAAATGC -g TCAGTGACATTTTGTGTTAGACCT -g CCGCCGGCAACCTCAT -g
TCCATTCATTCTCAATCCTTCTGC -g TAGCTTTAGCTTTGCTGGGTTTC -g GGCTTGTGGGAGGTCTTGTT -g
ACCCCTTTCACAGGGAATTATGG -g GCCATGCCAGGGAGCC -g ATACAACACAGACTTCCACAGCA -g
CATTGAAGGCACACTGGGTTGA -g CTTGCCTTCTTACAGCCCATTTG -g ATCATAGCTGAAATTTGTGAGCAG
-g CCGTGAATGAGGGTATTGTCTCC -g TGCATCAGATTTTTCTGGGGCTT -g
TGAAAAGATGACAGAACAAGATACCA -g AATGGCTTGTGTTTCTAGGCAGT -g CGCAAACTGTGGGGAGTTGAG
-g CTGGACCAGGATTAGTGGGAAC -g GCTTCTGCTTTCCCGCTTCT -g GCTCTCCCCGGATCGT -g
GCTGTGAGATCCCAATCACCC -g CCACTGGAAGGAGGAAGAGGG -g CTGCTGCTGCTGCCACT -g
GGAGGATCAGAGGCCCCTTC -g GCTGCGTAGAGTGCTGAATATCT -g TCTCACCTTTTCTCACCAAGATGAA -g
AACGTTCACAAAATCCGTTGTGC -g TCCTGAAAGAAGTAGAGTACCGA -g
AAAAGCTGTTTTAACTTTTCTTTCCTTT -g TGGTGTGTTAATAGTCAACCCCTTA -g
AATCAAAGTCTGCCCTCGCTCA -g CCCCACTGTGATGTAATGGAGG -g AACCCTTTCCTGTTCCTATTGCT -g
AGCTGATGCTAATAACCAGACAA -g TGCTGGAATTAAATCACCTTGGC -g GGACAAGAAGGACTGGAAGGGA -g
TCCCCAAACCTCAAGTTCATTCT -g CAGGCCAGGGCACCCA -g TGTTCCGCCTGCCAGAATTAAAA -g
GTATACGTGTGAGCCTTTCCCTG -g GCGACGCGATGAGGGTATC -g TATCCAGATCCCAAGGCATCAAC -g
TTTTAAAGAAGACTGTTACTTTTTCATTGATT -g TTACTTGGTCTGGTGCTGACATT -g
CCAGGATCACCAACACTTTCTCT -g CCCACTATTAACTCCAAGCAGCA -g
TTGTAAGAACTCTCTTTTTAATTCACGTA -g GCCTACAATGAAGGAGAACGACA -g
GTGTGAAACCCACAACCAAACAG -g GGAAGCACGAGGAGGGTATC -g CGCTCGATGGTGGACGTG -g
GTGGAGCAAACAATGACGATGAC -g AAATGACAACGGTTTGGAGGGAC -g TTATAGTCAGTTCCCCTCCCCAC
-g GTAGACTAGGGCATCCTTGCTTC -g ACCTCACTAGCTACCCACTGTAA -g CATCTCCCAGGGAAGCAACG
-g CTGCTTTCCCTCATTGCCTTTTC -g CTGACCCGGGAACCTTCTCA -g TGAAAGCATGAAGAAGTATGCAGAG
-g CCTCCCGTCTGTGGTTGG -g TGTCTGTTCATTCTGACCTGTGA -g CTCCTTCTCCCTTCCTCCCTG -g
GCTGTGTTAGGAAGTCAACCACT -g GGGACTCATGGTCTTGAACCG -g TACACTTGACCACAGACTCACAG -g
AGTCAGCAAATACAGGAAACAATGA -g TTGGTTGTCAAGTTTCATTTGGT -g
ACCCCAAAAGGATTTTATCTTGTTGTAT -g GATGGCACCAATAACCCTTTTGC -g
TGTCCTCTGTCCTTACCATCCTT -g GCCAATCTCTCACTCACCTTTGG -g CATTGAGGTGGGCAATGTAGGTG
-g GCAAAACCAAACCACAAGACAGA -g GCAAGACTTTACCAGGGACCAC -g CTGTTTGTGTGCCGGAAGGATG
-g GATGGATTCAGAGACGGCCAAG -g CTGCTCACCTTGCTCTAATCCAG -g CTTTGGGGACCTGGAACGG -g
CCTTTGCTTTTTCTTAATGTTTACTTGAAT -g GCCCAGTGAAGAGAATGAGGAAG -g
ACTCTCTTCAGTAGTCCCCTTTG -g GTGGCCTACGCTACTATTACGAC -g GGTTGAGCGACAGCCATGTAT -g
ATGAACTGTGGTTTCCAGTCCAA -g GCTTTTTTGAAGGACAGTCAGCA -g CTGCGCGCTGGACACAT -g
CTCCCTCTCTGTGAGTTCTCCTC -g CCCCTGGCCCACCACA -g TCTATTCCCCACTCTCCCCAG -g
CCAGCCCATTCATCTGCTTCTTA -g CTGCCTCCTACCCCCTCTTC -g GCTTCTCCTTCCAGTCCCAAAAG -g
CTGGCTCGATCCGTGTAGTTTTC -g GGCTGTTCATACTGAGGATGGTG -g GGTGCATTCTTATTCTGGGGTGA
-g AGTAATCTAGCTGGAGATCATTTCTTAAT -g AATGTTGGCGATTTCCCTGTTTG -g
TTCTCTACTGTCCAGGTGTGTGA -g GAGACAAAAGAAGAGCTCCAGGT -g GTCCTCTTCCTACCTCCTCCTTT
-g CCCTCGGAGTGTGCCTTG -g GGAGGCGCTTCTTGAGGAGT -g CCGGACCTCAGTGGCTTT -g
TGACCTTGTTATCTCTTTAAGCCGAA -g ATGTTTCTGTCGATGAGGGCTTT -g ATGCAGAGTGTGGAGGGATG
-g TCCTGCCTTACTTTTTACTGCCTT -g GGAGGCTGGTGCTGAGG -g CCAGGCTGCCCAAGCG -g
TGGCCTGATGATACTGAAGTTTATGG -g TCCGCTGTCTGTCTTGTAACATC -g
TGCCAAGAAGGAGGAATGGAAAA -g CCTGCTGGATCTTCATCTGATCTC -g GGTAGTATGGCGGTGGGTACAT
-g AGTTTGGAAGATGAAAGCAATGTC -g ACTAACTTCTCCCTTGATTTTCACTT -g
ACCTCTCCCACAGATTAATTCTACA -g TGCCTTTGATCATTCACCAAACAA -g
GCAGAGAACAGAATCCGTGAGTT -g TCAGCTGGAGAAAAGTAACATGG -g TTTTAGAAGCCTGCTTATGTGGA
-g CCTCTTTGCCCCTAGCATGG -g ACTGAACATGAGAAAATACCGAATGG -g AGCCCTGTGTTTGCAACTGG
-g TTTCTCTCTTCCTTAGGGCTGTC -g GCCCCTCAGCATGTTGTGAA -g GACCAGGGTGGAAATGGGAG -g
GGGCTAAACAACAGAGATTTTCCTC -g ACTGGATTCTTACTCATTATCCCCAC -g
CAAAGCTGCTTGAAACTTCTCCC -g GGACACTGTGTTTCCACTTCCTT -g TGCCTGGGAACTTTGAATGA -g
AGGAGGAGGATCAAAAATTGATAGGT -g TACAGTACCAATAGGCCAACAGC -g
TAGAAGACTGCAATGCAACAGGT -g ACATTTCCTTCTCCAGAACTTTCGT -g CGGCCATGTTGTTGGGCAT -g
AGCATGAAAAAGGTGCTCTCACT -g TCTGATCCTTGCTCTGGTAGGTC -g GAGATGGCGGTCATTTCCTATCC
-g GTGGGCACCCCTGCTG -g GGACCGGAAACAGCCCAG -g CTGGTCCGGAAGCTTTCTAATGG -g
TTTAGAGGCAACAACTCCCTGTG -g CAGAGTGAGGCATTTCTGTCTCC -g AGCCAGAGTCTAGGTGGCTT -g
AATAACTGCCAGCATGTTTTGAT -g GTATCTTCCAGTTGGGTGTCC -g GAGTTCTCAGAGCCCAGCTTCAT -g
TCCTTTCTGATTCTGTGGTGCTG -g TGCATGCTCTGGGAGGAAAA -g CCTACTGTGTGAGTCCCTCCTAA -g
ACAGAGTTTATTTACAGTTTCTTATGATGGA -g AGAGACTGTTTGGGGGTATTTACT -g
GCCAGGACTCTTTTGAAAGCATA -g CCTGCAGTGCAAGGCAAAAC -g CCCGTTTTTGAGTAAACCTGAAGC -g
CCGCTTCCACCATACCTACCT -g CTGGTCCCACCTCTGATGTCC -g GTGCATCTATGTTCGTCTGTGTG -g
GTGTGGGTGGCTGCACTT -g CAGCAGGCAGCCCCAG -g CACCGCTTACCTCCCTTCTG -g
AGCTCCTCAGCAGCCTCAA -g CAAAGGGTCTGTGGGTGGG -g CCTGTAAGTACGGGGACAAGTG -g
GCTTGCATGTACCAAGAATGTCTAT -g TAGAAGCACCTGAAGGAAACCAC -g GCTGCCACTTCATCCACAAC -g
GAGGGAAAGCTCCTCCAAGTG -g TCCTTGCCGTTTGTGGAAGAAAT -g CCCACGGCCTCCAGTATTAAC -g
TGCTTAAGGTCTCTGTTCAGCAT -g GTGACAGAGCTCCAAGAGGTG -g GTGGAACACATGGTTGCTGAC -g
TGGAACAGTACACACCATGTCTT -g TGTTCCCTTTGCTTATATATCTCTTTAGC -g
CCTGACTCCAGTGTCCCCAT -g ATTCCCAGAGGAAAAGTCAGAGG -g TTGTCTCGAGCTGGAACTGAC -g
CCAGTTTGACCGCTCCTTCC -g ACATTGTTTCGGAACCTGGAAAG -g TCGAAAAGATATAATCCGCATCCT -g
CGACAGAGTGGGTTTCATGGAG -g CCTGCCACATTGAGTCAACTAC -g GTGAAGAGGGCTGGTATGGC -g
GCTGTCCGGCCCACTC -g TGACTTCCACATTTTCTAGATTTGC -g CAAGAAAATGACCGCAACAGGTA -g
AACAGTATAAATTGCATCTCTTGTTCA -g TCTCAGTAGCAATGAGCTAAGTTT -g
GATGCCTGCTAGCCATAATCAGT -g TCATCTTGACACAACAGGCTACA -g CCACAGGAAGGTTATCTGATGGT
-g TTGCATTTGTTTTTCATTTCTCTTCCC -g CTTAACACAGGTGGAGTGGTTCA -g
TTCTTTCCCAGAATTCCTCTTGTT -g ATTTGCCTCTTTCAGCTCTTCCA -g GGTGGGGCCGAATGAGG -g
AATCCAATGCATGTAGCTGTGGT -g TGGATAGACTATGGGCAACCACT -g
GTTGTCCAGTTTAAAGATAATATTTTGGTG -g AGTTCTTTTTGGCGTTCACTGTT -g
TGTTGAAATGGTGAAACGGCAAA -g TCTCAGTGGAATCAAGTCCAACA -g TGGGGCTTCAGAACAGATAAACC
-g GCAACTTGGTCCATGGCAGA -g ATTGACCTCTTTGTTCCTTCCCT -g TTCTCGTTGCCTTGTTTTCTTGG
-g GCATTACCTTATACAATGATGTGCT -g TCTGTGACGTGACTTTGATCGT -g
CATCCTCTGGCTCCTACCAGTT -g GTGACAAGAATAGTGCCATGGTG -g GCTGTGTCCACCGTGAAGTTAAT -g
CCCAGTTTTCAGGACTGCATTTG -g CAACCAAGTGAATCCCAACCCAA -g CCATCTGCATTGGACCCAACAT -g
ACGCTTGGAAGGCTCTATTATGTC -g CTGAGCTCTGTAAACAGTTCCGT -g
TCAATTAATATAAAAGGAGGGTTTGGCTT -g TGACAGTTGTGGTTTATCATTCTCT -g
GGTGATGTGGGCTGTGAATGAAT -g TCTTTACCTATCCAGATTTGCTTCT -g
AGCCATGTGTACTTTTGATGAGGA -g TTTCCAGCTCTGTGGCAAGAAT -g
GGAAAGAAGAAAATATGTACTAGGGCTATG -g GTCTCTGGGCAGTACTCACG -g
TTGGGTTACAACATTACGCGTTG -g ACTGAGACTTCCCACTCTAGGTC -g GACTGCTTCTCAGTCCCAAACTC
-g AAAAATGCCAGATGAGAACCACA -g TGCTGTGGAGAATTTAAGAGGG -g CACCCATGGCACCCACC -g
TACATCCATGAGAGAAAGCTGGG -g ACCTGTTAATGTTTCATCTCTTCTCGT -g
GACTCACATGGTGAATGCAATGG -g CAACACCCTGGGCAAAATCTC -g TTCCCCAACATTGTGCCTTTTTG -g
GTTGGGATGCTGACACTCCAT -g TGATTCTGTTTCCTGGGTTCCAAT -g AGACTCAATCCAGTGTAGATGCC -g
GTCCTTAGGCACAGGGAATTCAG -g ACATCAGCCACCCAGTGTTTTTA -g ATTGAAGCATCTCATTGTCCAGT
-g ATCAAGCTGGCCTGGATTCAAAA -g TCATGTGTTTTGTAGTGCCTGGT -g
TCAGTACCTAGGATGGGCTTCTC -g ACAGGAAACATTATCTGTACATTGACT -g GGCATAGGGCTGGTAATGCTT
-g TTTTCTGTGATTATTAGCTTCTTTCAGT -g ATGTTCAACTGCTGCTTGACTTC -g
AGGCAACTCCCATTCTAGAGGAA -g ACTGGCTGCTACATTGTGATTGA -g CAGGTGTCCCAGTTCCCAC -g
GAGAAGGAAGAAGCCCTGCTG -g AATTTCTTTGTAGGTCCCGTTCA -g GCATCAGGAGAGTATCTCACAGC -g
AGCTTGAAAATAAAGGCAACAGG -g ACTCTATAAATGACACACACAGTCA -g CCATTGGACATCAGATAGGTCGT
-g CGCTCCAACCTGTCTTCTCTC -g GGAAGGGATATTCAGGAGAGCAG -g
TGACCAAGAGATGCTGTTTAAAGAAA -g CTGCCAAGGAACCATGACAAGAA -g
GGGGCAAGAATGTGAAGTCAGTA -g CAAGGTCAAGGCAGTAGGAGAGG -g GGAATCCCCTTACCAGACAGGAC
-g GTACTGGAATATCATATCTTTATATCCTTTATTGA -g GTTTTCTCACCAGGCCCACT -g
AATACCCCTGATCTTCAAACTCG -g GACATCCTGCGAGACTACAAAGT -g CTCCTCCATCTTCATGCTCCAAA
-g AGTTTTACCTGGAGGAGGTGATG -g TCATTGTCTAGGTAAGGAGGAGGA -g
TTACCTTTTGGACATGGCTTGA -g ATCATAGAAGGTTTGCCGCTTCC -g CAGCAAAACTACACTTCAAATGTTCA
-g TGTGTCTTCGGGATGCTTGATTT -g TAACTATGCTTTTTTCCCCCCAA -g
TCATCCATGAGACAGACCTGTTA -g TCCAGTTCCTCCAGTTAAATGCT -g CACAAGGAATGTGTACAGGAACC
-g TGCAAGCCTCAAACACTAAGGAT -g TCAAAACCAGAGAGATTTCAAGACA -g
CTGATGACATTTAATTTCCATTTCTGAGT -g AGGTTCCATGGGATTCTGGG -g
CTCTTTCCAGGCTACTAATAAAATTGCC -g TGGTTTGTTTTCATTTTTTAACTTTATGGT -g
GTACCAGGTTACCGCTGGACTTT -g TGGGTCACACTGTCTTTTAACCT -g ACACATCCTTGGACTTGGAAGAT
-g CCATAGCCATTTTCAGCCCTACT -g GGGTGAATAAAGGACCTCTTGCC -g
AGTGAGATCGCAGAGTATTTGCC -g TGCAGGAGGTGACCCA -g TGGGATCAGGCAGCTTATTTGTT -g
CTCACCGTCAAACAGCCCAT -g CACGATGTCATTCAAAGGCGATG -g TGCAGTCAATGCTCCAACTTACA -g
CTTGCTCTCTCTCCAGAACTCTT -g GGGCCGCTTGAATATGACTGTT -g CCTCACTTGGAAAAGAGCTCCA -g
CAGTGTTTCTGTCCGTAGACCC -g TGTGCTGTTGACCAGTGTTTGAT -g ATGCACAAGCTAACCTCAGAACA -g
CTATAGCAAGCCAGGACTCCAC -g TGATCCCCACTAGCTATAAAGGC -g GCATTTGAGGAAAGAGCTGTGTG -g
AGCTCACCTCTAGTGAACCCAAT -g ACTTTTTCTGTTTCTAATGTAAGCATTTTC -g
AGCCAGACAGGGTAATCTTCCTA -g AAGTTGCGTGTGTCTGTTTCCTT -g ACTGACGAAGAAGCCGAGGTA -g
TCTCCAAGTAACTGTGGGCAAAA -g GAGCGATGGAAACAGAGCAGAA -g GAAAGCAGGCAGTTTCCTTTCTG -g
GCAATTCTGGACTGGAAAATGCC -g GGAAGGGATGCTACGATATGGC -g CCTGACCATGGAGTGCCCTA -g
TTTAGCAGCATCTGAATGCACAA -g TGTGTTCAGATTTCATGTGCAGT -g AGAACAGAACAAGAACTGTAAACCT
-g AGCCCTAGAAATGAGTTCCTGAC -g AAGCTATCTTTTACTTTCTGAATAATGTTTG -g
ACAAAAAAGACTTGGGGATTGCAT
Original comment by d.vanh...@qmul.ac.uk
on 9 Mar 2012 at 10:35
Oh, I hadn’t gotten to that mail yet …
That’s a lot of adapters, wow. Never imagined anyone would ever specify so
many adapters. Regarding your trimmed read: One of your adapters matches the
read quite well:
...CCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTCCAGTTTCTTAGACACCATGTC
adapter:                                              AGCTCCAGTTTCTTAGACACCAT
You are using the -g option, which removes the adapter and everything
*preceding* it from the read. Since in this case the adapter
occurs almost at the end of the read, only the three bases GTC remain.
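To make the effect concrete, here is a minimal Python sketch of unanchored 5' trimming applied to the read and adapter quoted above. It is not cutadapt's code: the function name trim_front_unanchored is made up for illustration, and it handles exact matches only, whereas cutadapt uses error-tolerant alignment.

```python
# Read and adapter copied from this thread.
READ = (
    "CCAGCACAAGGTTCCCACTGTACCCCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTC"
    "CAGTTTCTTAGACACCATGTC"
)
ADAPTER = "AGCTCCAGTTTCTTAGACACCAT"


def trim_front_unanchored(read, adapter):
    """Remove the adapter and everything before it (exact matches only)."""
    pos = read.find(adapter)
    if pos < 0:
        return read  # adapter not found: leave the read untouched
    return read[pos + len(adapter):]


print(trim_front_unanchored(READ, ADAPTER))  # prints GTC
```

The result is the same three bases that remain in the trimmed output reported above.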
I’m not sure whether the -g option is actually so useful, so I welcome
feedback: What would be the behavior that you expect?
Original comment by marcel.m...@tu-dortmund.de
on 9 Mar 2012 at 10:48
OK,
I had hoped that -g would only trim from the absolute 5' end of the read (i.e.
for a 20bp adapter, see if it aligns to say the 5' ~25bp allowing for the odd
indel error etc).
I see what it is doing now.
I don't want it to trim a match found in the middle of the read!
We have a lot of PCR amplicons pooled, hence so many adapters - but your script
copes fine with this number.
Some PCR amplicons overlap. That's why, in my example, the adapter
from one amplicon sits in the middle of another amplicon.
Is it possible to add a strict 5' adapter matching and trimming option?
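A rough Python sketch of the strict 5' behaviour asked for here, using the read from the example and two adapters from the $adapters list. The function name trim_front_anchored is hypothetical, and the sketch accepts exact prefix matches only; tolerating the odd mismatch or indel would need a real alignment step.

```python
# Read copied from this thread; both adapters appear in the $adapters list.
READ = (
    "CCAGCACAAGGTTCCCACTGTACCCCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTC"
    "CAGTTTCTTAGACACCATGTC"
)
ADAPTERS = ["CCAGCACAAGGTTCCCACT", "AGCTCCAGTTTCTTAGACACCAT"]


def trim_front_anchored(read, adapters):
    """Trim an adapter only if the read starts with it (exact prefix match)."""
    for adapter in adapters:
        if read.startswith(adapter):
            return read[len(adapter):]
    return read  # no adapter at the 5' end: leave the read untouched


trimmed = trim_front_anchored(READ, ADAPTERS)
print(trimmed)  # the read without its leading CCAGCACAAGGTTCCCACT
```

With anchoring, the adapter that starts the read (CCAGCACAAGGTTCCCACT) is removed and the internal occurrence of AGCTCCAGTTTCTTAGACACCAT is left alone.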
thanks, david
Original comment by d.vanh...@qmul.ac.uk
on 9 Mar 2012 at 10:55
Ok, I understand. I’m quite busy with other non-cutadapt things, so I won’t
be able to properly add such an option, test and document it. But you can
easily change the behavior of the -g option yourself. Simply open the cutadapt
file and look for this section:
# Constants for the find_best_alignment function.
# The function is called with SEQ1 as the adapter, SEQ2 as the read.
BACK = align.START_WITHIN_SEQ2 | align.STOP_WITHIN_SEQ2 | align.STOP_WITHIN_SEQ1
FRONT = align.START_WITHIN_SEQ2 | align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1
ANYWHERE = align.SEMIGLOBAL
Then change the FRONT line as follows:
FRONT = align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1
Without the START_WITHIN_SEQ2 term, the alignment cannot start within the read.
Or more precisely: Any skipped base will count as an error.
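The constants above are bit flags combined with |. The Python sketch below shows that mechanism only; the numeric values and the helper read_prefix_is_free are assumptions made for illustration and are not taken from the cutadapt source.

```python
# Hypothetical flag values: one distinct bit per flag.
START_WITHIN_SEQ1 = 1  # alignment may start inside the adapter
START_WITHIN_SEQ2 = 2  # alignment may start inside the read
STOP_WITHIN_SEQ1 = 4   # alignment may stop inside the adapter
STOP_WITHIN_SEQ2 = 8   # alignment may stop inside the read

FRONT_ORIGINAL = START_WITHIN_SEQ2 | STOP_WITHIN_SEQ2 | START_WITHIN_SEQ1
FRONT_MODIFIED = STOP_WITHIN_SEQ2 | START_WITHIN_SEQ1


def read_prefix_is_free(flags):
    """True if read bases before the match may be skipped at no cost."""
    return bool(flags & START_WITHIN_SEQ2)


print(read_prefix_is_free(FRONT_ORIGINAL))  # True
print(read_prefix_is_free(FRONT_MODIFIED))  # False
```

Dropping one term from the | expression clears exactly one bit, which is why the edit to the FRONT line is a one-token change.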
I’ll leave this issue open until I’ve added a proper way to specify this on
the command line.
Original comment by marcel.m...@tu-dortmund.de
on 9 Mar 2012 at 11:33
Hmm, having replaced that line in the cutadapt python script with:
FRONT = align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1
I get the same result (output below):
@MISEQ:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAGT
GTC
+
::>
Original comment by d.vanh...@qmul.ac.uk
on 9 Mar 2012 at 3:57
Ok, one more change is needed. Find these lines (approximately line 263):
if pos >= 0:
    match = AdapterMatch(0, len(self.sequence), pos, pos + len(self.sequence), len(self.sequence), 0, self)
and change the "pos >= 0" to "False":
if False:
    match = AdapterMatch(0, len(self.sequence), pos, pos + len(self.sequence), len(self.sequence), 0, self)
cutadapt will be a little bit slower, though.
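One reading of this change, sketched in Python: the "pos >= 0" branch looks like a shortcut that accepts an exact occurrence of the adapter anywhere in the read before any alignment is attempted, so the FRONT flags never come into play. The sketch assumes pos comes from a plain substring search; find_front_match and its prefix check are hypothetical stand-ins, not cutadapt functions.

```python
# Read and adapter copied from this thread.
READ = (
    "CCAGCACAAGGTTCCCACTGTACCCCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTC"
    "CAGTTTCTTAGACACCATGTC"
)
ADAPTER = "AGCTCCAGTTTCTTAGACACCAT"


def find_front_match(read, adapter, use_shortcut=True):
    """Return (start, stop) of the adapter match in the read, or None."""
    if use_shortcut:
        pos = read.find(adapter)  # exact substring search, any position
        if pos >= 0:
            return pos, pos + len(adapter)
    # Stand-in for the slower alignment step: only a match at the very
    # start of the read is accepted here.
    if read.startswith(adapter):
        return 0, len(adapter)
    return None


print(find_front_match(READ, ADAPTER))                      # (75, 98)
print(find_front_match(READ, ADAPTER, use_shortcut=False))  # None
```

Disabling the shortcut means every read goes through the slower path, which fits the remark that cutadapt becomes a little slower.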
I will probably not be able to reply until end of March.
Original comment by marcel.m...@tu-dortmund.de
on 10 Mar 2012 at 9:55
thanks, am using v1.0 and can't find those lines (there is something similar
though):
if pos >= 0:
    result = (0, len(adapter), pos, pos + len(adapter), len(adapter), 0)
I saw a reference to v1.1 but can't find a download.
david
Original comment by d.vanh...@qmul.ac.uk
on 11 Mar 2012 at 9:05
Hello, I’m back and have now looked into this again. I realize now that the
above modifications aren’t sufficient to get the behavior you wanted. I have
modified cutadapt to allow 'anchored' adapters with the -g parameter. Are you
still interested? Cutadapt 1.1 will contain those changes, but isn’t
released yet. You can get the source code from the Subversion repository, but
I’d also be happy to send you a .tar.gz package if you’re interested in
testing the changes.
Original comment by marcel.m...@tu-dortmund.de
on 17 Apr 2012 at 9:14
Thanks Marcel,
Have in the interim found a workaround - so will wait to have a look at
cutadapt 1.1 when the new version is formally released.
thanks, david
Original comment by d.vanh...@qmul.ac.uk
on 18 Apr 2012 at 10:25
Hi Marcel,
I have the same scenario,
so I would like to use cutadapt 1.1 with the modified 'anchored' adapters for the
-g parameter.
I just checked out the SVN repository, and it still shows version 1.0:
/home/sjohn/Install/cutadapt-read-only# ./build/scripts-2.7/cutadapt --version
1.0
So could you send me the modified version?
Regards,
Shibu
Original comment by shibujoh...@gmail.com
on 3 May 2012 at 3:07
I’ve replied to the above comment by mail (the answer is that the version
number in SVN isn’t updated, but the feature is already in).
Original comment by marcel.m...@tu-dortmund.de
on 8 May 2012 at 3:26
Original issue reported on code.google.com by
d.vanh...@qmul.ac.uk
on 23 Feb 2012 at 10:58