alexstaj / cutadapt

Automatically exported from code.google.com/p/cutadapt

strict 5' adapter matching #36

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, cutadapt seems to be trimming reads excessively,
e.g. see the 3 bp read remaining below.
I have specified a large number of ~20 bp adapters with -g, together with --times=1.
I would have expected the shortest read to be 100 - 20 = 80 bp.
I have not asked for low-quality reads to be trimmed.

This is 100 bp MiSeq data, quite dirty with some low intensities, hence the lower Q scores.

many thanks for help.
david 

/cutadapt-1.0/cutadapt --overlap=15 --times=1 --quality-base=64 -g 
AAAAAACTCACAAAGTCAGGTAATTCT -g AAAAAAGCTCTCblah etc....

@MISEQ:6:000000000-A0EW6:1:1:15870:1573 1:N:0:CGCTATCAG
GGAGCACGTGCAGACCCCCTACCTCTGCAGGACTGTCTTGCCATCCTCACCTGTCTGTGCCTCCTGCCCCGCAGTCAAGC
GC
+
IIIIIGIG?DCGB>DBBDFGGII>FG<C4@(5=CD@CC;B>AACCBCCC?BCCCC@AAC3@>>(9>ABA@B#########
##
@MISEQ:6:000000000-A0EW6:1:1:15386:1585 1:N:0:CGCTATCAG
ACTTAAAAGTTCACTTTTTGACAGATCCTGAAAATGAGATGAAGGAGAAGCTCTTAAAAGAGTACTTAATGGTGATAG
+
IIIII@DDH?F?FCGGGGIGGHHGIIGGIIFCEHHGHG@@DDF<C@=AEDDCCCCC:@:5<C:>@CCDDDAC4:>CE>
@MISEQ:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAG
GTC
+
::>
@MISEQ:6:000000000-A0EW6:1:1:14819:1607 1:N:0:CGCTATCAG
AGCCCGCGGCAGCCACTGCAGCAGCGGCAGTGGCAGTAGCAGCAGCCACAGCTACAGCCACAGCCACGGCCTCTGTGGCC
GC
+
GGIIIIIIF8FGGD9F;33CGDC2C/9BC>;>CDCBCCC35>?BB=A?A?BB@:>CC:8?C8?88?C59@##########
##

Original issue reported on code.google.com by d.vanh...@qmul.ac.uk on 23 Feb 2012 at 10:58

GoogleCodeExporter commented 9 years ago
Sorry my reply has taken so long. Could you please provide the full untrimmed 
read of the one read that gets trimmed to GTC? I would also need the full 
command line in order to reproduce the problem. 

Original comment by marcel.m...@tu-dortmund.de on 9 Mar 2012 at 10:22

GoogleCodeExporter commented 9 years ago
Marcel - see also the email I sent this morning.

Here is the example I posted on the cutadapt issues page:

Running cutadapt within a perl script, within Sun Grid Engine.
$fastq is the input fastq.gz and the sample name is also in the output file.
I have cut/pasted the $adapters file below.

regards and thanks for help, david

perl script:
system ( "/data_n2/hmw208/software/cutadapt-1.0/cutadapt --overlap=15 --times=1 --quality-base=64 $adapters $fastq1 -o ./interim/${sample}.fastq1.cutadapt.gz" );

input fastq.gz
@MISEQ:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAGT
CCAGCACAAGGTTCCCACTGTACCCCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTC
CAGTTTCTTAGACACCATGTC
+
11:=DDFFHHHFHIJJJJJIJJIJJJJJIJGHHCFCHI<EAGCFC>DBHII@DHGBHII;=8==A;@;@CEH?BDDDBBC
C355:@CCCD@>AC?88@::>

after cutadapt:
@MISEQ:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAG
GTC
+
::>

and the $adapters file:
-g AAAAAACTCACAAAGTCAGGTAATTCT -g AAAAAAGCTCTCAGGGTTTTGCC -g 
AAAAGACAGGAGGCAGAAGGTGA -g AAAAGCCTTGGTAGGTCCGATTG -g AAACAGGCTTGAGAATCAGGGTA 
-g AAACAGTCAAAAAACAAAGAGATGGA -g AAACTCCAGGTCCTCTGGTTGG -g 
AAACTCTGTTCCAGATCCCTTCC -g AAAGCACCTCAAGGCCCAAG -g AAAGCAGTGGATCACAGGAACAT -g 
AAATAAGAAAATCATAGATACTGCAAAACTATTC -g AAATGCTGCAAATGGGTGTCAAT -g 
AAATGTGAGCATCCTGGTGAGTT -g AACCCGGGGCCAAGTTC -g AAGACGTCAAACCACCCTTCC -g 
AAGAGCACTGGTTGTTAGCACTT -g AAGCTTTGGCTGAATCTGTTTTAT -g AAGGATTTTGATGCTTTCGTATCC 
-g AAGGGGATTAGTGCTTGGTTGTC -g AAGGTACTGCTCAAACACCAAGC -g 
AATAAAAACAGAGACCAGCCCAC -g AATAGAGAACCTCTAACATGGTGAATAAGAT -g 
AATCAGGCAACTCAGCACACATA -g AATCATTGAGCGTAATATCATCTTGG -g 
AATGACCGAGGGGTAGTCATTCT -g AATGCACTCATGTCAAGAATAAGC -g AATGCCAAGTGCAACGGCTA -g 
AATGCTTTCAGCCTTATGCCTTG -g AATGGAGAGGGGAAAGCTTCTTG -g AATGTTCTGCTCAGCACCCAA -g 
AATTTAAAGGATCTTGAGAAAAACAGGT -g ACAAAGTTTTCCACCTTCTCACA -g 
ACAAATTGCCTGAAGGAACACAC -g ACAATAGGTAGCAAACCATACATTCA -g 
ACAGCAGAAGGGTGCTAAATCTT -g ACAGGCAGTTCTTTACAAGTCTCA -g ACATAAGCAGGCTTCTAAAATGGC 
-g ACATATGATGCTTTTGTGTCTTACCT -g ACATTCTATAGCTTCTACTTGGGCTT -g 
ACCAAGTCATTCACATAATTTTTCAGC -g ACCACCACATAGATGAATAAGCA -g 
ACCAGAGTCCTGACTAGAAATGG -g ACCCATTGTCTGAATTGTTTTATACCT -g 
ACCCCTATGATTAATGTAGCACTGTC -g ACCGAATCGTAGTGGATGAAGTTT -g 
ACCTGAAATTCATCCTTGAGATGTG -g ACCTGAGGCAAATCCACGTC -g ACCTGGTATTTACAAAGCTGAAGAA 
-g ACGATCAAAGTCACGTCACAGAG -g ACGTCCTCTTCAATGGAAAGATCC -g 
ACGTTGATATTGCTGATTAAGTCCCT -g ACTCAATTCAGACAACTAGTATCTAAGG -g 
ACTCCATGAAGCATTGGTGGAAA -g ACTGAGTATTCATAAAATTTGACTTCAGC -g 
ACTGGAGAACAATAAAAACCCAACG -g ACTGGGAGGGTTTGATAACTTGAC -g 
ACTGTACAAACAAAAACAGGAGCA -g ACTGTCAGCATCTCTGTATCGGT -g 
ACTGTTTCTTTTTGTGTTTGACAGC -g ACTTACATTAATTCCATTCAAAATCATCTGT -g 
ACTTACCTATGGCCTTGTTTAGTAGAAT -g ACTTCACCTGCAAAATGGGCATA -g 
ACTTGCCCACTTACTGGATTTCAT -g ACTTGGCAAACAGATGGTGAGAT -g AGAACTGGAATGATGAATGGGACA 
-g AGAATCCTGCCAACTTCCACAAT -g AGACATGGCTTTGACATTAGTTCCA -g 
AGACCCAAGATGATTAAGTATGATCCA -g AGACTGGCATGAACTTTTCCCTA -g 
AGAGGTCCTACATCTTCTGCCAA -g AGATAAAGGGCCTAGAGAGTGGG -g AGATCAACACACCATACACTTCCA 
-g AGCAATTAAAGGCCAAAACTAAGGA -g AGCATCTCTTGGTCATCTGTTGG -g 
AGCCAGTAAAGGTATTGGAGAGT -g AGCCTGCTGTGGTACTGG -g AGCCTGTTTAACTTTACTGCCAA -g 
AGCTAAGATTCTTGATGCCTGGT -g AGCTCCAGTTTCTTAGACACCAT -g AGGAAGGAGACATTTAGGTAACGG 
-g AGGAATCTTCCATTCAACAATCTCC -g AGGACCTCTCTGACTCAAGGTTT -g 
AGGACCTTCCTAATTCTTTTTAGATTGT -g AGGACGCAGAACAAGGAAAAGTG -g 
AGGAGTCGTATTAAAGTCAGGCTA -g AGGCTCTGCTTTGTTTTCTGTTG -g AGGCTGACATTTGTATAAGGTGG 
-g AGGGCTGAAAATGGCTATGGAAT -g AGGGCTGGGTGATTGGGAT -g 
AGGTATGTATTTCATAAAGTTTTTCTGTGG -g AGGTGAAGCATGCAGAAATAACTA -g 
AGGTGACTCTTTCTAGCATGTGA -g AGGTGTTTCAAAAAACAGTAACATCA -g 
AGTACAGGGAGTACAGGATACAT -g AGTCATTTCCTGGCAGAAGTCC -g AGTGGAATGGATATGAAAGCATACCT 
-g AGTGGTAAGACAACTCAATTTTCCC -g AGTTAAGCCCCAGACAAATGC -g 
AGTTACCTTGTCATTTTGGTTTTTGTTT -g AGTTTCAAAAATCAGAACTTTAGTTGCC -g 
AGTTTTACAGTGAACTGAATTAGGGT -g ATAAGCTAGAATTCCATTTTCTAATGTGT -g 
ATAAGGCAAGTGTGGAAAGGACA -g ATAAGTACATAAAAGCAATCCATAGCC -g ATCCACCGCCAGCTCCTAA 
-g ATCCATCGTCTCCTCCTCTTCAT -g ATCCGGTTGATCACATCACAGTC -g ATCCTGGGCAAGCTGGG -g 
ATCTCATGTTCCCCCTTTCAGTT -g ATGCATGCCACTTCTCAGTACAT -g ATGCATGCTTTCAGTTGATTCC -g 
ATGTCTGTGGTTGATAGCCCAAG -g ATGTGACGTTGGCAACATTGAAC -g ATGTGGGTGCAGGGTAGG -g 
ATGTTTGGCTTGCTGTTCCTCA -g ATTACTGTGGTTGAAGGGGAACC -g ATTCTAAGGCCCCTCTTTCAACC -g 
ATTGCCATGAGCCTGTGTCC -g ATTTTTATAAGCCAAACCCTCCTTTT -g CAAAAGCCGATGGTGTGGAAG -g 
CAAAATGCGCTACCACATGCC -g CAACACCCTCTCTTTCAGCCAT -g CAAGATACGAGTGGAACCTGGAA -g 
CAAGCATGAATGGATGGGTGAAG -g CAAGGAAACAGACACACGCAACT -g CAAGTGCCTGTCTGCATTCTACT 
-g CACACACAAAGCGGTACACGTAG -g CACACACACACACACTCAAAAGG -g 
CACACAGCGACCTGACCTTTAAC -g CACACCCATTTTCCTGCACGATT -g CACAGCCCAACTATGGAAACCAG 
-g CACAGCCGCAGCAATGG -g CACAGCGATACAAATCCTGGTCA -g CACCAACCCCCACGTGTTT -g 
CACCAGCGTTCCAAGTCAGAT -g CACCCACAGAAACGTGCAGATG -g CACCCCTCCATACAACAAGGTTT -g 
CACCGATACACACTGGAAATGTT -g CACCTCTACAAAACCTCCTTCCA -g CACCTGCCCTTCATGGGTAGTAA 
-g CACGAAGCGAAGGTCGTTGA -g CACGGCGCAGGCGAAG -g CACGGTTGCTCCTTTCTTTCTCT -g 
CACTTCTCTGCCCACAGGTC -g CACTTTCAATGCATGTGGCATTTT -g CAGAGACCGGGGAAGATTTGAAG -g 
CAGAGGAAGTTGGGGCTGTC -g CAGCACGTCCACCATCGAG -g CAGCAGTGAATCTTGCCTTGGAT -g 
CAGCATCAGAGTTTATTCACTGC -g CAGCCATTCTTATTTCATTTGTGCT -g CAGCCTCCCAGCCACC -g 
CAGCGCCTGCCAAAAATATTCAC -g CAGCTAGGGACCCAATATGTGTT -g CAGCTGCACGAGGAAGTGG -g 
CAGGATACAGGAAGCACAAGGAG -g CAGGCGTATAACAGCCAATTCAT -g CAGGGAAGACCTCCTCTGGAAAT 
-g CAGGGAGTGGCTCTTTGCTG -g CAGTCAGGGCCTCCATTACATC -g CAGTGAGACCACTGCCATGAAG -g 
CATACCTGCCCGTTGAACATGAC -g CATACTTGGGCATCTTGGGATGT -g CATCCATCCAGCCATGATCCC -g 
CATCCTTCCTGGCAAGTGGTC -g CATCTCCCAATGCCTGTCCTG -g CATGGCCCCTTCTGTCTCC -g 
CATGGCCTGTCTTGAGTCTGTG -g CATGTTGGGTCCAATGCAGATG -g CATTGACTGCATGGGTTTCTGTC -g 
CATTGCCTTCCAAGACCCAAAC -g CATTTTGTACCCTTGGTGACCCT -g CCAAAACCTTAGGAAGGGTGTTCT 
-g CCAAAATGACTGACATAAACCCCAT -g CCAAGGTCTCCAACAAGCTCAAG -g 
CCACAGAAATCGACTCTCACCAA -g CCACAGGAGACTCAGGGGAAG -g CCACCATGGAGAACTGGTAGGAG -g 
CCACTCACCTCCCCCAACTA -g CCAGAAGGGAACGGCGAAA -g CCAGAAGTCTTTTACAGTGTTTGGC -g 
CCAGATCCGATTTTGGAGACCT -g CCAGCACAAGGTTCCCACT -g CCAGCCAACCACAATGACGAG -g 
CCAGGAGAGCAAATAAAGTAATGCC -g CCAGGTGAAAGTTATTCCTCCG -g CCATCACAATGCAAGCAGATTGTT 
-g CCATCACTGGTCTTGAAGGTTGTTA -g CCATCTCCACTCCAAAGTTAGACA -g 
CCATGCTAGGGGCAAAGAGG -g CCATGGACAGCCCCCTG -g CCCAGAATAAGAATGCACCGAGG -g 
CCCAGAATCCCATGGAACCTT -g CCCAGGAAACAAGATTTTGGTGA -g CCCCAGGAGTATGCCCTTTTC -g 
CCCCAGTCGTTGCTGTTCTTTAG -g CCCCCCATTATCCTAGAGTGAAA -g CCCCCTCAAACTGAAAGCAGTAA 
-g CCCGAGTATTTTGCATGTCCAAC -g CCCTAGGCTCACCTAAGGAAGT -g CCCTTCCAGGAGTTTTCTCTCAT 
-g CCCTTTCCCCTCTTCATTTATCCC -g CCGCCTGCTGACTTCACTC -g CCGGGGAGGGAGGTGT -g 
CCGGGTCTCGGAGGAGG -g CCTATTGCTGCCATTTTCCAATGT -g CCTCAACTCCCCACAGTTTGC -g 
CCTCAGTTCCCCTCAGATGC -g CCTCCTGGACCACCTCAGTTT -g CCTCCTTCCTTACCTCATCTGGG -g 
CCTCTCAAGGGTAACGAACAGAG -g CCTCTCGGGGAGAAGCC -g CCTGCACATTTTGTTCTGGTGAT -g 
CCTGGAAAATTCATAGGCAGTTC -g CCTGGTAGCATTCCATGGCTC -g CCTGGTTCCTGATGTTCTGTGG -g 
CCTGGTTGTGTTGGGTTTCATTT -g CCTTAAAGACCACAGCGAGGAG -g CCTTGACCTTGTTTCTGCTAACAT 
-g CCTTTCCAGGGTGAATGTCCTAA -g CGAAATGCAATGGGGTTGAGAAT -g 
CGAAGGAAACAAACCCAATTCCC -g CGCAGGAGGAGGGTTCTTATAGT -g CGCCGTCGGACATCATGAATAA -g 
CGCCTTGTCGATTTTGGAATGTC -g CGCGAGCTGGAGCACTA -g CGGGAACCCTGGAGGGAC -g 
CGTGAGTAACTAAGAAGCAAATCTGG -g CTAGAAGCTCACAGACAAGCAGT -g CTATTGAGTCCCCACCACCTTC 
-g CTCACACACTCCAGGTGAGAAAA -g CTCACCCTGCTTTATGATGAGGT -g CTCAGCCCGAAGCCAATCTCTA 
-g CTCATGCTCCCCACCGTCAT -g CTCCAAACTGGGATCACTAACCA -g CTCCCGCAGCTCAGACG -g 
CTCCCTGAGGCTGAGTGAACA -g CTCCTAGGCCCTGAATTTCAACA -g CTCCTCCCCTCTTGCTTTCAG -g 
CTCCTTCTTGTGACAAAGGCAAA -g CTCTGCCAGCATATGGAGTTGAT -g CTCTGTCCGTCCCTCTCTTCT -g 
CTGACCCCACCTTCATCTTCTTC -g CTGACTGAAGAACGCCTGGATG -g CTGAGACATGTACTGGCTTCACT -g 
CTGAGGCACATCTAGGCAGTTTC -g CTGATAGGAGCACCTTTGGCTTG -g CTGATGGCATCCCCTCCAAG -g 
CTGCCTGTCTCAAGCTGCAC -g CTGGAGGTGCTTTTATGCCCAA -g CTGGCAAAGGAATGAAGTTATTGGA -g 
CTGGCTACAGGTGTGTGTGTATG -g CTGGCTCAATAGATGGACAGGTT -g CTGGCTGGCTTGTGAAGACTAT -g 
CTGGTCCGAGAGAGAATCCTGA -g CTGGTTCTGGTGGAACTTGGAG -g CTTACACAGCGCCGTAGCC -g 
CTTAGCTGGTGTCTCCCTGCTT -g CTTCACTCTGCTGAAGGCATCTC -g CTTCATAAGAGGGTGGGCTTCC -g 
CTTCCTTCTCTGCTACGGGGATG -g CTTGTATCTGGGGTACTGCGTTA -g CTTTGCCCCTACCTCCTTGG -g 
CTTTTGGTCTGGGGCAGATTGG -g GAAAACAACCGAGAGCCTGAAT -g GAACAGAGACCTTAAGCAGGAGG -g 
GAAGCTGGTCGAGCCCATT -g GAAGGTCTCAGCAGGGTTTCC -g GAATGCAAGCCACAATGAGAGAA -g 
GAATGCAGCCCTCATCACATTTC -g GACACACCACTCTCTGGGATTTT -g GACACAGAAGGAGACAGAAAGCA 
-g GACACGACTCACTGTCTGCTTC -g GACCGGCAGATGAACACCAA -g GACCTAGCTGCTGTCCTTCC -g 
GACCTTGGCCCACAAGTTCTAC -g GACGGTCCTCAGCGGGAG -g GACGTGGTCCCTGGTAAAGTC -g 
GAGCCCTATTCCAGTACAAACCC -g GAGCCTTCCTAATCCCTACGAAA -g GAGCGAGGGCAGACTTTGATT -g 
GAGCTGCTATTAGTCCCATCTGT -g GAGCTGGGTCTGCAGGAAG -g GAGGACAGGGCAATACCGATG -g 
GATACTGCTCCACAGAGACTGC -g GATAGTCCTCTGAGTCGAAGCTG -g GATCCAGTGACAACCTCTCCTTT -g 
GATCCTGGGAGCAGTCTAGGG -g GATGAAGCTAGAAGGCAGAAGGG -g GATTCACACACCACCTTTGCATC -g 
GCAATATGCGGAAAGCTGTGAAG -g GCACACCTCATCTGAACTTCATTAC -g 
GCACTACTCACCTGTTAAGGAAAA -g GCAGCACAACGGATTTTGTGAA -g GCAGGTTATCGTGTGAAGGAGG -g 
GCATCATCAAAACAGAACTTGGC -g GCATGACAACTGCTTTGGTCTTC -g GCATGTCCTCTGAGACAAATGAG 
-g GCCAGCTCGAGAAAGCAGTC -g GCCAGGTGACTGAAGTCTGTG -g GCCATTCTCAGTTATCCAACAACA -g 
GCCCAATGCCTGCACTAC -g GCCTAGTGACTGTGTGTGTCATTT -g GCCTCATCTTTCAGCCTGTAGTTA -g 
GCCTCTTCATTTTCAGGAAAGCC -g GCCTTACCTGGAGCCATAACTTG -g GCGCCCAGAACAAATTGTAGTAA 
-g GCGTCTTGAGTTGTCCAAGGTC -g GCTCAAGTGGAGTGCAGTTATTT -g GCTGAAGAGAGCTGGAGACAG 
-g GCTTACATCTATTAATGCTGACCAATTCT -g GGAAAATCTTAAAGTAATCACATTTTTCTTGT -g 
GGAACTGCTCCAGGACAGAGAA -g GGAACTTCCTGAGTTGCCAT -g GGAAGTGGCTCAGTGGAAATGTA -g 
GGACTCCAGGCGTTCGT -g GGAGAAATCAATGGTTCTGCCAA -g GGAGAGGAACTCAGGAAGCAG -g 
GGCAAATACTCTGCGATCTCACT -g GGCAATGACCGATGACTTTGATG -g GGCACCTCCAGTGGAAATCAAG -g 
GGCATCTCGGGTGGTAGATAATG -g GGCCACTATCCCTCTGGGT -g GGCCCTGTGCCTTAGTAGTATTT -g 
GGCCTGAACATGGTCTTGGT -g GGCTCCTGCACAGTGACTAC -g GGCTGCTTCTGGGTGGG -g 
GGCTGGACTCCTGGCAAAG -g GGCTGGGTGGGCTTTCTTC -g GGCTTCTGATGGCTTTTCTGC -g 
GGCTTGTGGTGCTGAGTGA -g GGGCAACACGACCCTCAAC -g GGGGAAATGTTTTTAGCCCATCT -g 
GGGGCTGGATCTTTTCCCAC -g GGGGTGACGACTTCTTGTTTGAT -g GGGTCCTGACAACACCTCTTTTA -g 
GGGTGCCAGGTGCATTATCAA -g GGTAAATATCTTTTGGCCCTTGCC -g GGTACGTGGTTCCAGGGTG -g 
GGTCAGGCCATAGAGGCAGA -g GGTCCCGCACATAGTCCTTGA -g GGTGAAAGACTGATGTCTGCTG -g 
GGTGAAGCAGTCGCCCAT -g GGTGCAAGAGGCCAGCAT -g GGTGGGAGGCACACTGG -g 
GGTTAGCGAGCCTCACTTTAGTT -g GGTTGAGAAACGGACACAGCTA -g GGTTTGTCACCCGGCTCTC -g 
GTACCCCTCTCAGCCCCTC -g GTACTTACAGGCACCGTTTTCCT -g GTAGGAATAGGAAGAGCAGGAGC -g 
GTCAGCGGAGAGGTCCC -g GTCAGGCTCATCTCCAGACAG -g GTCATAGGCCGTTGGGAACTC -g 
GTCATCGGCGCTCAGAATAGG -g GTCCAGGTTCCAGAACACCATTC -g GTCCCGGACATTCTCAAGTACA -g 
GTCTCCTCCCAGAGGTACAATTC -g GTGAAGTTGGACACCTCCTTCTG -g GTGACGAAGGCCGGAGC -g 
GTGAGCAAGTCCCACCTACAG -g GTGATTCTCCAGGATGTGCCATT -g GTGCCAGGATTCAGTCCTTTCTG -g 
GTGCTCCAGGGAGTCATCA -g GTGCTCCTGTACATCCTGCTTG -g GTGGCGCTGTAGGGGAAG -g 
GTGGCGGGAGGTAGGTAT -g GTGGCTGCAGTTGCTCACTATTT -g GTGGTTTCCTTCAGGTGCTTCTA -g 
GTGTGGCAGAGCATGATGAACAA -g GTTATTTAAAATGCCTGAGGGCCAA -g GTTCCCCTATGACTCTGTCCCTT 
-g GTTGACCAGCAGGGACACG -g GTTGTAACCCAAGGATTCCCAGT -g 
GTTTCCAGAAAAATTCTATAGGAGAAAACA -g TAAATGCTCCAGTTGTAGCTGTGT -g 
TACAGGACTCCAGAAGGCAAATG -g TACAGGCCAGACCTGAATTTCCC -g TACCCTTCACTTTTTGACCCCAG 
-g TACTCACAGGCAGTGGATAATCGT -g TAGAGGAAAGTCCTGTACATCGG -g 
TAGCCAACATCATCTTCAGAGGC -g TAGCGAGGTACGTCTAGAAGGC -g TATGCAAAGTCAGGGTGGATGTT -g 
TATGCCATGAGTGCTCAGAGAGG -g TCAAGGTGATAACAAGATCACAAGG -g 
TCAATATAGTTGGCATTAATATAATCTGAGGA -g TCAATCCAAAGGGGAAGAATGTGT -g 
TCACAGAGTTCTGAAGTGGAAGG -g TCACATCTGATTCCCTATGTGTG -g TCACATGGATAGAGCATCAGACA 
-g TCACCAAGAATTGAATGGGAACT -g TCACGAAGGTTTCGTTTTCTCCT -g 
TCACTGAGATAGGGGTCTTTTCCT -g TCACTGTCACCGCAGATAATTGA -g 
TCAGAAATGGAAATTAAATGTCATCAGAA -g TCAGCCAGTTTTCTGTAAGTCAAT -g TCAGGGCCGTGGGTGA 
-g TCAGTAAATGGAACTGATAAAATCTGAC -g TCAGTCTAGCCACAGACCTGAAC -g 
TCAGTGACATTCTACCTCCAGAA -g TCAGTGGAGGGACGTGGT -g TCAGTTGGAGACATTCTGAAGCA -g 
TCATAAAGAAGCTCAAGGCAACAC -g TCATATTCAAGCGGCCCACAG -g 
TCATGCTAATGTCTTTCTTATTGGATTT -g TCATTTTCAGACATTTGCGTGGT -g 
TCCACCTCTCTCTGTTTCCACAT -g TCCACTTGGAATACAAAGAAATGACT -g 
TCCAGAATCACAACTTTGTTGCC -g TCCAGGATGTTACCAGGACATTT -g TCCATTGGGCCAACAGATGC -g 
TCCCCCTCTTGCTTTATCTCCTC -g TCCCTCTGGATTGGTTAGGATCA -g TCCTCCCAGATGTAGGACAAAGT 
-g TCCTCTCTCCCCGCAGTC -g TCCTCTTTGAACCCATAGCTGTC -g TCCTGCCTCCTAAAACAAACC -g 
TCCTTCCTGCAGCCTTGTTG -g TCCTTTTGGTCTTCAGGTTGGAT -g TCCTTTTTCCCTTTCTTTTCTAGCAG 
-g TCGTGTAGAATTCAAAGTGAAATGAGT -g TCTATACTCATTACTTACGACATTCACT -g 
TCTCAGTTATGTGTACTCTGCTCT -g TCTCCCTAAGAAATGTGAGCAATAGT -g 
TCTCCCTTTATACATTTTGCTCCT -g TCTCGATGAGAAGCCCATCCT -g TCTCTGACCCTTTTCCCAAGAAT -g 
TCTGCTTTGCTGAATGAAGAGGA -g TCTGGGAAATTCTCAAAACCCCAT -g TCTGTCCATTGCACCTTTGTCAT 
-g TCTTCCTTTTAGTCCTTCAATCCAC -g TGAAAATGAGAAAATAAGCACCAAGG -g 
TGAAGCAGGGAACAAAGTCCTTC -g TGAATAGTTGTGCCCACAGTCTTA -g TGACAACTAGGAGAAGGAGGATGA 
-g TGACCTCTCGGGCATCCTG -g TGAGACCTGAAGAGTCAGAAAGAT -g TGAGAGACATCTACTCTTTGTTTGC 
-g TGAGCGGCCCTCCCAT -g TGAGGCTGTTGCTAGAGTTATCTT -g TGAGTGTCTTACATGTTGCTTCT -g 
TGATAACAGCTGAAGAAACCAGT -g TGATGCTTTGACAAAAGGTAATCCA -g TGATGGACAACACTGGTAAGATT 
-g TGATTTTCAGGTTGTTCCACAAAAA -g TGCAGCCAATGTTGAAGCAAATC -g 
TGCATCCTGAGAATGAGTGTGTC -g TGCATGTGCTTATCAAACTTCAAA -g TGCATTATGTAGCTTTCCACAACA 
-g TGCATTTTCAGCTGTTGTTGATGA -g TGCCATCAGAGGCTAGAATTATGA -g 
TGCTAAATTCCACTGAGTATCATAGTTAAAA -g TGCTAACTGCAAAAGAGACTGAC -g 
TGCTAAGAAGTTTGGAATCAGGT -g TGCTACTTTCCATTCTTTTTTCCTTT -g 
TGCTCCAATATGAGGTTATTTTCCA -g TGCTCTACTTCCTGAAGACCTGA -g TGCTGGTCTGTAGGAGATGGTAT 
-g TGCTGTGGAAGTCTGTGTTGTA -g TGCTTAAGAGAAAAAGAAAGAAGAAAACCA -g 
TGCTTTTGTTCTAATAAAGCCATGC -g TGGAACCCTAGCCCATCGTC -g TGGAGGTGTTCTTGTTTTCCACA -g 
TGGCAATAAAACAAATCGAATAGCA -g TGGCTGAGTGACACTGAAGTTTT -g TGGCTTTGGAGCTAGTCTGAAAT 
-g TGGGAATGTCAGGATATAGGTACT -g TGGTCCATGAAGGTGATGGAAAG -g 
TGGTGCATAGTTATATGGGGACA -g TGGTTACTAAAGGGAAGATGGGTG -g 
TGTAAAATCTTGTGTGTGATTTTGTGT -g TGTAATTCAAAACGAAATTAGAAAACTTACTTG -g 
TGTACTGTATAATACGACTTCACATCTT -g TGTAGCCTGTTGTGTCAAGATGATT -g 
TGTAGTTACTGCTCTCCCCATCA -g TGTATCAACCATTTTTAGTATTGCCTT -g 
TGTATCACCTTCTGCCAAGTTCC -g TGTATTACACATATAGTGAGAACTGATGAC -g 
TGTCACTTTTGCTTTGTTTTCTTTCT -g TGTCAGCATCCCAACCAGAG -g 
TGTGAAAACAGTTACTATCAAACACTGG -g TGTGCTACCCTCACTCTTGGA -g 
TGTGGCTGTGCACCTAAATCTTC -g TGTGTCTTATGCTAGATTCCTTCAC -g TGTTGTGTATCCTCAAAGGGGAA 
-g TGTTTCCTCTACTGAGGGGTGTT -g TTAAAATCAATTTAGTCAGGATCCCAAA -g 
TTAAACTGCATGTGCCTTGCTTC -g TTACACTTTTATTCCATAGAAATTATACTCAGAAA -g 
TTACATTAGTCCTGGTCACTTCAG -g TTATTTCTTTCACATTTTTCTCTGATGTTC -g 
TTCAGTCGTCTCTGACTGATGC -g TTCCCTCTAGAATGAGAGTTGCC -g TTCCTGCAGTGTCAAAACTTCCT -g 
TTCCTTGTTCTCCCAATACCTCA -g TTCTCTCCGAGTGCTTTCCAAAT -g TTCTGAAAACCTTTGCTGGGTCT 
-g TTCTGATGAGAGGTGTGGAAACA -g TTGAGCCCCAACTATGTCAATGG -g 
TTGGCATATTGGAATTGGTCAGC -g TTGGGGAATACTCGTTTTTCAGC -g TTGGTCTTGGGAGTTTTGTGGAA 
-g TTGTAGTCTCCCTGTTGTCTGAA -g TTGTGCTCAAAAGCCTTGTCTTG -g 
TTTAATTATCTCCCCACCCACCC -g TTTCAGGGAAACACCTTTGTCCA -g TTTCCCAAACCCCATAATGCTCA 
-g TTTCTCTCAACACCGCATACACA -g TTTGTGGAAGGTGGTTTCCTCTT -g 
TTTTCCAAGTGAGGCCACTTCAT -g AATGACTACCCCTCGGTCATTCT -g CCTCCCAGTGGTTTCTGTTCTAC 
-g CAGGTGGACAGCCTCTTTCAG -g TCAGATTGTTGTGGAAGAACTGC -g TGCCTTTGTCACAAGAAGGAGAA 
-g GTGAAGAAACCTCTTCCCTTCCC -g TCACTTGGCAGGAAATTGGGAAA -g 
CTGCCTCTCCACTCTGAGGTTAT -g GGCATTGCCAAAATGATCACAGG -g AAAGGACAAGAGGGCTGTGAAAT 
-g CTCCTAGGTGGCTGTACTTTTGA -g TGGAGTTCTGGACAAAGCAAACA -g 
ACACAGCTACAACTGGAGCATTTA -g CGAGAGCCGGGTGACAAAC -g CTCTCAACTCTCTGCCCCTTCT -g 
GTGGGGAAAGACTCGATAGGTTT -g ACTTTGACATTCTTCCTCAATTACTCA -g 
CATGTTGGACTTACCTCCTCCTG -g ATGAGCTGCGCGTCCTC -g CCATGTCTGGACCCTTCTCTC -g 
CGGAAGGTGGAGCGTCAT -g TTTTTCTCTCCTAGGTAAAACCGATAACC -g CTCACACCCTCCTGTCCTTGAT 
-g TTAGGAAAACTCCTCCCCTCTCA -g TCCTCTCTTTACTTGCAAACCCT -g 
ATTGACACTCTCTGCTGTTTGTC -g GAGCTTATCTTTCCTCTCCTGCT -g AGGCAAGTAAATTCCACCCACTT 
-g CGTGGTCCCGGAAATAAGAGAAT -g TCAGAGAGGAATCAACTGAAAGCAT -g 
GCAGACAGACAGAGTGGTGAAAA -g TGCCCCTCTCACATTCATTCATT -g ACATGATGCGGAGATATGAAAGC 
-g ATATGCTGATGAGACAGCAACCA -g TGAATGAAGAATGGCTGGAAGGG -g TCTCCGAGTCCGTGCAGAG -g 
AGCTGAGTCATATAGAAAAATATTGGTG -g TGAATATTTTTCTATCTGGCATTCCCT -g 
ACTCTTTTCTTTCGGAATGCCTCT -g GACCAAGGTCTAGCTCTACTGTT -g ATCTTAACAGGTGGTTGTCTTCA 
-g TTCCATTGAAGAGGACGTGTTGG -g CTTTGGTTGCTGTTGGAGATTGG -g 
TGAGAATGAACGAAAAAGAAAAAGGTG -g GGAAAAACTTACTAAGCCAAAAGAAGA -g 
AAATTTTCTTTTTCATTCTTTTGGTAGGAA -g CGGCCGCTAGTGCTCC -g TGAAAACCAGTAGTTTCCTCAAAGA 
-g GTGGTGACTCAGCCTTTCTTTCT -g TGGTGAATGTTCTGGCCTGTG -g 
ATTCTGACCTCTATGTAAACTGAGCTG -g CCATGGATGATACAAGAGAGGATATT -g 
TGACTGTTTTTGCAGGCATTAAGT -g CCCCGTTCACTGTGGGT -g TACGTCTGTTTCTTCCCAATTCT -g 
TTGTGTTTAATGGCTTTATTTTGGC -g CTGCATTGTGCTGGGGATCTG -g AACACTTCTGGCAGTATCCTTCA 
-g AGAATTGCCTCAGCTCTTGGAAA -g CACAATGTACAGGATGCAACTCC -g 
GGACGGGGATTTCTATCACCATTA -g GACTTCGGGGTGAACTTGTCTC -g AAGGCTGCACATCCTATGGGTAA 
-g CAGCAAGAACTCTGAGGTGAACA -g GATCCGTGATCTGCAGGCAT -g ACAACCTCTCTGAAACCTGGACA 
-g TCTTCTTGGTTTTACAGGAATGTTTCA -g CGGAATACTCACAGGATGCATTT -g 
CAGGTGAAGACACTGGTGGAAT -g ATTGTCGTACGCTACAAGCATGG -g AGATGGAATGTACAACCAAGTGC -g 
GGCTGTTGCTAAAGGATAAAATGCC -g CCTAACACATGCTGCCATTTCTG -g 
AGTGACTTCTTAGTTTTGGGTACTTGA -g GAAGCCACAAGTTTCTGTGAGCA -g 
GCAGATGGAATCATCTAGGAAGGT -g TTGTGGAAACTTTGTTGCTGCTC -g GCTAACTGGTTTCTTCAGCTGTT 
-g TGTTGAGAGAAAGCCTCAGACAC -g AGAGTGCTGCGGTCCA -g CATAAAGCAGGGTGAGCAAATCA -g 
CAGTGTTTGATCATCCAAGGCAA -g ACATTTTACAAAGTTGTTTTCTGTTGTG -g 
TGATTTTGCAACCCGTAGAGGAT -g TGATGAATGATCCCATTAGTAGGTTT -g 
TGGCCAGTTGATACCACTAAAGG -g GCTGAAATACAGGCAAGACGGAT -g TGGTGTTTCCATCATCAAGGCTA 
-g ATGGTCTAGAAGCACACTGTTGG -g CTGTCCACGAACTCACAGTATCC -g 
CATTGTTGTGCTTTCTTTCACACC -g AGTGGAAAAACTATCTTTGATTTTCATTTT -g 
GGGATAAGAGTCCTTTTCCATAAGAGT -g TTCTGAGGATGTTGCAAAGGACA -g CCTGCTGCTTGGAGCC -g 
CTGTGCGCAGTATTTATCCCTTG -g CGTGTTTCCAAAGAGAAAAACCT -g 
TGAAGAAGATACCATAAAAACAATGCTC -g TCCACTGCTTTTGTACCAGCTTT -g 
ACACATACATACATAATTCTCATTTCTTTCT -g GTGGATAGCATCCTGTTCCTCTG -g 
TCAGGGACTTGTTTTGGTGAGAG -g CCCTCCACTGTAGAAATTCCTTGAAA -g TCTTCACAGAGCTACAGGCAAC 
-g CCGTGGCGCCTCATCG -g GAGGTCATCCGCTGTGCTG -g AGTAATGAATTGCTAAAATCTGCTTTATACT 
-g CGCGGGTCCTGGAAGGTA -g ATGCTTCATGGAGTCCACTTTCA -g GAGTATCATAGCAAAGCCCCGTT -g 
GCCTGACCTAGTCATTCTTGAGT -g TTTCATCTGTGTGTGGCAGGG -g GCATCCTCTGTCCTGGGTAGT -g 
CTCAGGATTCTCTCTCGGACCA -g GGTCACTTGGTGCTGCTATCTAA -g CAACTGGCCAAAACTCCAAAACA -g 
ATGGCAGGCAATGACGAGAACTA -g ACCATGTACACCAGAAACACCAG -g GTCATCTTCTTGGGGACACTGG -g 
GTCAACTCCAGCCGCTACAAG -g CTGTCCAGCAACATCTCTATCCC -g GCTGTGCCTTGCAGTTTGTTAAAT -g 
AACAATAACTGGAAAAGCCCCAG -g CATGGGAACATTCCTGGGTCTG -g AGTCCATCAGATGCTACCAGAGG -g 
GTCGTCTAGAAGTGAATGAGTGGT -g GGCCTGTCTTGTAGATTTTTGCC -g CGTCTAGGGGATGTAGAGGTGAA 
-g CCTCACACCCACCCCTTCT -g CCCAGCCGAGAAATTAAAAAGCA -g CTCACCTTGTAATACTGCCCCAT -g 
CTGCGAGCAGACGGAGAG -g CAGATATTGCTTGCTAAGATGCTTT -g CCATACTGGCACATTCCTTCTGC -g 
AGTGTGCAAAAGCCCTTGTTTTC -g GACAGCTTCGACTCAGAGGACTA -g CTCCGGCCTGACCCTCTAA -g 
CCAGGCAAAAGAAACACAACAGA -g GTTTCATTCCCCTCATGACAAGC -g 
TGGATCATACTTAATCATCTTGGGTCT -g GGTCACCCTCTTCCTTTCTAAGC -g CCCCAGGTGAAAGGACGTG 
-g TGATAATTTACTTGTGACCTGTGTCATT -g ATACATGCTCTGCTGGATGTAGG -g ATCCCCGAGGGCGTCAT 
-g TGACACAGGAGAGAGCTGAACAT -g CGCTTCACACCTCCCTCC -g GTTATGCGTATTCCCGTAGACCC -g 
CCCAGAACCTAGAATTGGGAGGA -g AGTGAGAGAATCTGGCTCCTTGA -g TTTAGCTGAACCCAGAAACATTCA 
-g TCCTTAGGCAGACTTGAGTTTCT -g CCCCCATGTATGTGTATGAGTCC -g 
AGGCTGAAATTGCTTTTCACATTCT -g GGCACTTCAGAGACACCTTGAT -g CAACGCCTTCTCTCACACAGG -g 
TAGGTCTGCTGGGGCATCC -g CACAGCCCGATAGGAGATACTCA -g TGGAAGCCCACCCTCTTATGAA -g 
CAACTTACATCCAGCGAAGGCTC -g GCAGATCCCATTAAATGTCCTGGT -g GGGCTGGATCTTGTGTCTATCCT 
-g CTCATTAATCACTCATCAGCTCCA -g CCACTAGACTGCTGTATTTTCTCCC -g 
GAGCGTGATCTTCTCCTCCAG -g CAAGAGTGTCTGCCCTCGG -g GGAAGGGTGGTTTGACGTCTT -g 
CTCTGGGATGTGTTGGTTACTCC -g AGCTTGCTACACTGAAGACCATT -g TTGAAGTGCACACTCATCTCTCC 
-g CCTTCGCGACACACCAGAT -g AACCAAGCAAGTCACAGAACAAT -g TCCAACTGAGAAGGCAATTGGAG -g 
TCTGTGTGCACTTCCTTTCACTT -g CAGGAAGCCGTACCCTACAC -g CTGGGGTCAAAAAGTGAAGGGTA -g 
AAGGGAATGCTACGTTTTCTGTG -g GGGCACGAGCATCAGCAG -g CAGAGTGATCCCGGTGGAGTTA -g 
CGTACGTCCCCCACTCCT -g AGCTATTGCATCTTACTGGTCCT -g GGATTTTAACTGTATGTTTCTTATCTCTCT 
-g CCTACACACTCAAGGTGCACTAC -g AGGGGACCTCCTTCATCCC -g TAGGCCACAGAACTGAAACATCT -g 
ACAAACTGTAGGTCTTTGTAAATGC -g TCAGTGACATTTTGTGTTAGACCT -g CCGCCGGCAACCTCAT -g 
TCCATTCATTCTCAATCCTTCTGC -g TAGCTTTAGCTTTGCTGGGTTTC -g GGCTTGTGGGAGGTCTTGTT -g 
ACCCCTTTCACAGGGAATTATGG -g GCCATGCCAGGGAGCC -g ATACAACACAGACTTCCACAGCA -g 
CATTGAAGGCACACTGGGTTGA -g CTTGCCTTCTTACAGCCCATTTG -g ATCATAGCTGAAATTTGTGAGCAG 
-g CCGTGAATGAGGGTATTGTCTCC -g TGCATCAGATTTTTCTGGGGCTT -g 
TGAAAAGATGACAGAACAAGATACCA -g AATGGCTTGTGTTTCTAGGCAGT -g CGCAAACTGTGGGGAGTTGAG 
-g CTGGACCAGGATTAGTGGGAAC -g GCTTCTGCTTTCCCGCTTCT -g GCTCTCCCCGGATCGT -g 
GCTGTGAGATCCCAATCACCC -g CCACTGGAAGGAGGAAGAGGG -g CTGCTGCTGCTGCCACT -g 
GGAGGATCAGAGGCCCCTTC -g GCTGCGTAGAGTGCTGAATATCT -g TCTCACCTTTTCTCACCAAGATGAA -g 
AACGTTCACAAAATCCGTTGTGC -g TCCTGAAAGAAGTAGAGTACCGA -g 
AAAAGCTGTTTTAACTTTTCTTTCCTTT -g TGGTGTGTTAATAGTCAACCCCTTA -g 
AATCAAAGTCTGCCCTCGCTCA -g CCCCACTGTGATGTAATGGAGG -g AACCCTTTCCTGTTCCTATTGCT -g 
AGCTGATGCTAATAACCAGACAA -g TGCTGGAATTAAATCACCTTGGC -g GGACAAGAAGGACTGGAAGGGA -g 
TCCCCAAACCTCAAGTTCATTCT -g CAGGCCAGGGCACCCA -g TGTTCCGCCTGCCAGAATTAAAA -g 
GTATACGTGTGAGCCTTTCCCTG -g GCGACGCGATGAGGGTATC -g TATCCAGATCCCAAGGCATCAAC -g 
TTTTAAAGAAGACTGTTACTTTTTCATTGATT -g TTACTTGGTCTGGTGCTGACATT -g 
CCAGGATCACCAACACTTTCTCT -g CCCACTATTAACTCCAAGCAGCA -g 
TTGTAAGAACTCTCTTTTTAATTCACGTA -g GCCTACAATGAAGGAGAACGACA -g 
GTGTGAAACCCACAACCAAACAG -g GGAAGCACGAGGAGGGTATC -g CGCTCGATGGTGGACGTG -g 
GTGGAGCAAACAATGACGATGAC -g AAATGACAACGGTTTGGAGGGAC -g TTATAGTCAGTTCCCCTCCCCAC 
-g GTAGACTAGGGCATCCTTGCTTC -g ACCTCACTAGCTACCCACTGTAA -g CATCTCCCAGGGAAGCAACG 
-g CTGCTTTCCCTCATTGCCTTTTC -g CTGACCCGGGAACCTTCTCA -g TGAAAGCATGAAGAAGTATGCAGAG 
-g CCTCCCGTCTGTGGTTGG -g TGTCTGTTCATTCTGACCTGTGA -g CTCCTTCTCCCTTCCTCCCTG -g 
GCTGTGTTAGGAAGTCAACCACT -g GGGACTCATGGTCTTGAACCG -g TACACTTGACCACAGACTCACAG -g 
AGTCAGCAAATACAGGAAACAATGA -g TTGGTTGTCAAGTTTCATTTGGT -g 
ACCCCAAAAGGATTTTATCTTGTTGTAT -g GATGGCACCAATAACCCTTTTGC -g 
TGTCCTCTGTCCTTACCATCCTT -g GCCAATCTCTCACTCACCTTTGG -g CATTGAGGTGGGCAATGTAGGTG 
-g GCAAAACCAAACCACAAGACAGA -g GCAAGACTTTACCAGGGACCAC -g CTGTTTGTGTGCCGGAAGGATG 
-g GATGGATTCAGAGACGGCCAAG -g CTGCTCACCTTGCTCTAATCCAG -g CTTTGGGGACCTGGAACGG -g 
CCTTTGCTTTTTCTTAATGTTTACTTGAAT -g GCCCAGTGAAGAGAATGAGGAAG -g 
ACTCTCTTCAGTAGTCCCCTTTG -g GTGGCCTACGCTACTATTACGAC -g GGTTGAGCGACAGCCATGTAT -g 
ATGAACTGTGGTTTCCAGTCCAA -g GCTTTTTTGAAGGACAGTCAGCA -g CTGCGCGCTGGACACAT -g 
CTCCCTCTCTGTGAGTTCTCCTC -g CCCCTGGCCCACCACA -g TCTATTCCCCACTCTCCCCAG -g 
CCAGCCCATTCATCTGCTTCTTA -g CTGCCTCCTACCCCCTCTTC -g GCTTCTCCTTCCAGTCCCAAAAG -g 
CTGGCTCGATCCGTGTAGTTTTC -g GGCTGTTCATACTGAGGATGGTG -g GGTGCATTCTTATTCTGGGGTGA 
-g AGTAATCTAGCTGGAGATCATTTCTTAAT -g AATGTTGGCGATTTCCCTGTTTG -g 
TTCTCTACTGTCCAGGTGTGTGA -g GAGACAAAAGAAGAGCTCCAGGT -g GTCCTCTTCCTACCTCCTCCTTT 
-g CCCTCGGAGTGTGCCTTG -g GGAGGCGCTTCTTGAGGAGT -g CCGGACCTCAGTGGCTTT -g 
TGACCTTGTTATCTCTTTAAGCCGAA -g ATGTTTCTGTCGATGAGGGCTTT -g ATGCAGAGTGTGGAGGGATG 
-g TCCTGCCTTACTTTTTACTGCCTT -g GGAGGCTGGTGCTGAGG -g CCAGGCTGCCCAAGCG -g 
TGGCCTGATGATACTGAAGTTTATGG -g TCCGCTGTCTGTCTTGTAACATC -g 
TGCCAAGAAGGAGGAATGGAAAA -g CCTGCTGGATCTTCATCTGATCTC -g GGTAGTATGGCGGTGGGTACAT 
-g AGTTTGGAAGATGAAAGCAATGTC -g ACTAACTTCTCCCTTGATTTTCACTT -g 
ACCTCTCCCACAGATTAATTCTACA -g TGCCTTTGATCATTCACCAAACAA -g 
GCAGAGAACAGAATCCGTGAGTT -g TCAGCTGGAGAAAAGTAACATGG -g TTTTAGAAGCCTGCTTATGTGGA 
-g CCTCTTTGCCCCTAGCATGG -g ACTGAACATGAGAAAATACCGAATGG -g AGCCCTGTGTTTGCAACTGG 
-g TTTCTCTCTTCCTTAGGGCTGTC -g GCCCCTCAGCATGTTGTGAA -g GACCAGGGTGGAAATGGGAG -g 
GGGCTAAACAACAGAGATTTTCCTC -g ACTGGATTCTTACTCATTATCCCCAC -g 
CAAAGCTGCTTGAAACTTCTCCC -g GGACACTGTGTTTCCACTTCCTT -g TGCCTGGGAACTTTGAATGA -g 
AGGAGGAGGATCAAAAATTGATAGGT -g TACAGTACCAATAGGCCAACAGC -g 
TAGAAGACTGCAATGCAACAGGT -g ACATTTCCTTCTCCAGAACTTTCGT -g CGGCCATGTTGTTGGGCAT -g 
AGCATGAAAAAGGTGCTCTCACT -g TCTGATCCTTGCTCTGGTAGGTC -g GAGATGGCGGTCATTTCCTATCC 
-g GTGGGCACCCCTGCTG -g GGACCGGAAACAGCCCAG -g CTGGTCCGGAAGCTTTCTAATGG -g 
TTTAGAGGCAACAACTCCCTGTG -g CAGAGTGAGGCATTTCTGTCTCC -g AGCCAGAGTCTAGGTGGCTT -g 
AATAACTGCCAGCATGTTTTGAT -g GTATCTTCCAGTTGGGTGTCC -g GAGTTCTCAGAGCCCAGCTTCAT -g 
TCCTTTCTGATTCTGTGGTGCTG -g TGCATGCTCTGGGAGGAAAA -g CCTACTGTGTGAGTCCCTCCTAA -g 
ACAGAGTTTATTTACAGTTTCTTATGATGGA -g AGAGACTGTTTGGGGGTATTTACT -g 
GCCAGGACTCTTTTGAAAGCATA -g CCTGCAGTGCAAGGCAAAAC -g CCCGTTTTTGAGTAAACCTGAAGC -g 
CCGCTTCCACCATACCTACCT -g CTGGTCCCACCTCTGATGTCC -g GTGCATCTATGTTCGTCTGTGTG -g 
GTGTGGGTGGCTGCACTT -g CAGCAGGCAGCCCCAG -g CACCGCTTACCTCCCTTCTG -g 
AGCTCCTCAGCAGCCTCAA -g CAAAGGGTCTGTGGGTGGG -g CCTGTAAGTACGGGGACAAGTG -g 
GCTTGCATGTACCAAGAATGTCTAT -g TAGAAGCACCTGAAGGAAACCAC -g GCTGCCACTTCATCCACAAC -g 
GAGGGAAAGCTCCTCCAAGTG -g TCCTTGCCGTTTGTGGAAGAAAT -g CCCACGGCCTCCAGTATTAAC -g 
TGCTTAAGGTCTCTGTTCAGCAT -g GTGACAGAGCTCCAAGAGGTG -g GTGGAACACATGGTTGCTGAC -g 
TGGAACAGTACACACCATGTCTT -g TGTTCCCTTTGCTTATATATCTCTTTAGC -g 
CCTGACTCCAGTGTCCCCAT -g ATTCCCAGAGGAAAAGTCAGAGG -g TTGTCTCGAGCTGGAACTGAC -g 
CCAGTTTGACCGCTCCTTCC -g ACATTGTTTCGGAACCTGGAAAG -g TCGAAAAGATATAATCCGCATCCT -g 
CGACAGAGTGGGTTTCATGGAG -g CCTGCCACATTGAGTCAACTAC -g GTGAAGAGGGCTGGTATGGC -g 
GCTGTCCGGCCCACTC -g TGACTTCCACATTTTCTAGATTTGC -g CAAGAAAATGACCGCAACAGGTA -g 
AACAGTATAAATTGCATCTCTTGTTCA -g TCTCAGTAGCAATGAGCTAAGTTT -g 
GATGCCTGCTAGCCATAATCAGT -g TCATCTTGACACAACAGGCTACA -g CCACAGGAAGGTTATCTGATGGT 
-g TTGCATTTGTTTTTCATTTCTCTTCCC -g CTTAACACAGGTGGAGTGGTTCA -g 
TTCTTTCCCAGAATTCCTCTTGTT -g ATTTGCCTCTTTCAGCTCTTCCA -g GGTGGGGCCGAATGAGG -g 
AATCCAATGCATGTAGCTGTGGT -g TGGATAGACTATGGGCAACCACT -g 
GTTGTCCAGTTTAAAGATAATATTTTGGTG -g AGTTCTTTTTGGCGTTCACTGTT -g 
TGTTGAAATGGTGAAACGGCAAA -g TCTCAGTGGAATCAAGTCCAACA -g TGGGGCTTCAGAACAGATAAACC 
-g GCAACTTGGTCCATGGCAGA -g ATTGACCTCTTTGTTCCTTCCCT -g TTCTCGTTGCCTTGTTTTCTTGG 
-g GCATTACCTTATACAATGATGTGCT -g TCTGTGACGTGACTTTGATCGT -g 
CATCCTCTGGCTCCTACCAGTT -g GTGACAAGAATAGTGCCATGGTG -g GCTGTGTCCACCGTGAAGTTAAT -g 
CCCAGTTTTCAGGACTGCATTTG -g CAACCAAGTGAATCCCAACCCAA -g CCATCTGCATTGGACCCAACAT -g 
ACGCTTGGAAGGCTCTATTATGTC -g CTGAGCTCTGTAAACAGTTCCGT -g 
TCAATTAATATAAAAGGAGGGTTTGGCTT -g TGACAGTTGTGGTTTATCATTCTCT -g 
GGTGATGTGGGCTGTGAATGAAT -g TCTTTACCTATCCAGATTTGCTTCT -g 
AGCCATGTGTACTTTTGATGAGGA -g TTTCCAGCTCTGTGGCAAGAAT -g 
GGAAAGAAGAAAATATGTACTAGGGCTATG -g GTCTCTGGGCAGTACTCACG -g 
TTGGGTTACAACATTACGCGTTG -g ACTGAGACTTCCCACTCTAGGTC -g GACTGCTTCTCAGTCCCAAACTC 
-g AAAAATGCCAGATGAGAACCACA -g TGCTGTGGAGAATTTAAGAGGG -g CACCCATGGCACCCACC -g 
TACATCCATGAGAGAAAGCTGGG -g ACCTGTTAATGTTTCATCTCTTCTCGT -g 
GACTCACATGGTGAATGCAATGG -g CAACACCCTGGGCAAAATCTC -g TTCCCCAACATTGTGCCTTTTTG -g 
GTTGGGATGCTGACACTCCAT -g TGATTCTGTTTCCTGGGTTCCAAT -g AGACTCAATCCAGTGTAGATGCC -g 
GTCCTTAGGCACAGGGAATTCAG -g ACATCAGCCACCCAGTGTTTTTA -g ATTGAAGCATCTCATTGTCCAGT 
-g ATCAAGCTGGCCTGGATTCAAAA -g TCATGTGTTTTGTAGTGCCTGGT -g 
TCAGTACCTAGGATGGGCTTCTC -g ACAGGAAACATTATCTGTACATTGACT -g GGCATAGGGCTGGTAATGCTT 
-g TTTTCTGTGATTATTAGCTTCTTTCAGT -g ATGTTCAACTGCTGCTTGACTTC -g 
AGGCAACTCCCATTCTAGAGGAA -g ACTGGCTGCTACATTGTGATTGA -g CAGGTGTCCCAGTTCCCAC -g 
GAGAAGGAAGAAGCCCTGCTG -g AATTTCTTTGTAGGTCCCGTTCA -g GCATCAGGAGAGTATCTCACAGC -g 
AGCTTGAAAATAAAGGCAACAGG -g ACTCTATAAATGACACACACAGTCA -g CCATTGGACATCAGATAGGTCGT 
-g CGCTCCAACCTGTCTTCTCTC -g GGAAGGGATATTCAGGAGAGCAG -g 
TGACCAAGAGATGCTGTTTAAAGAAA -g CTGCCAAGGAACCATGACAAGAA -g 
GGGGCAAGAATGTGAAGTCAGTA -g CAAGGTCAAGGCAGTAGGAGAGG -g GGAATCCCCTTACCAGACAGGAC 
-g GTACTGGAATATCATATCTTTATATCCTTTATTGA -g GTTTTCTCACCAGGCCCACT -g 
AATACCCCTGATCTTCAAACTCG -g GACATCCTGCGAGACTACAAAGT -g CTCCTCCATCTTCATGCTCCAAA 
-g AGTTTTACCTGGAGGAGGTGATG -g TCATTGTCTAGGTAAGGAGGAGGA -g 
TTACCTTTTGGACATGGCTTGA -g ATCATAGAAGGTTTGCCGCTTCC -g CAGCAAAACTACACTTCAAATGTTCA 
-g TGTGTCTTCGGGATGCTTGATTT -g TAACTATGCTTTTTTCCCCCCAA -g 
TCATCCATGAGACAGACCTGTTA -g TCCAGTTCCTCCAGTTAAATGCT -g CACAAGGAATGTGTACAGGAACC 
-g TGCAAGCCTCAAACACTAAGGAT -g TCAAAACCAGAGAGATTTCAAGACA -g 
CTGATGACATTTAATTTCCATTTCTGAGT -g AGGTTCCATGGGATTCTGGG -g 
CTCTTTCCAGGCTACTAATAAAATTGCC -g TGGTTTGTTTTCATTTTTTAACTTTATGGT -g 
GTACCAGGTTACCGCTGGACTTT -g TGGGTCACACTGTCTTTTAACCT -g ACACATCCTTGGACTTGGAAGAT 
-g CCATAGCCATTTTCAGCCCTACT -g GGGTGAATAAAGGACCTCTTGCC -g 
AGTGAGATCGCAGAGTATTTGCC -g TGCAGGAGGTGACCCA -g TGGGATCAGGCAGCTTATTTGTT -g 
CTCACCGTCAAACAGCCCAT -g CACGATGTCATTCAAAGGCGATG -g TGCAGTCAATGCTCCAACTTACA -g 
CTTGCTCTCTCTCCAGAACTCTT -g GGGCCGCTTGAATATGACTGTT -g CCTCACTTGGAAAAGAGCTCCA -g 
CAGTGTTTCTGTCCGTAGACCC -g TGTGCTGTTGACCAGTGTTTGAT -g ATGCACAAGCTAACCTCAGAACA -g 
CTATAGCAAGCCAGGACTCCAC -g TGATCCCCACTAGCTATAAAGGC -g GCATTTGAGGAAAGAGCTGTGTG -g 
AGCTCACCTCTAGTGAACCCAAT -g ACTTTTTCTGTTTCTAATGTAAGCATTTTC -g 
AGCCAGACAGGGTAATCTTCCTA -g AAGTTGCGTGTGTCTGTTTCCTT -g ACTGACGAAGAAGCCGAGGTA -g 
TCTCCAAGTAACTGTGGGCAAAA -g GAGCGATGGAAACAGAGCAGAA -g GAAAGCAGGCAGTTTCCTTTCTG -g 
GCAATTCTGGACTGGAAAATGCC -g GGAAGGGATGCTACGATATGGC -g CCTGACCATGGAGTGCCCTA -g 
TTTAGCAGCATCTGAATGCACAA -g TGTGTTCAGATTTCATGTGCAGT -g AGAACAGAACAAGAACTGTAAACCT 
-g AGCCCTAGAAATGAGTTCCTGAC -g AAGCTATCTTTTACTTTCTGAATAATGTTTG -g 
ACAAAAAAGACTTGGGGATTGCAT

Original comment by d.vanh...@qmul.ac.uk on 9 Mar 2012 at 10:35

GoogleCodeExporter commented 9 years ago
Oh, I hadn’t gotten to that mail yet … 

That’s a lot of adapters, wow. Never imagined anyone would ever specify so 
many adapters. Regarding your trimmed read: One of your adapters matches the 
read quite well:

...CCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTCCAGTTTCTTAGACACCATGTC
adapter:                                              AGCTCCAGTTTCTTAGACACCAT

You are using the -g option, which removes the adapter and everything 
*preceding* it from the read. Since in this case the adapter occurs almost 
at the end of the read, only the three bases GTC remain.
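
To make the effect concrete, here is a minimal Python sketch of unanchored 5' trimming applied to the read and adapter quoted in this thread. It is not cutadapt's implementation: the function name `trim_front_unanchored` is hypothetical, and it uses an exact string search where cutadapt performs an error-tolerant alignment.

```python
def trim_front_unanchored(read, adapter):
    """Remove the first exact occurrence of `adapter` and everything before it.

    Simplified illustration only: exact matching, no mismatches or indels.
    """
    pos = read.find(adapter)
    if pos < 0:
        return read  # adapter not found: leave the read untouched
    return read[pos + len(adapter):]


# Untrimmed read and matching adapter, both quoted from this thread.
read = (
    "CCAGCACAAGGTTCCCACTGTACCCCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGGAGCTC"
    "CAGTTTCTTAGACACCATGTC"
)
adapter = "AGCTCCAGTTTCTTAGACACCAT"

print(trim_front_unanchored(read, adapter))  # prints GTC
```

Because the search is allowed to start anywhere in the read, a match near the 3' end discards almost the whole read.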

I’m not sure whether the -g option is actually that useful as it stands, so 
I welcome feedback: What behavior would you expect?

Original comment by marcel.m...@tu-dortmund.de on 9 Mar 2012 at 10:48

GoogleCodeExporter commented 9 years ago
OK,

I had hoped that -g would only trim at the absolute 5' end of the read (i.e. 
for a 20 bp adapter, check whether it aligns within roughly the first 25 bp 
of the read, allowing for the odd indel error etc.).
I see what it is doing now.
I don't want it to trim a match found in the middle of the read!

We have a lot of PCR amplicons pooled, hence so many adapters - but your script 
copes fine with this number.
Some PCR amplicons are overlapping. That's why in the example below the adapter 
from one amplicon is in the middle of another amplicon.

Is it possible to make a strict 5' adapter matching and trimming option ?

thanks, david

Original comment by d.vanh...@qmul.ac.uk on 9 Mar 2012 at 10:55

GoogleCodeExporter commented 9 years ago
Ok, I understand. I’m quite busy with other non-cutadapt things, so I won’t 
be able to properly add such an option, test and document it. But you can 
easily change the behavior of the -g option yourself. Simply open the cutadapt 
file and look for this section:

# Constants for the find_best_alignment function.
# The function is called with SEQ1 as the adapter, SEQ2 as the read.
BACK = align.START_WITHIN_SEQ2 | align.STOP_WITHIN_SEQ2 | align.STOP_WITHIN_SEQ1
FRONT = align.START_WITHIN_SEQ2 | align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1
ANYWHERE = align.SEMIGLOBAL

Then change the FRONT line as follows:
FRONT = align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1

Without the START_WITHIN_SEQ2 term, the alignment cannot start within the read. 
Or more precisely: Any skipped base will count as an error.
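
The idea that "any skipped base will count as an error" can be sketched with a small dynamic-programming alignment. This is a simplified model and not cutadapt's align module: the function min_errors and its free_read_start parameter are hypothetical, unit costs are assumed for mismatches and indels, and partial adapter matches (the START_WITHIN_SEQ1 case) are not modelled.

```python
def min_errors(adapter, read, free_read_start):
    """Return the fewest edit errors for aligning the whole adapter to the read.

    Trailing read bases are always free (the alignment may stop within the
    read). Leading read bases are free only if free_read_start is True;
    otherwise every skipped leading base costs one error.
    """
    m, n = len(adapter), len(read)
    # prev[j]: cost of aligning the first i adapter bases against read[:j]
    prev = [0 if free_read_start else j for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            mismatch = adapter[i - 1] != read[j - 1]
            cur[j] = min(prev[j - 1] + mismatch,  # match or substitution
                         prev[j] + 1,             # adapter base missing from read
                         cur[j - 1] + 1)          # extra base in the read
        prev = cur
    # The alignment may end anywhere in the read, so take the best column.
    return min(prev)


# Read fragment and adapter from earlier in this thread.
READ = ("CCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGG"
        "AGCTCCAGTTTCTTAGACACCATGTC")
ADAPTER = "AGCTCCAGTTTCTTAGACACCAT"

print(min_errors(ADAPTER, READ, free_read_start=True))
print(min_errors(ADAPTER, READ, free_read_start=False))
```

With free leading bases the internal occurrence aligns with zero errors. Without them the cost is greater than zero, because the read does not begin with the adapter and each skipped base is charged.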

I’ll leave this issue open until I’ve added a proper way to specify this on 
the command line.

Original comment by marcel.m...@tu-dortmund.de on 9 Mar 2012 at 11:33

GoogleCodeExporter commented 9 years ago
Hmm, having replaced that line in the cutadapt python script with:
FRONT = align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1

same result (output below).....

@MISEQ:6:000000000-A0EW6:1:1:17223:1592 1:N:0:CGCTATCAGT
GTC
+
::>

Original comment by d.vanh...@qmul.ac.uk on 9 Mar 2012 at 3:57

GoogleCodeExporter commented 9 years ago
Ok, one more change is needed. Find these lines (approximately line 263):

    if pos >= 0:
      match = AdapterMatch(0, len(self.sequence), pos, pos + len(self.sequence), len(self.sequence), 0, self)

and change the "pos >= 0" to "False":

    if False:
      match = AdapterMatch(0, len(self.sequence), pos, pos + len(self.sequence), len(self.sequence), 0, self)

cutadapt will be a little bit slower, though.

I will probably not be able to reply until end of March.

Original comment by marcel.m...@tu-dortmund.de on 10 Mar 2012 at 9:55

GoogleCodeExporter commented 9 years ago
thanks, am using v1.0 and can't find those lines (there is something similar 
though):

        if pos >= 0:
            result = (0, len(adapter), pos, pos + len(adapter), len(adapter), 0)

I saw a reference to v1.1 but can't find a download.

david

Original comment by d.vanh...@qmul.ac.uk on 11 Mar 2012 at 9:05

GoogleCodeExporter commented 9 years ago
Hello, I’m back and have now looked into this again. I realize now that the 
above modifications aren’t sufficient to get the behavior you wanted. I have 
modified cutadapt to allow 'anchored' adapters with the -g parameter. Are you 
still interested? Cutadapt 1.1 will contain those changes, but isn’t 
released yet. You can get the source code from the Subversion repository, but 
I’d also be happy to send you a .tar.gz package if you’re interested in 
testing the changes.
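
For readers following along, the idea of an 'anchored' 5' adapter can be sketched as follows. This is only an illustration under stated assumptions, not the code that went into cutadapt 1.1: the function trim_front_anchored is hypothetical, it counts mismatches only (no indels), and the 0.1 default for max_error_rate is an assumed value.

```python
def trim_front_anchored(read, adapter, max_error_rate=0.1):
    """Trim the adapter only if it sits at the very 5' end of the read.

    Hypothetical helper for illustration: compares the adapter with the
    read prefix of the same length and counts mismatches (no indels).
    """
    if len(read) < len(adapter):
        return read  # read too short to contain the full adapter
    prefix = read[:len(adapter)]
    mismatches = sum(a != b for a, b in zip(adapter, prefix))
    if mismatches <= max_error_rate * len(adapter):
        return read[len(adapter):]
    return read  # no match at the 5' end: internal matches are ignored


# Read fragment and adapter from earlier in this thread.
READ = ("CCTCACAGCTGCCTGCATGGAGCTCACCTCAGCTTAGTGTGTTCCAGCCGG"
        "AGCTCCAGTTTCTTAGACACCATGTC")
ADAPTER = "AGCTCCAGTTTCTTAGACACCAT"

print(trim_front_anchored(READ, ADAPTER) == READ)
print(trim_front_anchored(ADAPTER + "GTC", ADAPTER))
```

In this sketch the read from the original report is returned unchanged, since the adapter only occurs in its interior, while a read that actually begins with the adapter is trimmed.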

Original comment by marcel.m...@tu-dortmund.de on 17 Apr 2012 at 9:14

GoogleCodeExporter commented 9 years ago
Thanks Marcel,

Have in the interim found a workaround - so will wait to have a look at 
cutadapt 1.1 when the new version is formally released.

thanks, david

Original comment by d.vanh...@qmul.ac.uk on 18 Apr 2012 at 10:25

GoogleCodeExporter commented 9 years ago
Hi Marcel,
I am also having the same scenario..

So I would like to use Cutadapt1.1 with modified 'anchored' adapters with the 
-g parameter.
I just checked out the SVN repository, and it still shows version 1.0:
/home/sjohn/Install/cutadapt-read-only# ./build/scripts-2.7/cutadapt --version
1.0

So can you send me the modified version?

Regards,
Shibu

Original comment by shibujoh...@gmail.com on 3 May 2012 at 3:07

GoogleCodeExporter commented 9 years ago
I’ve replied to the above comment by mail (the answer is that the version 
number in SVN isn’t updated, but the feature is already in).

Original comment by marcel.m...@tu-dortmund.de on 8 May 2012 at 3:26