alexstaj / cutadapt

Automatically exported from code.google.com/p/cutadapt

Trimming an adapter only from the beginning does not work correctly #69

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I have this test file:

@head1
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGTATGCCGTCTTCTGCTTGTTTT
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@head2
CCCCGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGTATGCCGTCTTCTGCTTG
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

I run cutadapt:

cutadapt -g ^GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTA 34a_R1.fastq >out.fastq

I expect only the first sequence to be trimmed, since I anchored the adapter 
with '^'. Instead, both are trimmed down to:

@head1
TCTCGTATGCCGTCTTCTGCTTGTATGCCGTCTTCTGCTTGTTTT
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@head2
TCTCGTATGCCGTCTTCTGCTTGTATGCCGTCTTCTGCTTG
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Any help will be appreciated.

Original issue reported on code.google.com by michalov...@gmail.com on 18 Oct 2013 at 9:00

GoogleCodeExporter commented 9 years ago
Hi, sorry for the late reply. Your adapter has a length of 40 and you are using 
the default error rate of 0.1, so up to four errors are allowed (0.1 x 40 = 4). 
Even when you use anchoring, those errors are still allowed, so cutadapt 
interprets the first four Cs in the read as insertions. I suggest you reduce 
the error rate for now. Perhaps it also makes sense for me to add an option for 
disallowing insertions and deletions altogether. What do you think?

Original comment by marcel.m...@tu-dortmund.de on 8 Nov 2013 at 3:49
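
The arithmetic in the reply above can be checked with a small Python sketch. This is a simplified model, not cutadapt's actual alignment code: the names `max_errors`, `anchored_prefix_cost` and `is_trimmed` are made up for illustration, and the sketch assumes the error budget is floor(error rate x adapter length) and that every mismatch, insertion or deletion costs one error.

```python
import math

ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTA"

# The two reads from the report above (line-wrapped here only for readability).
READS = {
    "head1": "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTA"
             "TCTCGTATGCCGTCTTCTGCTTGTATGCCGTCTTCTGCTTGTTTT",
    "head2": "CCCCGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTA"
             "TCTCGTATGCCGTCTTCTGCTTGTATGCCGTCTTCTGCTTG",
}


def max_errors(adapter: str, error_rate: float) -> int:
    """Assumed error budget: floor(error_rate * adapter length)."""
    return math.floor(error_rate * len(adapter))


def anchored_prefix_cost(adapter: str, read: str) -> int:
    """Smallest edit distance between the whole adapter and any read prefix.

    The alignment is pinned to the start of the read, so extra bases in
    front of the adapter count as insertions instead of being skipped.
    """
    prev = list(range(len(read) + 1))  # empty adapter vs read[:j]: j insertions
    for i, a in enumerate(adapter, start=1):
        cur = [i]  # adapter[:i] vs empty read prefix: i deletions
        for j, r in enumerate(read, start=1):
            cur.append(min(prev[j - 1] + (a != r),  # match or mismatch
                           prev[j] + 1,             # deletion
                           cur[j - 1] + 1))         # insertion
        prev = cur
    return min(prev)


def is_trimmed(adapter: str, read: str, error_rate: float) -> bool:
    """True if the anchored adapter fits within the error budget."""
    return anchored_prefix_cost(adapter, read) <= max_errors(adapter, error_rate)


for rate in (0.1, 0.05):
    for name, seq in READS.items():
        print(rate, name, anchored_prefix_cost(ADAPTER, seq),
              is_trimmed(ADAPTER, seq, rate))
```

Under these assumptions head1 costs 0 errors and head2 costs 4 (the four leading Cs as insertions), so both fit the budget of 4 at error rate 0.1, while at 0.05 the budget drops to 2 and only head1 is trimmed.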

GoogleCodeExporter commented 9 years ago
The current development version of cutadapt has a --no-indels option. It 
currently works only for anchored 5' adapters. This should fix this issue.

Original comment by marcel.m...@tu-dortmund.de on 6 Feb 2014 at 10:05
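
For readers wondering why disallowing indels helps here, a rough Python sketch follows. It assumes that without indels the anchored adapter is simply laid over the first bases of the read and only mismatches are counted; this is an illustration of the idea, not the code behind --no-indels, and `mismatches_at_start` is a hypothetical helper.

```python
ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTA"

# Beginnings of the two reads from the original report.
HEAD1_START = ADAPTER + "TCTCG"
HEAD2_START = "CCCC" + ADAPTER + "TCTCG"

MAX_ERRORS = 4  # error rate 0.1 times adapter length 40


def mismatches_at_start(adapter: str, read: str) -> int:
    """Count differing positions with the adapter placed at read position 0.

    No gaps are allowed, so a shifted adapter cannot be rescued by
    treating leading bases as insertions.
    """
    if len(read) < len(adapter):
        raise ValueError("read is shorter than the adapter")
    return sum(a != r for a, r in zip(adapter, read))


for name, start in (("head1", HEAD1_START), ("head2", HEAD2_START)):
    errors = mismatches_at_start(ADAPTER, start)
    print(name, "trimmed" if errors <= MAX_ERRORS else "left untouched")
```

In this model head1 has no mismatches and is trimmed, while the four leading Cs in head2 shift the adapter out of register and produce more than four mismatches, so head2 is left untouched even at the default error rate.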