alexstaj / cutadapt

Automatically exported from code.google.com/p/cutadapt

make Ns align to any base #23

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
It would be helpful to have an option that would allow Ns to align to any base 
(i.e. consider the alignment between GGGGG and GGNGG to have 0 errors instead 
of 1 error). 
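
The requested behavior can be sketched as a position-by-position comparison in
which an N in the read matches any adapter base. This is only a minimal
illustration: count_mismatches and its read_wildcards parameter are
hypothetical names, not part of cutadapt, and the sketch ignores insertions
and deletions.

```python
def count_mismatches(adapter: str, segment: str, read_wildcards: bool = False) -> int:
    """Count mismatching positions between two equal-length sequences.

    Hypothetical helper for illustration only; not cutadapt's implementation.
    """
    if len(adapter) != len(segment):
        raise ValueError("sequences must have the same length")
    errors = 0
    for a, s in zip(adapter.upper(), segment.upper()):
        if read_wildcards and s == "N":
            continue  # an N in the read is treated as matching any base
        if a != s:
            errors += 1
    return errors


print(count_mismatches("GGGGG", "GGNGG"))                       # 1
print(count_mismatches("GGGGG", "GGNGG", read_wildcards=True))  # 0
```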

This is my test file (the adapter sequence will be GGGGGGG):

$ cat test.fa
>perfect
TTTGGGGGGG
>withN
TTTGGNGGGG
>1mism
TTTGGGGCGG

Currently the N is treated like any other mismatch. With -e 0, the withN
sequence does not align to the adapter; to make it align I have to set -e to
0.15 or higher, just as I would for a sequence with an unambiguous incorrect
base.

$ cutadapt -e 0 -a GGGGGGG test.fa
>perfect
TTT
>withN
TTTGGNGGGG
>1mism
TTTGGGGCGG

$ cutadapt -e 0.15 -a GGGGGGG test.fa
>perfect
TTT
>withN
TTT
>1mism
TTT
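
The two runs above are consistent with an error budget of
floor(adapter length * maximum error rate): for the 7-base adapter, -e 0
allows no errors, while -e 0.15 allows one (7 * 0.15 = 1.05). Below is a short
sketch of that arithmetic. The rounding rule is an assumption inferred from
the output above, not taken from cutadapt's source, and allowed_errors is a
hypothetical name.

```python
import math


def allowed_errors(adapter_length: int, max_error_rate: float) -> int:
    """Number of errors tolerated, ASSUMING the budget is rounded down."""
    return math.floor(adapter_length * max_error_rate)


adapter = "GGGGGGG"
for rate in (0.0, 0.15):
    # Under this assumption, one mismatch (or one N) needs a budget of at least 1.
    print(f"-e {rate}: up to {allowed_errors(len(adapter), rate)} error(s)")
```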

I would like an option that trims the withN read (TTTGGNGGGG) even with -e 0.

Version/system:

$ cutadapt --version
0.9.4
$ uname -a
Linux bleen 2.6.32-31-generic #61-Ubuntu SMP Fri Apr 8 18:25:51 UTC 2011 x86_64 GNU/Linux

P.S. cutadapt is a very useful program, thank you!

Original issue reported on code.google.com by pat...@gmail.com on 10 Jun 2011 at 11:09

GoogleCodeExporter commented 9 years ago
Hi, thanks for your comment. I think your idea makes sense. I'm not sure when 
I'll get around to implementing it, however.

Original comment by marcel.m...@tu-dortmund.de on 14 Jun 2011 at 7:02

GoogleCodeExporter commented 9 years ago
I have attached a patch that allows Ns both in the read and in the adapter. It
can also optionally report the read bases that match 'N's in the adapter, as per
http://nar.oxfordjournals.org/content/early/2011/04/13/nar.gkr217.abstract

Original comment by cas...@gmail.com on 1 Oct 2011 at 7:39

GoogleCodeExporter commented 9 years ago
I have updated the patch to r201. I changed the references from "degenerate" to
"wildcard" and added tests.

I have also provided the diff (does the git-formatted patch apply OK?).

Original comment by cas...@gmail.com on 14 Oct 2011 at 12:13

GoogleCodeExporter commented 9 years ago
Thanks a lot, this works now (committed as revision 202). To those watching 
this bug: Use the new --match-read-wildcards parameter to make the above test 
case work.

Original comment by marcel.m...@tu-dortmund.de on 14 Oct 2011 at 2:35

GoogleCodeExporter commented 9 years ago

Original comment by marcel.m...@tu-dortmund.de on 14 Oct 2011 at 2:36