alexstaj / cutadapt

Automatically exported from code.google.com/p/cutadapt

Illumina 1.8+ header support #90

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
It seems that Illumina's new header style uses a tab to separate the read ID 
from the pair information, as in:
@M00141:217:000000000-A55DC:1:1118:10458:19998  2:N:0:1
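To make the structure of such a header concrete, here is a minimal sketch that splits a Casava 1.8+ style header into its fields. The field layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index) is assumed from Illumina's documented format, and `parse_casava18_header` is a hypothetical helper, not part of cutadapt.

```python
def parse_casava18_header(header):
    """Split a Casava 1.8+ style FASTQ header line into named fields.

    Hypothetical helper. Assumed layout (not taken from cutadapt's code):
    @instrument:run:flowcell:lane:tile:x:y<whitespace>read:filtered:control:index
    """
    # Drop the leading '@' of a FASTQ header line, if present.
    if header.startswith("@"):
        header = header[1:]
    # split() with no separator splits on any run of whitespace, so a
    # space or a tab between the two halves is handled the same way.
    read_id, comment = header.split(maxsplit=1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read, filtered, control, index = comment.split(":")
    return {
        "read_id": read_id,
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),            # 1 or 2 for paired-end mates
        "filtered": filtered == "Y",  # 'Y' means the read did not pass the filter
        "control": int(control),
        "index": index,               # index sequence or sample number, depending on the pipeline
    }


fields = parse_casava18_header(
    "@M00141:217:000000000-A55DC:1:1118:10458:19998\t2:N:0:1"
)
print(fields["read_id"], fields["read"])
```

The part before the whitespace is what both mates of a pair share; the part after it differs (here the `2` marks the second read).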

seqio.PairedSequenceReader tries to handle this by splitting the name on a 
single space. If it splits on any whitespace instead, the new format works 
correctly.

So, changing:

    name1 = r1.name.split(' ')[0]
    name2 = r2.name.split(' ')[0]

to:

    name1 = r1.name.split()[0]
    name2 = r2.name.split()[0]

seems to fix the issue.
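The difference between the two forms can be checked directly. This is a small sketch, assuming read names are given without the leading '@'; `pair_names_match` is a hypothetical helper for illustration, not cutadapt's actual code.

```python
def pair_names_match(name1, name2):
    """Return True if two mates share the same read ID.

    Hypothetical helper: it compares only the part of each name before
    the first run of whitespace, i.e. the split() form suggested above.
    """
    return name1.split()[0] == name2.split()[0]


# Names are assumed to be given without the leading '@'.
r1_name = "M00141:217:000000000-A55DC:1:1118:10458:19998 1:N:0:1"
r2_name = "M00141:217:000000000-A55DC:1:1118:10458:19998\t2:N:0:1"

# split(' ') only splits on a literal space, so the tab-separated
# name stays whole and the comparison fails.
old_match = r1_name.split(' ')[0] == r2_name.split(' ')[0]
# split() splits on any whitespace, so both IDs come out identical.
new_match = pair_names_match(r1_name, r2_name)
print(old_match, new_match)  # False True
```

Because `str.split()` with no argument treats spaces, tabs and runs of either as one separator, it accepts both header styles without any special-casing.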

Original issue reported on code.google.com by dbern...@soe.ucsc.edu on 24 Nov 2014 at 10:49

GoogleCodeExporter commented 9 years ago
All the Casava 1.8+ files I have access to use a space, not a tab. Is it 
possible that the file you have was not the direct output of Illumina’s 
pipeline?

In any case, I don’t see a problem with allowing both spaces and tabs as it 
will make cutadapt more robust, so I’ve made the change you suggested. I’ll 
release version 1.7 soon, which will have the fix. Thanks!

Original comment by marcel.m...@tu-dortmund.de on 25 Nov 2014 at 9:04