Closed: GoogleCodeExporter closed this issue 9 years ago
Could you perhaps send me or attach to this issue an example of a FASTA and a
QUAL file as produced by the Roche software?
Original comment by marcel.m...@tu-dortmund.de
on 8 Feb 2011 at 11:19
For a short example (10 reads), see:
http://biopython.open-bio.org/SRC/biopython/Tests/Roche/
or
https://github.com/biopython/biopython/tree/master/Tests/Roche
Original SFF file:
E3MFGYR02_random_10_reads.sff
The trimmed reads (what people normally work with):
E3MFGYR02_random_10_reads.fasta
E3MFGYR02_random_10_reads.qual
The untrimmed reads (with 454 adapter and poor quality bases):
E3MFGYR02_random_10_reads_no_trim.fasta
E3MFGYR02_random_10_reads_no_trim.qual
Original comment by p.j.a.c...@googlemail.com
on 8 Feb 2011 at 12:07
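
A FASTA file and its QUAL companion, like the pairs listed above, carry the same read names in the same order: the FASTA body holds the bases and the QUAL body holds whitespace-separated integer scores. A minimal sketch of reading such a pair, assuming that layout, might look like the following. The function names and the toy records are made up for illustration; this is neither cutadapt's nor Biopython's parser.

```python
import io


def read_fasta_like(handle):
    """Yield (name, body_lines) for each '>' record in a FASTA-style file."""
    name, lines = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, lines
            # The read name is the first word after '>'.
            name, lines = line[1:].split()[0], []
        else:
            lines.append(line)
    if name is not None:
        yield name, lines


def read_fasta_qual(fasta_handle, qual_handle):
    """Yield (name, sequence, qualities), pairing records by position.

    Assumes both files list the same reads in the same order. zip() stops
    at the shorter file, so a missing trailing record is not detected here.
    """
    pairs = zip(read_fasta_like(fasta_handle), read_fasta_like(qual_handle))
    for (fasta_name, seq_lines), (qual_name, qual_lines) in pairs:
        if fasta_name != qual_name:
            raise ValueError(f"name mismatch: {fasta_name!r} vs {qual_name!r}")
        sequence = "".join(seq_lines)
        qualities = [int(q) for line in qual_lines for q in line.split()]
        if len(sequence) != len(qualities):
            raise ValueError(f"length mismatch in read {fasta_name!r}")
        yield fasta_name, sequence, qualities


# Toy input (invented, not taken from the Roche test files).
fasta_text = ">read1 length=4\nACGT\n>read2\nGG\nTT\n"
qual_text = ">read1 length=4\n30 31 32 33\n>read2\n20 21\n22 23\n"

records = list(read_fasta_qual(io.StringIO(fasta_text), io.StringIO(qual_text)))
for name, sequence, qualities in records:
    print(name, sequence, qualities)
```

To try it on real data, open the .fasta and .qual files and pass the two handles in place of the StringIO objects.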
Thanks, I'll look into this. May take a few days.
Original comment by marcel.m...@tu-dortmund.de
on 8 Feb 2011 at 12:44
I have added support for non-colorspace FASTA+QUAL files (.fasta plus .qual)
to cutadapt. It seems to work, although the output differs from Biopython's
trimmed sequences. This seems to be due to a different low-quality trimming
algorithm.
Remember to use the -b parameter to search for adapters that may be at the
beginning of reads. If you don't, the adapter is treated as a 3' adapter and
everything from it onward is removed, so all reads in which an adapter was
found will be empty after trimming.
Can you get cutadapt from Subversion in order to test this? Otherwise I'll just
release a new version.
Original comment by marcel.m...@tu-dortmund.de
on 14 Feb 2011 at 2:58
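
The -b caveat in the comment above can be illustrated with a toy example. The sketch below uses exact string matching and a deliberately simplified rule (an adapter that starts at position 0 is treated as a 5' adapter, anything else as a 3' adapter). Both functions are hypothetical and are not cutadapt's actual algorithm, which also handles mismatches and partial overlaps.

```python
def trim_3prime(read, adapter):
    """Treat the adapter as a 3' adapter: drop it and everything after it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def trim_anywhere(read, adapter):
    """Toy stand-in for an 'anywhere' adapter (simplified assumption).

    If the adapter starts the read, keep what follows it; otherwise keep
    what precedes it.
    """
    pos = read.find(adapter)
    if pos == -1:
        return read
    if pos == 0:
        return read[len(adapter):]
    return read[:pos]


adapter = "GATTACA"  # invented adapter sequence
read_with_front_adapter = adapter + "ACGTACGT"
read_with_end_adapter = "ACGTACGT" + adapter

# A front adapter handled as a 3' adapter leaves nothing behind.
print(repr(trim_3prime(read_with_front_adapter, adapter)))
# The position-aware variant keeps the bases after the adapter.
print(repr(trim_anywhere(read_with_front_adapter, adapter)))
# For an adapter at the end, both variants agree.
print(repr(trim_anywhere(read_with_end_adapter, adapter)))
```

The first call shows why reads come out empty when an adapter at the start of the read is searched for as a 3' adapter.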
The trimmed examples in the Biopython tests simply apply the trimming
information stored in the SFF file itself (just as the Roche off-instrument
application does).
Original comment by p.j.a.c...@googlemail.com
on 14 Feb 2011 at 4:01
Thanks, that also explains where the lowercase nucleotides in the untrimmed
files come from.
Original comment by marcel.m...@tu-dortmund.de
on 15 Feb 2011 at 6:46
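
As the last two comments suggest, the lowercase bases in the untrimmed files mark the regions that the SFF clipping information would remove. Assuming that convention (lowercase at either end, uppercase for the retained middle), the trimmed read can be recovered from an untrimmed one roughly as follows. The function name and the example sequence are invented for illustration.

```python
import string


def clip_lowercase_ends(sequence):
    """Return (left, right, trimmed) where trimmed == sequence[left:right].

    Lowercase letters at either end are assumed to mark clipped bases.
    Lowercase letters inside the uppercase region are left untouched.
    """
    lowercase = string.ascii_lowercase
    left = len(sequence) - len(sequence.lstrip(lowercase))
    right = len(sequence.rstrip(lowercase))
    if right < left:
        # The whole read is lowercase, so nothing is retained.
        right = left
    return left, right, sequence[left:right]


untrimmed = "tcagACGTNACGTacgt"  # invented example read
left, right, trimmed = clip_lowercase_ends(untrimmed)
print(left, right, trimmed)
```

The same left and right offsets can be used to slice the matching list of quality scores from the untrimmed .qual file.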
Original issue reported on code.google.com by
p.j.a.c...@googlemail.com
on 8 Feb 2011 at 10:35