Closed (GoogleCodeExporter)
Python’s a nice language, you should try it :)
Multithreading is on my to-do list, and there are also some algorithmic
improvements that would speed up the single-threaded case. I have to say,
though, that my time to work on this is quite limited right now.
Regarding duplicate reads: the problem is detecting whether a read is a
duplicate of another one. Since the first read could be a duplicate of the last
read, all reads and results would have to be kept in memory. Keeping only the
last 1000 or so reads in memory might be an option and could already help a
bit, but I think there are other potential improvements that would help even
more (parsing the input, for example).
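A minimal sketch of the "keep only the last N reads" idea. It assumes that a read counts as a duplicate when its sequence string is identical to an earlier one; the thread does not define duplicates more precisely. The function name `flag_recent_duplicates` and the `window` parameter are hypothetical and not part of cutadapt. Memory stays bounded by the window size, at the cost of missing duplicates that lie farther apart than `window` reads.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, Tuple


def flag_recent_duplicates(
    sequences: Iterable[str], window: int = 1000
) -> Iterator[Tuple[str, bool]]:
    """Yield (sequence, is_duplicate) pairs (hypothetical helper).

    Assumption: a read is a duplicate only if an identical sequence occurred
    among the previous `window` reads; older reads have been forgotten.
    """
    recent: deque = deque()    # the last `window` sequences, oldest first
    counts: Counter = Counter()  # how often each sequence occurs in `recent`
    for seq in sequences:
        # Counter returns 0 for missing keys without inserting them.
        yield seq, counts[seq] > 0
        recent.append(seq)
        counts[seq] += 1
        if len(recent) > window:
            # Forget the oldest read so memory use stays bounded.
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
```

With `window=1`, the sequence `A, B, A` yields no duplicates because the first `A` has already been dropped when the second one arrives; with a large window, the second `A` is flagged.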
Original comment by marcel.m...@tu-dortmund.de
on 10 May 2012 at 6:54
I have recently done some experiments trying to get multithreading into
cutadapt, but there is a very large overhead from the communication between
the threads. In the end, the multithreaded version was actually slower than
the non-threaded one. Another idea would be to provide a wrapper script that
splits the input FASTQ file and then runs multiple cutadapt instances, but I
guess something like this exists already. For now, I've decided not to work
on this further.
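The splitting step of such a wrapper could look roughly like the sketch below. It assumes the simple four-lines-per-record FASTQ layout (no line-wrapped sequences, no trailing blank lines); `fastq_records` and `split_fastq` are hypothetical names, not part of cutadapt. Chunks are contiguous and returned in order, so the trimmed outputs can later be concatenated in the same order.

```python
import itertools
import os
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (header, sequence, '+', qualities).

    Assumption: every record spans exactly four lines.
    """
    it = iter(lines)  # works for open files as well as plain lists of lines
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Ensure every line ends with a newline (the last line of a file may not).
        yield [line if line.endswith("\n") else line + "\n" for line in record]


def split_fastq(path: str, out_dir: str, records_per_chunk: int) -> List[str]:
    """Split a FASTQ file into chunk files of at most `records_per_chunk` records.

    Returns the chunk paths in input order.
    """
    paths: List[str] = []
    with open(path) as handle:
        records = fastq_records(handle)
        while True:
            chunk = list(itertools.islice(records, records_per_chunk))
            if not chunk:
                break
            chunk_path = os.path.join(out_dir, f"chunk_{len(paths):04d}.fastq")
            with open(chunk_path, "w") as out:
                for record in chunk:
                    out.writelines(record)
            paths.append(chunk_path)
    return paths
```

Each chunk could then be trimmed by its own process, e.g. `cutadapt -a ADAPTER -o chunk_0000.trimmed.fastq chunk_0000.fastq`, launched concurrently (for instance via `subprocess`). Because the processes share nothing, this avoids the inter-thread communication overhead described above. For paired-end data, both files would need to be split with the same `records_per_chunk` so that mates stay together.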
Original comment by marcel.m...@tu-dortmund.de
on 19 Jun 2014 at 2:10
Original issue reported on code.google.com by
jwad...@gmail.com
on 8 May 2012 at 9:36