Closed aogier closed 7 years ago
Hi, I would strongly suggest that you don't use the --reduce option, as there's something not quite right with how it works. It was experimental when I put it in and I'm not really supporting it. In fact, I would encourage you to try the UnifiedConsensusCaller.py that I recently posted to the GitHub. It is a single program that takes an unaligned bam file, which I typically create by Picard FastqToSam, and outputs final duplex consensuses (and single-strand consensuses, if desired) in paired gzipped fastq files, as well as the relevant tagstats plots. It saves a huge amount of time and doesn't incur any biases that could result from performing an alignment first.
Hi, many thanks for your kind feedback which I've forwarded to our scientist !
Have a nice day, regards
Hi, I've found a long running tag_to_header user process on bioinfo cluster I manage. After a little investigation it turned out that beyond a certain sequences amount, when using --tagstats and --reduce, script suddenly become heavily cpu-bound. I think I have found the culprit and I'll now like to share that fix with you.
Some rough measurements I've done:
we are working on a ~143M sequences input dataset so figure out :)
Python wiki page explaining in behaviour: https://wiki.python.org/moin/TimeComplexity
HTH, regards