Closed superbobry closed 6 years ago
convert_fasta
happens only once at index creation. For as_bam
, I think you'd want to benchmark and see where in handle_reads()
is the bottleneck. It's either in that function and/or the methods on class Bam
Right, I meant convert_reads
. I'll do the benchmarks and update the issue.
I'd prefer to make the cython dep optional as people seem to have enough trouble with vanilla python modules. There might be some low-hanging fruit that would improve the performance.
Have you noticed that bwa mem is generating alignments faster than the python in bwa-meth can consume them?
I've recently used bwa-mem
for aligning unpaired reads and noticed a slowdown on the Python a side. I'm thinking that treating paired and unpaired reads separately in convert_reads
might help, because all of the operations below are useless for the unpaired case:
# vvv --- we can drop the loop as well.
for read_i, (name, seq, _, qual) in enumerate(pair):
if name is None: continue
name = name.rstrip("\r\n").split(" ")[0] # except for this line.
if name.endswith(("_R1", "_R2")):
name = name[:-3]
elif name.endswith(("/1", "/2")):
name = name[:-2]
What do you think?
If you benchmark and see a significant improvement from that change, I'd accept. Maybe this:
for pair in izip(q1_iter, q2_iter)
could be
qiter = izip(q1_iter, q2_iter) if fq2 != "NA" else q1_iter
then loop over that and it could remove some of the internal checks.
I'd guess that the slowdown might be in processing the bam rather than converting the FASTQ
These two seem to be the bottlenecks during alignment. Have you considered using Cython or C/C++ to speed things up?