Open MikeDacre opened 8 years ago
It turns out that you can use the mate method of Samfile to get the mate of any read:
import pysam
sfile = pysam.Samfile('LD0001_star_nodups.bam')
read = next(sfile)
mate = sfile.mate(read)  # pass the read whose mate you want
If the read is unpaired, it throws a ValueError.
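A minimal sketch of guarding that call, so unpaired reads do not abort a loop. The helper name mate_or_none is hypothetical, and the sketch assumes sfile is an open, indexed pysam file object and that read came from it. pysam documents the same ValueError for reads whose mate is unmapped, so both cases come back as None here.

```python
def mate_or_none(sfile, read):
    """Return the mate of `read`, or None if pysam reports there is none.

    Assumes `sfile` is an open, indexed pysam.Samfile / AlignmentFile and
    `read` was read from that file. `mate_or_none` is a hypothetical name.
    """
    try:
        return sfile.mate(read)
    except ValueError:
        # Raised when the read is unpaired or its mate is unmapped.
        return None


# Usage (not run here; needs pysam and an indexed BAM file):
#   import pysam
#   sfile = pysam.Samfile('LD0001_star_nodups.bam')
#   read = next(sfile)
#   mate = mate_or_none(sfile, read)
```

Note that the pysam documentation warns that mate() moves the file position, so calling it in the middle of iterating over the same file handle can disturb the iterator.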
A few things:
for read in sfile.fetch(until_eof=True):
This would allow the script to operate as it currently does, but on BAM files, and still allow access to all of the 'mapped segment' functions.
Am I wrong that the only step that really cares about which way the input file is sorted is the splitting step? If so, it could be enough of a speed savings to use the indexed file to avoid the need for multiplexing at all.
Alternatively, it seems like using the indexed BAM file doesn't actually lead to any speedups, and in fact slows it down by something like 60-fold (see commit 659c4736fe). Maybe I'm doing something really bad with the implementation, but based on the profiling, it's either gonna be really easy to fix or not fixable at all.
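One possible explanation for the slowdown, offered as an assumption rather than a diagnosis of that commit: the pysam documentation describes mate() as too slow for high-throughput use and suggests caching reads instead. A minimal sketch of that caching approach follows. The name iter_pairs is hypothetical, and the attribute names (query_name, is_paired, is_secondary, is_supplementary, is_read1) are those of current pysam AlignedSegment objects; older Samfile-era code may use qname instead.

```python
def iter_pairs(reads):
    """Yield (read1, read2) tuples from any iterable of pysam-style reads.

    Single pass with no index and no seeking: each paired primary
    alignment is held in a dict until a read with the same query_name
    shows up. `iter_pairs` is a hypothetical helper name.
    """
    pending = {}
    for read in reads:
        # Skip reads that cannot form a primary pair.
        if not read.is_paired or read.is_secondary or read.is_supplementary:
            continue
        other = pending.pop(read.query_name, None)
        if other is None:
            # First end seen; wait for its mate.
            pending[read.query_name] = read
        elif read.is_read1:
            yield read, other
        else:
            yield other, read


# Usage (not run here; needs pysam and a BAM file):
#   import pysam
#   sfile = pysam.Samfile('LD0001_star_nodups.bam')
#   for read1, read2 in iter_pairs(sfile.fetch(until_eof=True)):
#       ...
```

The trade-off is memory: on a coordinate-sorted file, a read stays in the dict until its mate is reached, and reads whose mates were filtered out of the file are never released.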
Per @carloartieri:
Possible issue: