I have been having intermittent issues with runs failing at the map_sort step. This seems to be due to a formatting error in the ema align output that results in samtools sort failing. The formatting error is usually related to the cigar, see my issue https://github.com/arshajii/ema/issues/39.
To solve this I have so far mapped manually with ema and the parsed the SAM output with this script found below. This simply skips any entries that are wrongly formatted. The I manually sort the output using samtools and name is initialmapping.bam to integrate it to the workflow again.
"""
Parse filter BAM/SAM
"""
import argparse
import pysam
from tqdm import tqdm
def main(args):
in_mode = "rb" if args.input.endswith(".bam") else "r"
out_mode = "wb" if args.output.endswith(".bam") else "w"
with pysam.AlignmentFile(args.input, mode=in_mode) as infile:
with pysam.AlignmentFile(args.output, mode=out_mode, template=infile) as outfile:
progress = tqdm()
parser = infile.fetch(until_eof=True)
while True:
try:
read = next(parser)
except StopIteration:
break
except OSError:
continue
progress.update()
outfile.write(read)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("input", help="Input SAM/BAM")
parser.add_argument("output", help="Output SAM/BAM")
main(parser.parse_args())
I have been having intermittent issues with runs failing at the map_sort step. This seems to be due to a formatting error in the
ema align
output that results insamtools sort
failing. The formatting error is usually related to the cigar, see my issue https://github.com/arshajii/ema/issues/39.To solve this I have so far mapped manually with ema and the parsed the SAM output with this script found below. This simply skips any entries that are wrongly formatted. The I manually sort the output using samtools and name is
initialmapping.bam
to integrate it to the workflow again.