AfshinLab / BLR

MIT License
5 stars 0 forks source link

Rule map_sort sometimes fails for ema #73

Open pontushojer opened 12 months ago

pontushojer commented 12 months ago

I have been having intermittent issues with runs failing at the map_sort step. This seems to be due to a formatting error in the ema align output that results in samtools sort failing. The formatting error is usually related to the cigar, see my issue https://github.com/arshajii/ema/issues/39.

To solve this I have so far mapped manually with ema and the parsed the SAM output with this script found below. This simply skips any entries that are wrongly formatted. The I manually sort the output using samtools and name is initialmapping.bam to integrate it to the workflow again.

"""
Parse filter BAM/SAM
"""
import argparse
import pysam
from tqdm import tqdm

def main(args):
    in_mode = "rb" if args.input.endswith(".bam") else "r"
    out_mode = "wb" if args.output.endswith(".bam") else "w"
    with pysam.AlignmentFile(args.input, mode=in_mode) as infile:
        with pysam.AlignmentFile(args.output, mode=out_mode, template=infile) as outfile:
            progress = tqdm()
            parser = infile.fetch(until_eof=True)
            while True:
                try:
                    read = next(parser)
                except StopIteration:
                    break
                except OSError:
                    continue
                progress.update()
                outfile.write(read)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("input", help="Input SAM/BAM")
    parser.add_argument("output", help="Output SAM/BAM")
    main(parser.parse_args())

Issue copied from https://github.com/FrickTobias/BLR/issues/235