lweasel closed this issue 8 years ago
Currently the filtering code checks for a minimum 'overhang' without any restriction on the number of junctions. If any 'N' (skipped region) operations are present in the CIGAR string, it checks that every 'M' (match) segment of the CIGAR string is above the threshold (currently 5bp). Is this what you were thinking of? See check_cigar() and check_min_match() in filter_sample_reads.py.
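For anyone following along, here is a rough sketch of the rule as described above. It is not the actual code from filter_sample_reads.py: the names `parse_cigar`, `passes_overhang_check` and `MIN_MATCH` are made up for illustration, and treating the threshold as inclusive is an assumption.

```python
import re

# Threshold in bp, as quoted in the comment above.
# Whether the real check is inclusive (>=) or strict (>) is an assumption here.
MIN_MATCH = 5

# One CIGAR operation: a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples, e.g. '20M100N30M'."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def passes_overhang_check(cigar, min_match=MIN_MATCH):
    """Hypothetical stand-in for the check described in the thread.

    If the alignment contains any 'N' (skipped region), every 'M' segment
    must be at least min_match long. Alignments without 'N' always pass.
    """
    ops = parse_cigar(cigar)
    if not any(op == "N" for _, op in ops):
        return True
    return all(length >= min_match for length, op in ops if op == "M")


# A read spanning three exons: the short middle segment fails the check.
print(passes_overhang_check("20M100N4M200N26M"))
print(passes_overhang_check("20M100N10M200N20M"))
```

Because the check loops over every 'M' segment rather than just the first and last, a read touching three or more exons is covered by the same rule as a read crossing a single junction.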
Ah right, I'm still looking at the code where I branched off for the first paper. I see that you've already done this!
Our code currently checks that, if an alignment spans an exon junction, there is enough "overhang" in each exon. But I think we also need to consider cases where a read (particularly a longer one) has parts mapping to 3 or more exons.