biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

Consider case when alignments span >2 exons #29

Closed by lweasel 8 years ago

lweasel commented 8 years ago

Our code currently checks that, if an alignment spans an exon junction, there is enough "overhang" in each exon. But I think we also need to consider cases where a read (particularly a longer one) has parts mapping to 3 or more exons.

s-heron commented 8 years ago

Currently the filtering code checks for a minimum 'overhang' without restricting the number of exons involved. If any 'N' (skipped region) is present in the CIGAR string, it checks that every 'M' (match) segment of the CIGAR string meets the threshold (currently 5bp), so reads spanning 3 or more exons are covered as well. Is this what you were thinking of? See check_cigar() and check_min_match() in filter_sample_reads.py.
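
For illustration only, here is a minimal sketch of the kind of check described above. It is not the code from filter_sample_reads.py: the function name `match_segments_pass`, the regular expression, and the choice to treat the 5bp threshold as an inclusive minimum are all assumptions made for this example.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def match_segments_pass(cigar, min_overhang=5):
    """Hypothetical helper, not part of Sargasso.

    If the CIGAR string contains any 'N' (skipped region), require every
    'M' (match) segment to be at least min_overhang bases long. Alignments
    with no 'N' are accepted without further checks.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

    # No skipped region means no exon junction, so nothing to check.
    if not any(op == "N" for _, op in ops):
        return True

    # Every match segment must meet the threshold, however many
    # junctions the alignment spans.
    return all(length >= min_overhang for length, op in ops if op == "M")
```

Because every 'M' segment is tested, a read such as `20M100N4M200N26M` (three exons, with only 4 bases in the middle one) would be rejected, while `20M100N10M200N20M` would pass. Note that this sketch also treats match segments separated by an insertion or deletion within a single exon as separate segments, which is a simplification.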

lweasel commented 8 years ago

Ah right, I'm still looking at the code where I branched off for the first paper. I see that you've already done this!