ExaScience / elprep

elPrep: a high-performance tool for analyzing sequence alignment/map files in sequencing pipelines.

added MAPQ filter #13

Closed by matthdsm 6 years ago

matthdsm commented 6 years ago

Hi Pascal, Charlotte

As proposed, I've tried adding a very basic MAPQ filter myself. Please let me know if I missed a step somewhere.

Thanks M

pcostanza commented 6 years ago

We moved the mapping quality filter closer to the front of the list of filters, so that simpler filters run before more complex ones. This is for efficiency: a cheap filter that rejects an alignment early spares the more expensive filters from looking at it.

The filter itself didn't have the correct return type. Admittedly, this part of elPrep is slightly trickier to understand, but if you check the source code now, it should be clearer what is needed: a filter pipeline always first calls a function that accepts a SAM header and returns the actual SAM alignment filter, even if that function ignores the header and leaves it untouched.
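
For readers following along, the two-stage shape described above can be sketched roughly as follows. This is a minimal illustration and not the elPrep source: the types `Header`, `Alignment`, `AlignmentFilter` and `Filter`, and the function name `keepMinMapQ`, are assumptions made up for this example. It also assumes that alignments with a mapping quality at or above the threshold are kept.

```go
package main

import "fmt"

// Header and Alignment are simplified stand-ins for SAM types (assumed, not elPrep's).
type Header struct{}

type Alignment struct {
	QNAME string
	MAPQ  byte
}

// AlignmentFilter reports whether a single alignment should be kept.
type AlignmentFilter func(aln *Alignment) bool

// Filter is the first stage: it receives the SAM header and returns
// the per-alignment filter that is applied afterwards.
type Filter func(hdr *Header) AlignmentFilter

// keepMinMapQ (hypothetical name) builds a two-stage filter that keeps
// alignments whose MAPQ is at least min. The header is accepted to satisfy
// the Filter type, but it is neither read nor modified.
func keepMinMapQ(min byte) Filter {
	return func(_ *Header) AlignmentFilter {
		return func(aln *Alignment) bool {
			return aln.MAPQ >= min
		}
	}
}

func main() {
	// Stage 1: hand over the header. Stage 2: test each alignment.
	keep := keepMinMapQ(30)(&Header{})
	for _, aln := range []*Alignment{{"r1", 12}, {"r2", 30}, {"r3", 60}} {
		fmt.Println(aln.QNAME, keep(aln))
	}
}
```

The outer function is where a filter that does need header information would inspect or update it once, before any alignment is processed.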

We also added two optimisations: If you call FilterMappingQuality with 0, then we don't add a filter at all, and if you call FilterMappingQuality with a value greater than 255, then we return a filter that removes all alignments. (You already checked for the first case in the command line program, but we prefer to also check in the filters themselves, because they may be used outside of the command line program as well.)
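
A rough sketch of how those two shortcuts might look, again with made-up types and a hypothetical function name `minMapQFilter` rather than the actual elPrep code. Returning `nil` to mean "add no filter" and treating negative thresholds like 0 are assumptions of this example. The 255 limit reflects that MAPQ is stored as an unsigned 8-bit value in SAM/BAM.

```go
package main

import (
	"fmt"
	"math"
)

// Simplified stand-in types (assumed for illustration, not elPrep's).
type Header struct{}
type Alignment struct{ MAPQ byte }
type AlignmentFilter func(aln *Alignment) bool
type Filter func(hdr *Header) AlignmentFilter

// minMapQFilter (hypothetical name) checks the edge cases inside the filter
// constructor itself, so callers other than a command line tool benefit too.
func minMapQFilter(threshold int) Filter {
	switch {
	case threshold <= 0:
		// Every alignment passes, so no filter is needed (assumption: nil means "skip").
		return nil
	case threshold > math.MaxUint8:
		// MAPQ cannot exceed 255, so nothing can pass: reject without comparing.
		return func(_ *Header) AlignmentFilter {
			return func(_ *Alignment) bool { return false }
		}
	default:
		min := byte(threshold)
		return func(_ *Header) AlignmentFilter {
			return func(aln *Alignment) bool { return aln.MAPQ >= min }
		}
	}
}

func main() {
	aln := &Alignment{MAPQ: 40}
	for _, threshold := range []int{0, 30, 300} {
		f := minMapQFilter(threshold)
		if f == nil {
			fmt.Printf("threshold %d: no filter added\n", threshold)
			continue
		}
		fmt.Printf("threshold %d: keep MAPQ 40? %v\n", threshold, f(&Header{})(aln))
	}
}
```

A caller assembling a list of filters would simply skip the `nil` result instead of appending it.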

Thanks a lot for the commit, this is certainly a very useful filter. :-)

Pascal