biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

implement the primary alignment from STAR --outSAMprimaryFlag #64

Closed hxin closed 6 years ago

hxin commented 6 years ago

STAR provides a --outSAMprimaryFlag option which, by default, identifies a primary hit (or pair of hits) within a filter when one read identifier is mapped to multiple locations on the genome. This could be used to simplify the current logic for comparing across filters. https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Currently, we look for the best mis_match in one filter and use it to compare among different filters. If there is a draw, we then find the best CIGAR string and use that to compare among different filters. The logic for finding the best CIGAR string is complicated, and the hit that has the best mis_match in one filter might not be the same hit that has the best CIGAR string.

Using the STAR primary hits would avoid this complication, but it still needs to be tested.
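A rough sketch of what the simplified comparison could look like if each filter contributed only its primary hit. This is not the current Sargasso code: the function name, the species names, and the choice to return None on a draw (rather than falling back to the CIGAR comparison) are all assumptions made for illustration.

```python
from typing import Dict, Optional


def assign_species(primary_mismatches: Dict[str, int]) -> Optional[str]:
    """Pick the species whose primary hit has strictly the fewest mismatches.

    `primary_mismatches` maps a (hypothetical) species name to the mismatch
    count of the primary hit in that species' filter. Returns None when the
    mapping is empty or when two or more species draw (an assumption; the
    real tool might resolve draws differently).
    """
    if not primary_mismatches:
        return None
    best = min(primary_mismatches.values())
    winners = [s for s, m in primary_mismatches.items() if m == best]
    return winners[0] if len(winners) == 1 else None


print(assign_species({"mouse": 0, "rat": 2}))  # clear winner
print(assign_species({"mouse": 1, "rat": 1}))  # draw
```

Because only one hit per filter is considered, the mismatch count and the CIGAR string always come from the same alignment, which is the complication described above.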

http://pysam.readthedocs.io/en/latest/api.html We are currently using pysam to extract information from the hits. It provides an 'is_secondary' attribute that can be used to identify primary alignments.
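A minimal sketch of selecting primary hits through 'is_secondary'. To keep it runnable without a BAM file, the `Hit` class below is a hypothetical stand-in that mimics only the `query_name`, `is_secondary` and `is_unmapped` attributes of a pysam `AlignedSegment`. The helper name `primary_hits` is also an assumption, not existing Sargasso code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """Hypothetical stand-in for pysam.AlignedSegment (only the fields used here)."""
    query_name: str
    is_secondary: bool
    is_unmapped: bool = False


def primary_hits(hits: Iterable) -> List:
    """Keep mapped hits that are not flagged as secondary (SAM flag 0x100)."""
    return [h for h in hits if not h.is_unmapped and not h.is_secondary]


hits = [
    Hit("read1", is_secondary=False),  # primary hit for read1
    Hit("read1", is_secondary=True),   # additional location for read1
    Hit("read2", is_secondary=False),  # read2 maps uniquely
]
selected = primary_hits(hits)
print([h.query_name for h in selected])
```

With real data, the same function should work on the records obtained by iterating over a `pysam.AlignmentFile`, since it relies only on attributes that `AlignedSegment` exposes.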

hxin commented 6 years ago

Also fixes #56, #57 and #55.