fulcrumgenomics / prymer

Python Primer Design Library
https://prymer.readthedocs.io/en/latest/
MIT License

Handling unmapped results in `BwaAlnInteractive._to_result` #68

Open ameynert opened 5 days ago

ameynert commented 5 days ago

bwa aln sometimes returns a read that is flagged as unmapped but still has hits (both a primary hit and an XA tag). I suggest that we ignore this case:

        num_hits: int = int(rec.get_tag("HN")) if rec.has_tag("HN") else 0
        if num_hits > self.max_hits:
            return BwaResult(query=query, hit_count=num_hits, hits=[])
        else:
            hits = self.to_hits(rec=rec)
            hit_count = num_hits if len(hits) == 0 else len(hits)
            return BwaResult(query=query, hit_count=hit_count, hits=hits)

ameynert commented 5 days ago

The three valid scenarios for hits being returned from bwa aln are:

  1. Read is unmapped. HN = 0, no XA tag should be returned, and thus hits = [].
  2. Read is mapped, potentially multiply-mapped, with 0 < HN <= max_hits. HN = n, where n is the number of hits, and len(hits) == n.
  3. Read is multiply-mapped with HN > max_hits. HN = n, where n is the number of hits, and hits = []: although the primary hit is returned, there are no XA tags representing the alternate hits, so we return none of them.
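
For illustration only, the three scenarios above could be checked with something like the sketch below. It is not prymer's code: `HitScenario` and `classify_hits` are hypothetical names, and the function takes plain values (the unmapped flag, the HN value, the number of hits parsed, and `max_hits`) rather than an alignment record, so it runs without prymer or pysam. Anything that matches none of the three scenarios, including the unmapped-but-has-hits case from the original report, is labelled `UNEXPECTED`.

```python
from enum import Enum


class HitScenario(Enum):
    """Hypothetical labels for the scenarios listed above (not part of prymer)."""

    UNMAPPED = 1       # scenario 1
    MAPPED = 2         # scenario 2
    TOO_MANY_HITS = 3  # scenario 3
    UNEXPECTED = 4     # anything else, e.g. unmapped but with hits


def classify_hits(
    is_unmapped: bool, hn: int, num_listed_hits: int, max_hits: int
) -> HitScenario:
    """Classify one bwa aln result using only the rules stated in this thread.

    `hn` is the value of the HN tag (0 if absent) and `num_listed_hits` is the
    number of hits actually parsed from the record (primary plus XA entries).
    """
    # Scenario 1: unmapped, HN = 0, no hits listed.
    if is_unmapped and hn == 0 and num_listed_hits == 0:
        return HitScenario.UNMAPPED
    # Scenario 2: mapped, 0 < HN <= max_hits, and every hit is listed.
    if not is_unmapped and 0 < hn <= max_hits and num_listed_hits == hn:
        return HitScenario.MAPPED
    # Scenario 3: mapped, HN > max_hits, no hits returned.
    if not is_unmapped and hn > max_hits and num_listed_hits == 0:
        return HitScenario.TOO_MANY_HITS
    # Everything else falls outside the three valid scenarios.
    return HitScenario.UNEXPECTED
```

With `max_hits=5`, an unmapped read that still reports `HN = 2` and two parsed hits would come back as `HitScenario.UNEXPECTED`, which is the case this issue proposes to ignore.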