cerebis / bin3C

Extract metagenome-assembled genomes (MAGs) from metagenomic data using Hi-C.
GNU Affero General Public License v3.0

Method assumes a cigar string will exist #23

Closed cerebis closed 4 years ago

cerebis commented 4 years ago

The logic within _strong_match() should be reordered so that a cigar string is not assumed to always exist. For records with an unavailable cigar string (i.e. *), pysam returns None when the cigarstring is fetched. This type of entry exists for unmapped reads and should be tested for, as not all users will filter these out of the input BAM files.

https://github.com/cerebis/bin3C/blob/4e32125dba07a42da10ffb51f29ac99f3e984794/mzd/contact_map.py#L616
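
The reordering described above might look like the following minimal sketch. It is not the bin3C code: `Record` is a hypothetical stand-in that only borrows the attribute names `cigarstring` and `is_unmapped` from pysam's `AlignedSegment`, and the "single full-length match" criterion and the name `is_full_match` are assumptions chosen for illustration, since the actual test in _strong_match() is not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in using pysam.AlignedSegment attribute names."""
    cigarstring: Optional[str]
    is_unmapped: bool = False


# Assumed criterion: one match operation covering the whole read, e.g. "150M".
FULL_MATCH = re.compile(r"^(\d+)M$")


def is_full_match(rec, min_len=1):
    # Test for a missing cigar string BEFORE touching it; unmapped
    # records carry "*" in the BAM file, which surfaces as None.
    if rec.is_unmapped or rec.cigarstring is None:
        return False
    m = FULL_MATCH.match(rec.cigarstring)
    return m is not None and int(m.group(1)) >= min_len
```

With the guard placed first, an unmapped record is simply reported as a non-match instead of raising a TypeError inside the regular expression call.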

cerebis commented 4 years ago

Investigation showed that the problem is broader than first assumed.

The parsing logic for paired reads is flawed: it assumes that at most one record exists per read, whereas the existence of supplementary or secondary records invalidates this assumption.

If the code referenced above is changed only to avoid dereferencing None, the result is self-defeating: secondary and supplementary alignments can still be taken as R1 or R2 and are then rejected immediately afterwards for being secondary or supplementary.
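
One way to avoid this, sketched under assumptions rather than taken from the eventual fix, is to discard secondary, supplementary and unmapped records before any record is assigned to R1 or R2, so that each read contributes at most one candidate. `Record` is again a hypothetical stand-in using pysam's `AlignedSegment` attribute names (`query_name`, `is_read1`, `is_secondary`, `is_supplementary`, `is_unmapped`), and `primary_pairs` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in using pysam.AlignedSegment attribute names."""
    query_name: str
    is_read1: bool
    is_secondary: bool = False
    is_supplementary: bool = False
    is_unmapped: bool = False


def primary_pairs(records):
    """Pair R1 and R2 using only primary, mapped records."""
    r1, r2 = {}, {}
    for rec in records:
        # Filter first, so a sec/suppl record can never occupy the
        # R1 or R2 slot that belongs to the primary alignment.
        if rec.is_secondary or rec.is_supplementary or rec.is_unmapped:
            continue
        (r1 if rec.is_read1 else r2)[rec.query_name] = rec
    # Keep only read names for which both mates survived the filter.
    return [(r1[name], r2[name]) for name in r1 if name in r2]
```

This sketch holds all surviving records in memory for simplicity; for a name-sorted BAM file the same filter could be applied while streaming consecutive records.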