cerebis / bin3C

Extract metagenome-assembled genomes (MAGs) from metagenomic data using Hi-C.
GNU Affero General Public License v3.0

Extract could operate on either type of BAM ordering #27

Open cerebis opened 4 years ago

cerebis commented 4 years ago

bin3C requires input BAMs to be sorted by query name. This makes pair matching trivial and keeps memory use low. However, when invoking bin3C extract -f bam ..., a coordinate-sorted and indexed BAM would be much faster to process.
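
To illustrate why query-name ordering keeps pairing cheap, here is a minimal sketch. It is not bin3C's actual code. It assumes each record exposes a query_name attribute, as pysam's AlignedSegment does; the Aln tuple and the iter_pairs function are hypothetical names used only for illustration.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for pysam.AlignedSegment; only the fields used here.
Aln = namedtuple("Aln", ["query_name", "reference_name"])


def iter_pairs(alignments):
    """Yield (first, second) for each query name with exactly two records.

    Assumes the input is sorted by query name, so records of the same
    read are adjacent and only one small group is held in memory at a time.
    """
    for _, group in groupby(alignments, key=attrgetter("query_name")):
        records = list(group)
        if len(records) == 2:
            yield records[0], records[1]


stream = [
    Aln("read1", "contigA"), Aln("read1", "contigB"),
    Aln("read2", "contigA"),  # no mate present, so it is skipped
    Aln("read3", "contigC"), Aln("read3", "contigC"),
]
pairs = list(iter_pairs(stream))
```

With a coordinate-sorted file the two mates of a read are generally not adjacent, which is why pairing there needs either a lookup table or a different access pattern.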

Fix

We should inspect the BAM's sort order and, when it is coordinate-sorted and indexed, switch the parsing logic from iterating over the entire input BAM (i.e. fetch(until_eof=True)) to iterating over only the references in the cluster and fetching their alignments.

i.e.

# Assumes bam is a coordinate-sorted, indexed pysam.AlignmentFile
for ref_name in cluster:
    for aln in bam.fetch(ref_name):
        ...  # do something with aln
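
A rough sketch of how the ordering check and the two code paths could fit together is below. It is an assumption about one possible shape, not the implementation in bin3C. The functions sort_order and iter_cluster_alignments, and the FakeBam class, are hypothetical. The bam argument is assumed to behave like pysam.AlignmentFile, whose header.to_dict(), has_index() and fetch() calls are the only ones used; FakeBam exists only so the example runs without a BAM file.

```python
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical stand-in for pysam.AlignedSegment; only the fields used here.
Aln = namedtuple("Aln", ["query_name", "reference_name"])


def sort_order(header):
    """Return the SO tag from a SAM header dict, or 'unknown' if absent."""
    return header.get("HD", {}).get("SO", "unknown")


def iter_cluster_alignments(bam, cluster):
    """Yield alignments whose reference is in `cluster`.

    `bam` is assumed to behave like pysam.AlignmentFile.
    `cluster` is an iterable of reference names.
    """
    refs = list(dict.fromkeys(cluster))  # de-duplicate, keep order
    if sort_order(bam.header.to_dict()) == "coordinate" and bam.has_index():
        # Fast path: random access to only the involved references.
        for ref_name in refs:
            yield from bam.fetch(ref_name)
    else:
        # Fallback: scan the whole file, as for query-name sorted input.
        wanted = set(refs)
        for aln in bam.fetch(until_eof=True):
            if aln.reference_name in wanted:
                yield aln


class FakeBam:
    """Hypothetical in-memory stand-in so the sketch runs without a BAM."""

    def __init__(self, records, so, indexed):
        self._records = records
        self._indexed = indexed
        header = {"HD": {"VN": "1.6", "SO": so}}
        self.header = SimpleNamespace(to_dict=lambda: header)

    def has_index(self):
        return self._indexed

    def fetch(self, contig=None, until_eof=False):
        for aln in self._records:
            if contig is None or aln.reference_name == contig:
                yield aln


records = [Aln("r1", "ctgA"), Aln("r2", "ctgA"),
           Aln("r3", "ctgB"), Aln("r4", "ctgC")]
indexed = FakeBam(records, "coordinate", True)
by_name = FakeBam(records, "queryname", False)

fast = [a.query_name for a in iter_cluster_alignments(indexed, ["ctgA", "ctgC"])]
slow = [a.query_name for a in iter_cluster_alignments(by_name, ["ctgA", "ctgC"])]
```

Both paths return the same alignments for a cluster; the difference is that the indexed path never reads records from references outside the cluster. Falling back to the full scan keeps query-name sorted input working unchanged.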