apcamargo / pycoverm

Simple Python interface to CoverM's fast coverage estimation functions
GNU General Public License v3.0

Implement CoverM modularity into pyCoverM #2

Open apcamargo opened 3 years ago

apcamargo commented 3 years ago

One of the most useful features of CoverM is its modularity: you choose the metrics/properties you want with the --methods argument, and CoverM computes all of them in a single run. For example, --methods tpm covered_bases length computes the contig (or genome) TPM, the number of bases covered by reads, and the reference length in a single pass over the input BAMs.

The way pyCoverM is put together right now doesn't allow that kind of flexibility. Supporting more metrics would mean giving each one its own function, so a user who wants several metrics would have to parse each BAM file multiple times.

Ideally, pyCoverM would have a flexible function that takes all the metrics the user wants as input and computes them in a single execution. Another option would be to create a class that stores the basic information (number of reads per contig, covered bases, reference length, variance, etc.) and derives the other metrics from it.
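
To make the two options concrete, here is a rough pure-Python sketch of how they could fit together. Everything in it is hypothetical: ContigStats, stats_from_depths, METRICS and compute_metrics are illustrative names, not part of pyCoverM or CoverM, and the per-base depth list stands in for whatever a single pass over a BAM file would collect. The metric formulas are simplified (population variance, no read filtering or end trimming) and are not guaranteed to match CoverM's definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ContigStats:
    """Basic per-contig quantities collected in one pass (hypothetical)."""
    length: int          # reference length
    read_count: int      # number of reads mapped to the contig
    covered_bases: int   # positions with depth > 0
    depth_sum: int       # sum of per-base depths
    depth_sq_sum: int    # sum of squared per-base depths


def stats_from_depths(depths: Sequence[int], read_count: int) -> ContigStats:
    """Build the basic stats from per-base depths (stand-in for BAM parsing)."""
    return ContigStats(
        length=len(depths),
        read_count=read_count,
        covered_bases=sum(1 for d in depths if d > 0),
        depth_sum=sum(depths),
        depth_sq_sum=sum(d * d for d in depths),
    )


# Each metric is derived from the stored stats, so no re-parsing is needed.
METRICS: Dict[str, Callable[[ContigStats], float]] = {
    "length": lambda s: float(s.length),
    "count": lambda s: float(s.read_count),
    "covered_bases": lambda s: float(s.covered_bases),
    "covered_fraction": lambda s: s.covered_bases / s.length,
    "mean": lambda s: s.depth_sum / s.length,
    # Simplified population variance: E[d^2] - E[d]^2
    "variance": lambda s: s.depth_sq_sum / s.length - (s.depth_sum / s.length) ** 2,
}


def compute_metrics(
    stats: Dict[str, ContigStats], methods: List[str]
) -> Dict[str, Dict[str, float]]:
    """Return every requested metric for every contig from the stored stats."""
    unknown = [m for m in methods if m not in METRICS]
    if unknown:
        raise ValueError(f"Unknown metrics: {unknown}")
    return {
        contig: {m: METRICS[m](s) for m in methods}
        for contig, s in stats.items()
    }


# Usage: one "parse", several metrics.
stats = {"contig_1": stats_from_depths([0, 2, 2, 4], read_count=3)}
print(compute_metrics(stats, ["length", "covered_bases", "mean", "variance"]))
```

Metrics that need information across contigs, such as TPM, would not fit the per-contig lambdas above and would need a second step that normalizes over the whole dictionary of stats.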