CAMI-challenge / AMBER

AMBER: Assessment of Metagenome BinnERs
https://cami-challenge.github.io/AMBER/
GNU General Public License v3.0

Filter contigs by length #34

Closed by abremges 6 years ago

abremges commented 6 years ago

First of all, great work with AMBER!

I ran AMBER on the mouse gut toy dataset, which contains many very small contigs in the gold standard assembly (GSA). Some bins exclusively contain small contigs and are not recoverable by common genome binners.

It is already possible to manually exclude a set of genomes; I propose a complementary feature: filter contigs by length (threshold set by the user, default e.g. 2.5 kb) and exclude contigs below the threshold from the analyses. This will also remove some gold standard bins completely (if they don't contain any longer contigs).
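To make the proposal concrete, here is a minimal sketch of the filtering step, not AMBER's actual implementation. The function name, the dictionary-based data layout, the toy values, and the inclusive threshold (contigs with length equal to the threshold are kept) are all assumptions for illustration.

```python
from collections import defaultdict


def filter_contigs_by_length(contig_to_bin, contig_lengths, min_length=0):
    """Group contigs by bin, keeping only contigs of at least min_length bp.

    Bins whose contigs are all shorter than min_length never get an entry,
    so they disappear from the result entirely.
    """
    bins = defaultdict(list)
    for contig, bin_id in contig_to_bin.items():
        # Assumption: the threshold is inclusive.
        if contig_lengths[contig] >= min_length:
            bins[bin_id].append(contig)
    return dict(bins)


# Hypothetical toy gold standard: contig lengths in bp and bin assignments.
contig_lengths = {"c1": 5000, "c2": 1200, "c3": 800, "c4": 3000}
contig_to_bin = {"c1": "genomeA", "c2": "genomeA", "c3": "genomeB", "c4": "genomeC"}

filtered = filter_contigs_by_length(contig_to_bin, contig_lengths, min_length=2500)
print(filtered)  # {'genomeA': ['c1'], 'genomeC': ['c4']}
```

In this toy case genomeB contains only an 800 bp contig, so it is dropped from the gold standard altogether, while genomeA keeps only its long contig. With `min_length=0` nothing is filtered.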

abremges commented 6 years ago

I stumbled upon this while looking at boxplot_completeness, where (I believe) each bin has equal weight, so in my case the depicted median etc. was zero.
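A small sketch of why the median can collapse to zero, assuming (as suspected above) that every gold standard bin counts equally. The bin names, contig lengths, and completeness values are made up for illustration and do not come from the mouse gut dataset.

```python
from statistics import median

# Hypothetical gold standard bins:
# (bin name, length of its longest contig in bp, completeness achieved by a binner)
gold_standard_bins = [
    ("tiny1", 900, 0.0),
    ("tiny2", 1100, 0.0),
    ("tiny3", 700, 0.0),
    ("tiny4", 1500, 0.0),
    ("genomeA", 40000, 0.7),
    ("genomeB", 85000, 0.9),
    ("genomeC", 120000, 1.0),
]

# Unweighted median over all bins: the unrecoverable small bins dominate.
median_all = median(c for _, _, c in gold_standard_bins)

# Median after dropping bins that have no contig of at least 2500 bp.
min_length = 2500
median_filtered = median(
    c for _, longest, c in gold_standard_bins if longest >= min_length
)

print(median_all)       # 0.0
print(median_filtered)  # 0.9
```

Four of the seven bins consist only of short contigs and have zero completeness, so the overall median is zero even though the three recoverable genomes are binned well.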

fernandomeyer commented 6 years ago

Implemented in bce66ecac88490be3f5d6102af226db5383bdc99. The new option is --min_length, -n (default 0, which retains the previous behavior).

abremges commented 6 years ago

Great, thank you! Will test this new feature soon-ish. 👍