KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

Improvement of binning cutoff/results #46

Closed jason-c-kwan closed 3 years ago

jason-c-kwan commented 4 years ago

A few months ago I was experimenting in the horizontal_transfer branch of the Bitbucket repository, trying to figure out how to use the SPAdes assembly graph to identify contigs that had been horizontally transferred and were therefore being mis-binned because of their nucleotide composition. One of the problems was ensuring that the initial bins were very accurate (or at least unlikely to be contaminated), because otherwise the task is very difficult. Through experimentation I found that the following cutoffs worked very well:

  1. Completeness cutoff of 20% - this is the same as what we have now.

  2. Purity cutoff of 95% - this is 5 percentage points higher than what we have now.

  3. GC% standard deviation limit of 5% - this is new.

  4. Coverage standard deviation limit of 25% - this is new.

The above are quite conservative, but I suggest that we implement metrics 3 and 4 as configurable cutoffs so that they can be tuned later. I would also be interested in implementing my HGT detection logic down the line.
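For anyone picking this up, here is a minimal sketch of how the four cutoffs could be checked for one candidate bin. It is not Autometa's code: `Contig`, `bin_stats` and `passes_cutoffs` are hypothetical names, completeness and purity are assumed to be computed elsewhere (from marker genes) and passed in as percentages, and the statistics are unweighted by contig length. The issue does not say whether the coverage limit is absolute or relative, so the sketch assumes the coverage standard deviation is expressed as a percentage of the bin's mean coverage, while the GC limit is taken in percentage points.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Contig:
    """Hypothetical per-contig record, for illustration only."""
    gc_percent: float  # GC content of the contig, on a 0-100 scale
    coverage: float    # mean read coverage of the contig


def bin_stats(contigs):
    """Return (GC std dev in percentage points, coverage std dev as % of mean)."""
    gc_values = [c.gc_percent for c in contigs]
    coverages = [c.coverage for c in contigs]
    gc_std = pstdev(gc_values)
    cov_mean = mean(coverages)
    # Assumption: the coverage limit is relative to the mean coverage of the bin.
    cov_std_pct = 100.0 * pstdev(coverages) / cov_mean if cov_mean > 0 else float("inf")
    return gc_std, cov_std_pct


def passes_cutoffs(contigs, completeness, purity,
                   min_completeness=20.0, min_purity=95.0,
                   max_gc_std=5.0, max_cov_std_pct=25.0):
    """Check a non-empty candidate bin against the four cutoffs from this issue."""
    gc_std, cov_std_pct = bin_stats(contigs)
    return (completeness >= min_completeness
            and purity >= min_purity
            and gc_std <= max_gc_std
            and cov_std_pct <= max_cov_std_pct)
```

Keeping the limits as keyword arguments with the proposed values as defaults leaves room for the later tuning mentioned above.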

evanroyrees commented 4 years ago

I've also implemented a summary of MAG statistics in the autometa-app here, for future reference.

jason-c-kwan commented 4 years ago

Yes, but what I mean is that metrics 3 and 4 should be in the Autometa code so they can be used as cutoffs during clustering.
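To illustrate what "used as cutoffs during clustering" could look like, the sketch below partitions one round of candidate clusters into bins that are kept and contigs that go back into the pool for another round. This is an assumption about the control flow, not Autometa's actual API: `split_clusters` and the `(contig_id, gc_percent, coverage)` tuple layout are hypothetical, and the coverage limit is again assumed to be relative to the cluster's mean coverage.

```python
from statistics import mean, pstdev


def split_clusters(clusters, max_gc_std=5.0, max_cov_std_pct=25.0):
    """Partition candidate clusters by the GC and coverage std dev limits.

    clusters: dict mapping a cluster label to a list of
              (contig_id, gc_percent, coverage) tuples (hypothetical layout).
    Returns (kept, recluster): the clusters that pass both limits, and the
    ids of contigs from rejected clusters, to be clustered again.
    """
    kept, recluster = {}, []
    for label, members in clusters.items():
        if not members:
            continue  # nothing to evaluate for an empty cluster
        gc_values = [m[1] for m in members]
        coverages = [m[2] for m in members]
        cov_mean = mean(coverages)
        # Assumption: coverage std dev is expressed as a percentage of the mean.
        cov_std_pct = 100.0 * pstdev(coverages) / cov_mean if cov_mean > 0 else float("inf")
        if pstdev(gc_values) <= max_gc_std and cov_std_pct <= max_cov_std_pct:
            kept[label] = members
        else:
            recluster.extend(m[0] for m in members)
    return kept, recluster
```

A caller would apply this after each clustering pass, alongside the existing completeness and purity checks, and feed `recluster` into the next pass.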

evanroyrees commented 4 years ago

> Yes, but what I mean is that metrics 3 and 4 should be in the Autometa code so they can be used as cutoffs during clustering.

Right, I've just placed a link to an implementation of the calculations for reference for anybody returning to implement this during clustering.

evanroyrees commented 4 years ago

Some metagenome and metabin stat calculations are implemented in their respective files:

metagenome.py

https://github.com/KwanLab/Autometa/blob/ddc2bf951d6c4a8c29e9e2301e15a762327b0267/autometa/common/metagenome.py#L324-L361

metabin.py

https://github.com/KwanLab/Autometa/blob/ddc2bf951d6c4a8c29e9e2301e15a762327b0267/autometa/common/metabin.py#L319-L345