jason-c-kwan closed this 3 years ago
I've also implemented a summary of MAG statistics in the autometa-app here, for future reference.
Yes, but what I mean is that metrics 3 and 4 should be in the Autometa code so they can be used as cutoffs during clustering.
Right, I've just placed a link to an implementation of the calculations as a reference for anybody who returns to implement this in the clustering step.
Some metagenome and metabin stat calculations are implemented in their respective files:
A few months ago I was experimenting in the Bitbucket repository, on the `horizontal_transfer` branch. Specifically, I was trying to figure out how to use the SPAdes assembly graph to identify contigs that had been horizontally transferred and were therefore being mis-binned because of their nucleotide composition. One of the problems was ensuring that the initial bins were very accurate (or at least unlikely to be contaminated), because otherwise the task is very difficult. Through experimentation, I found that the following cutoffs worked very well:

1. Completeness cutoff of 20%: the same as what we have now.
2. Purity cutoff of 95%: 5 percentage points higher than what we have now.
3. GC% standard deviation limit of 5%: this is new.
4. Coverage standard deviation limit of 25%: this is new.
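As a minimal sketch (not Autometa's actual implementation), the four cutoffs could be applied to a candidate bin as below. It assumes completeness and purity are already computed from marker genes and passed in as percentages. It interprets metric 3 as the standard deviation of per-contig GC% in percentage points, and metric 4 as the coverage standard deviation expressed as a percentage of the bin's mean coverage (a coefficient of variation); both interpretations are assumptions. The names `Contig`, `BinCutoffs`, `gc_stddev`, `coverage_cv` and `passes_cutoffs` are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Sequence


@dataclass(frozen=True)
class Contig:
    """Hypothetical per-contig record; fields are assumptions for this sketch."""
    name: str
    gc_percent: float  # GC content, 0-100
    coverage: float    # mean read depth


@dataclass(frozen=True)
class BinCutoffs:
    """Defaults mirror the cutoffs proposed in this issue; all are tunable."""
    min_completeness: float = 20.0  # metric 1, percent
    min_purity: float = 95.0        # metric 2, percent
    max_gc_stddev: float = 5.0      # metric 3, GC percentage points (assumption)
    max_cov_cv: float = 25.0        # metric 4, std dev as % of mean coverage (assumption)


def gc_stddev(contigs: Sequence[Contig]) -> float:
    """Population standard deviation of per-contig GC% (unweighted)."""
    return pstdev([c.gc_percent for c in contigs])


def coverage_cv(contigs: Sequence[Contig]) -> float:
    """Coverage standard deviation as a percentage of mean coverage."""
    covs = [c.coverage for c in contigs]
    mean_cov = fmean(covs)
    if mean_cov == 0:
        return float("inf")  # treat zero-coverage bins as failing the cutoff
    return pstdev(covs) / mean_cov * 100


def passes_cutoffs(
    completeness: float,
    purity: float,
    contigs: Sequence[Contig],
    cutoffs: BinCutoffs = BinCutoffs(),
) -> bool:
    """Return True only if a non-empty bin satisfies all four cutoffs."""
    if not contigs:
        return False
    return (
        completeness >= cutoffs.min_completeness
        and purity >= cutoffs.min_purity
        and gc_stddev(contigs) <= cutoffs.max_gc_stddev
        and coverage_cv(contigs) <= cutoffs.max_cov_cv
    )


if __name__ == "__main__":
    bin_contigs = [
        Contig("contig_1", 52.0, 30.0),
        Contig("contig_2", 54.0, 33.0),
        Contig("contig_3", 50.0, 27.0),
    ]
    print(passes_cutoffs(60.0, 97.0, bin_contigs))  # True
    # A high-GC, high-coverage contig pushes the GC std dev above 5.
    contaminated = bin_contigs + [Contig("contig_4", 66.0, 90.0)]
    print(passes_cutoffs(60.0, 97.0, contaminated))  # False
```

Keeping the thresholds in one dataclass makes them easy to tune later, e.g. `BinCutoffs(min_purity=90.0)`. The population standard deviation (`statistics.pstdev`) is used so that a single-contig bin yields 0 rather than an error; a contig-length-weighted variant may be more appropriate for real bins.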
These cutoffs are quite conservative, but I suggest that we implement metrics 3 and 4 so that they can be tuned later. I would also be interested in implementing my HGT detection logic down the line.