MicrobialDarkMatter / nanomotif

Nanomotif - a tool for identifying methylated motifs in metagenomic samples
MIT License

--save_scores output legend #64

Closed brambloemen closed 1 month ago

brambloemen commented 2 months ago

Thank you for developing this tool, I had been looking forward to using it!

For a metagenomic sample I'm analyzing, I have a strong suspicion that a certain plasmid should be binned with a certain MAG, based on an isolate obtained from the same sample. However, they are currently not binned together by include_contigs, so I'm trying to figure out why.

I'm looking at the --save_scores output, but I'm a bit lost as to what the variables mean. Is there more documentation available about the output?

To make the example more concrete, I have attached the --save_scores output for the plasmid, which I think should be added to bin 0:

binary_compareunbinned_contig_847.csv

SebastianDall commented 1 month ago

Hi,

Thank you for showing interest in nanomotif. This feature is not well documented, which I will fix. In the meantime:

If you filter the bin column for bin 0 you get all the motifs that are compared. The relevant columns to look at are:

mean_methylation_bin, motif_mod, methylation_binary, mean, methylation_mean_threshold, and methylation binary compare.

For bin 0, the mean methylation for motif GCCGGC_m-2 is 0.57. The methylation_binary is 1 (we deem it methylated). The mean methylation of the contig is 0.4, but the methylation_mean_threshold is 0.42 (that is, the mean methylation of the bin minus 4*std). This means nanomotif considers the methylation of the contig to be outside the expected range of the bin, and therefore the methylation compare is 0. In this case the bin consists of a single contig, so an artificial standard deviation of 0.0375 is used. That value worked well for other samples, but a better implementation will be developed with more data.
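The arithmetic above can be reproduced with a short sketch. This is not nanomotif's actual code: the function names are hypothetical, and whether a contig mean exactly equal to the threshold counts as a match is an assumption. Only the numbers (0.57, 0.4, 0.0375, and the factor 4) come from the explanation above.

```python
def methylation_mean_threshold(bin_mean, bin_std, n_std=4.0):
    """Lower bound of the expected range: bin mean minus n_std standard deviations."""
    return bin_mean - n_std * bin_std


def binary_compare(contig_mean, bin_mean, bin_std, n_std=4.0):
    """Return 1 if the contig mean is within the bin's expected range, else 0.

    Assumption: a contig mean equal to the threshold is treated as a match.
    """
    threshold = methylation_mean_threshold(bin_mean, bin_std, n_std)
    return 1 if contig_mean >= threshold else 0


# Values from the GCCGGC_m-2 example for bin 0.
bin_mean = 0.57
artificial_std = 0.0375  # fallback used when the bin has a single contig
contig_mean = 0.4

threshold = methylation_mean_threshold(bin_mean, artificial_std)
print(f"threshold: {threshold:.2f}")  # approximately 0.42
print("compare:", binary_compare(contig_mean, bin_mean, artificial_std))
```

With these inputs the contig mean falls 0.02 below the threshold, which is why the comparison yields 0 even though the motif is methylated in both the bin and the contig.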

For now, it seems to me the contig should have been associated (falling just outside the threshold of the bin)

Kind regards, Sebastian

brambloemen commented 1 month ago

Thank you for taking some time to look at this, and for your very clear reply! It clarifies a lot.

To give a bit more context: the assembly yielded a single circular E. coli MAG of ~5 Mbp, which is why there was only a single contig in the bin. The plasmid in question is ~80 kbp, so it probably has far fewer occurrences of this motif. The mean coverage of both is about 30x, so the small plasmid size combined with the low-ish coverage may explain the observed deviation of the plasmid mean from the bin mean.
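As a rough back-of-the-envelope check of that intuition, here is a minimal sketch. It assumes the motif occurs at the same density on the plasmid and the chromosome, and that per-site methylation values are independent with equal variance; neither assumption is verified for this sample, and the function name is hypothetical. Under those assumptions the standard error of a mean scales with 1/sqrt(n), so sequence length can stand in for the number of motif sites.

```python
import math


def standard_error_ratio(n_sites_small, n_sites_large):
    """How many times larger the standard error of the mean is for the smaller sample.

    Assumes independent sites with equal variance, so SE is proportional to 1/sqrt(n).
    """
    return math.sqrt(n_sites_large / n_sites_small)


# Sequence lengths used as a proxy for motif counts (equal motif density assumed).
mag_length_bp = 5_000_000
plasmid_length_bp = 80_000

ratio = standard_error_ratio(plasmid_length_bp, mag_length_bp)
print(f"plasmid mean is roughly {ratio:.1f}x noisier than the MAG mean")
```

Under these assumptions the plasmid mean would be about 7.9 times noisier than the MAG mean, so a deviation of 0.02 below the threshold is plausible from sampling noise alone.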