How to choose the best chunks region for personal metagenome data

Hi,

I ran chunks_info on my personal reference metagenome data and got 100813 chunks.

For the "computing current differences" command, which takes parameters to select a region of chunks, how should I choose the best region?

I have tried the following command to get the RDS file:

nanodisco difference -nj 50 -nc 50 -p 50 -f 1 -l 1000 -i analysis/preprocessed_sea9_subset -o analysis/difference_sea9_subset_1-1000 -w SEA_9_WGA -n SEA_9_NAT -r reference/seameta_9_wga_trimmed.fasta

After merging the difference files, the resulting RDS file is only 1.6 MB, and it does not seem usable for the discovery of methylation motifs.

I would like to know how to choose the best region for computing current differences.

Best,

Bruce

Hello Bruce,

It's great that you got to generate matching WGA data for the analysis. I'm not sure what your experimental goal is. Do you want to bin metagenomic contigs or simply find some methylation motifs?

For the former, I would recommend computing current differences for all chunks and following the tutorial for detailed commands. FYI, we used a subset of the differences in this tutorial's first example simply to reduce run time for testing. If you don't want to process every chunk right away, focusing on the ones from higher coverage and longer contigs could be a solution.
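
As a rough sketch of what processing all chunks in batches could look like, the script below only prints one nanodisco difference command per batch, so the commands can be reviewed before anything is run. It reuses the flags and paths from the command in the question and assumes that -f and -l are the first and last chunk of a range, as the 1-1000 directory name suggests. The batch size, the print_batches helper and the output directory analysis/difference_sea9_all are hypothetical, and writing every batch to one shared directory is an assumption that should be checked against the tutorial's merge step.

```shell
#!/bin/sh
# Dry run: print one `nanodisco difference` command per batch of chunks.
# Assumption: -f / -l are the first / last chunk index of the range to process.

# Print the commands needed to cover chunks 1..total in batches of `size`.
# Arguments: total number of chunks, batch size, output directory.
print_batches() {
  total=$1
  size=$2
  out_dir=$3
  first=1
  while [ "$first" -le "$total" ]; do
    last=$((first + size - 1))
    # The final batch is usually shorter than the others.
    if [ "$last" -gt "$total" ]; then last=$total; fi
    echo "nanodisco difference -nj 50 -nc 50 -p 50 -f $first -l $last -i analysis/preprocessed_sea9_subset -o $out_dir -w SEA_9_WGA -n SEA_9_NAT -r reference/seameta_9_wga_trimmed.fasta"
    first=$((last + 1))
  done
}

# 100813 chunks were reported by chunks_info; the output directory is a placeholder.
print_batches 100813 1000 analysis/difference_sea9_all
```

Redirecting the output to a file gives a plain list of commands that can be run one at a time or handed to a job scheduler.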

For the latter, you could also focus your attention on longer and higher coverage contigs first. In this situation, please keep in mind that contigs could come from different organisms, so you need to use the -c option for the discovery of methylation motifs (nanodisco motif). This also applies to the data already generated. If you have binning information from other tools, you can also leverage it and analyze only a chosen set of binned contigs.
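
One possible way to shortlist the longer contigs is to rank them by sequence length straight from the reference FASTA, as in the minimal sketch below. The contig_lengths helper is hypothetical and is not part of nanodisco. It covers only length: coverage would have to come from a separate read-mapping step that is not shown here.

```shell
#!/bin/sh
# List contigs from a FASTA file as "name<TAB>length", longest first.
# The contig name is the first word of the header line, without the leading ">".
contig_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len   # flush the previous contig
      name = substr($1, 2)
      len = 0
      next
    }
    { len += length($0) }                   # sequences may span several lines
    END { if (name != "") print name "\t" len }
  ' "$1" | sort -k2,2nr
}

# Example: show the 20 longest contigs of the reference used in the question.
ref=reference/seameta_9_wga_trimmed.fasta
if [ -f "$ref" ]; then
  contig_lengths "$ref" | head -n 20
fi
```

The names in the first column can then be used to decide which contigs, and therefore which chunks, to process first.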

On another note, we recommend that, if not already done, you perform a WGA-like polishing step on the de novo assembly in order to generate a metagenome compatible with motif detection (an assembly built only from native nanopore reads could leave systematic errors). We briefly described our approach here.

Regards,

Alan

fanglab / nanodisco
