MaestSi / MetONTIIME

A meta-barcoding pipeline for analysing ONT data in the QIIME2 framework
GNU General Public License v3.0

Question for running a large dataset #68

Closed: ShaelynXU closed this issue 1 year ago

ShaelynXU commented 1 year ago

Hello Simone,

Thank you for your help last time.

I am running 24 barcoded samples at one time, with 229,182 reads in my rep-seqs.qza file. I encountered issues at the "assignTaxonomy" step and assumed they were caused by insufficient RAM. After modifying the conf file (increasing the memory to 60 GB), the problem still occurs.

Do you think 60 GB should be sufficient to process this input file? Or do you have another solution for this error?

Thank you! Shaelyn

MaestSi commented 1 year ago

Hi, I think 60 GB may not be enough for such a (quite) big dataset. You have two options. You may process the samples in chunks (say, barcodes 1 to 6, then barcodes 7 to 12, and so on) by moving the fastq.gz files to different folders, and then merge the resulting feature tables with QIIME2 commands (see the sketch below). Alternatively, you may downsample the number of reads for each sample; I would recommend this solution if the samples are very unbalanced in terms of number of reads (see the dataQC/demux_summary.qzv file to evaluate the number of reads per sample). SM
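For the chunked route, here is a minimal sketch of the merging step. The `qiime feature-table merge`, `qiime feature-table merge-seqs`, and `qiime tools view` commands are standard QIIME2; the per-batch file names (`table_batch1.qza`, `rep-seqs_batch1.qza`, etc.) are hypothetical and depend on how you name each chunk's output:

```bash
# Inspect per-sample read counts first (or drag the file into https://view.qiime2.org)
qiime tools view dataQC/demux_summary.qzv

# Merge the per-batch feature tables into one (repeat --i-tables once per batch)
qiime feature-table merge \
  --i-tables table_batch1.qza \
  --i-tables table_batch2.qza \
  --o-merged-table table_merged.qza

# Merge the corresponding representative sequences
qiime feature-table merge-seqs \
  --i-data rep-seqs_batch1.qza \
  --i-data rep-seqs_batch2.qza \
  --o-merged-data rep-seqs_merged.qza
```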

ShaelynXU commented 1 year ago

Hello Simone,

Thank you for your suggestions. Yes, my samples are not balanced, and I would like to try downsampling. Could you please tell me which part of the conf or nf file I should modify?

Thank you, Shaelyn

MaestSi commented 1 year ago

Hi, you should set a lower value for the maxNumReads parameter (line 17 in the conf file) and leave downsampleFastq=true (line 50 in the conf file). You may start by trying up to 5k reads per sample. SM
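For reference, the two relevant settings would look roughly like this in the conf file (a sketch only; the exact surrounding syntax and default values depend on your MetONTIIME version):

```
// keep downsampling enabled
downsampleFastq=true

// cap each sample at 5,000 reads, as suggested above
maxNumReads=5000
```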

ShaelynXU commented 1 year ago

Thank you Simone, I will follow your recommendation and keep you updated!

MaestSi commented 1 year ago

Hi, I am going to close the issue. If you have any further questions, or updates on the results, feel free to reopen it or comment here! SM