MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

Could not create all the output files #32

Closed: mjherre1 closed this issue 3 years ago

mjherre1 commented 3 years ago

Hi Simone,

Thank you very much for providing a great pipeline and sharing your scripts! I have been trying the MetONTIIME pipeline a few times now and am encountering errors that I have been trying to fix by reading through the other issues posted. My sequences have already been basecalled with Guppy, and I followed your guidelines so that there is one file per sample (12 samples in total). The pipeline ran for about two weeks using Vsearch, but then reported an error in the nohup.out file. Over the course of the job it did output the following files: sequences.qza, table_tmp.qza, rep-seqs_tmp.qza, demux_summary.qzv, and a directory with empty collapsed feature tables.

From the error message, the problem might have something to do with the temporary directory. However, when I go to the temporary directory that I assigned outside of my working directory, I cannot find the log file, so I am unable to share it with you. Instead, I will share the nohup.out file, along with the manifest and metadata files that the program generated, in case there are other issues with my command or files.

The command that I ran:

nohup ./MetONTIIME.sh /home/centos/USS/mjh_minION/Met_step /home/centos/USS/mjh_minION/Met_step/sample-metadata.tsv /home/centos/USS/mjh_minION/Met_step/silva_132_99_16S_sequence.qza /home/centos/USS/mjh_minION/Met_step/silva_132_99_16S_taxonomy.qza 5 Vsearch 3 0.8 0.85 &

Here, "Met_step" contains the sequences as well as the MetONTIIME.sh script and the sequence and taxonomy files.

I am running on a cloud compute system that should have enough space for this analysis (the run used about 35 GB of the 116 GB of storage available). I also assigned a temporary directory outside of the working directory before launching the pipeline, as shown below.
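The commands I used to set and verify the temporary directory were:

export TMPDIR='/home/centos/USS/cw-temp'
echo $TMPDIR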

Thank you for any guidance that you can offer, I really appreciate your time!!

Attachments: manifest.txt, nohup.out.txt

MaestSi commented 3 years ago

Hi! It is difficult to say what is causing the error based on the log files. I suppose it might have something to do with the available RAM, but I am not sure. To check that the pipeline is properly configured and installed, you could run the analysis on a small subset:

cd /home/centos/USS/mjh_minION/
mkdir subset

# subsample 1000 reads from each per-sample fastq.gz into the subset directory
for f in $(find /home/centos/USS/mjh_minION/Met_step | grep "\.fastq\.gz"); do
  sn=$(basename $f | sed 's/\.fastq\.gz//');
  seqtk sample $f 1000 | gzip > "subset/"$sn".fastq.gz";
done

nohup /home/centos/USS/mjh_minION/Met_step/MetONTIIME.sh /home/centos/USS/mjh_minION/subset /home/centos/USS/mjh_minION/subset/sample-metadata.tsv /home/centos/USS/mjh_minION/Met_step/silva_132_99_16S_sequence.qza /home/centos/USS/mjh_minION/Met_step/silva_132_99_16S_taxonomy.qza 5 Vsearch 3 0.8 0.85 &

Simone

mjherre1 commented 3 years ago

Hello Simone,

Thank you for your quick response and for your suggestion! I ran the analysis on a small subset with the commands you suggested, and the pipeline ran to completion, with populated tables in the collapsed feature table directory. It also produced many more files than my original run, including table.qzv, taxonomy.qzv, and all the feature-table tsv files. I have attached the nohup.out file. Does it look properly configured? In terms of RAM, we have 150 GB available, and I believe the original run used about 35 GB of that, but I am not completely sure.

Thank you very much for your help!!

Attachment: nohup.out.txt

MaestSi commented 3 years ago

Great, everything worked! So I suggest running the analysis with 100k reads per sample (or fewer, depending on the minimum number of reads per sample), by changing 1000 to 100000 in the code above and subset to subset_100k. You can check the number of reads per sample by uploading the demux_summary.qzv file obtained from the full dataset to the QIIME 2 viewer (https://view.qiime2.org).
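For clarity, the adjusted subsampling step would look roughly like this (same paths and layout as the code above; subset_100k is just the new output directory name):

cd /home/centos/USS/mjh_minION/
mkdir subset_100k

# subsample 100000 reads from each per-sample fastq.gz into the subset_100k directory
for f in $(find /home/centos/USS/mjh_minION/Met_step | grep "\.fastq\.gz"); do
  sn=$(basename $f | sed 's/\.fastq\.gz//');
  seqtk sample $f 100000 | gzip > "subset_100k/"$sn".fastq.gz";
done

The MetONTIIME.sh command should then point at /home/centos/USS/mjh_minION/subset_100k as the working directory.

Simone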

MaestSi commented 3 years ago

I am going to close this, as it was probably a memory issue! Please let me know if you succeed in running the analysis with 100k reads per sample!

Best,
Simone

mjherre1 commented 3 years ago

Hi Simone,

Thank you for the suggestion! I am glad that the pipeline is installed properly. I will try running on a subset of 100k reads. The read count for each sample is quite high, with a minimum of 400k, so I agree that it is unfortunately probably a memory issue. I will let you know if the 100k run succeeds. Thank you for your time and help, I appreciate it!

Cheers, Michelle

mjherre1 commented 3 years ago

Hi Simone,

I believe the run with a subset of 100k reads completed successfully and produced the expected output files! I have attached the nohup.out file here. Thank you for suggesting subsampling; it does appear it was a memory issue. Thank you again for your help and time!

Cheers,

Michelle

Attachment: nohup.out.txt

MaestSi commented 3 years ago

Perfect! Ciao! Simone