MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

Analysis keeps on crashing. Metadata file path not found. #61

Closed. Jawen-D closed this issue 1 year ago.

Jawen-D commented 1 year ago

Hello MaestSi,

I keep getting this error when I run 29 samples from fastq files. I did a pre-run using two samples with the same parameters, and it was able to complete the process and produce the relfreq and absfreq_table.qzv files. However, when I run more than 10 samples, I have to wait at least a week and the analysis still crashes and shows me this error:

===========

There was an issue with loading the file taxonomy.qza as metadata:

Metadata file path doesn't exist, or the path points to something other than a file. Please check that the path exists, has read permissions, and points to a regular file (not a directory): taxonomy.qza

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

===========

I have uploaded the nohup.out file for your reference:

nohup.out.txt

Hoping for your swift response, Jawen

MaestSi commented 1 year ago

Hi, first of all, the metadata file is neither deleted nor edited by the pipeline if it already exists from a previous run. Did you remember to edit or remove it prior to running the analysis with the full set of samples? If yes, I fear the error may be due to a RAM issue. You could try downsampling each of the 29 samples to 1000 reads and running the analysis: if that succeeds while the full set does not, it is likely a RAM issue. Moreover, please check that you do not have any zero-length reads, as they may cause problems. You can do that by filtering the reads with NanoFilt with a minimum length of 1 and checking whether the files before and after filtering match (see the sketch below).
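For reference, a minimal sketch of that check (the paths are placeholders, and it assumes NanoFilt is available in the active environment, e.g. MetONTIIME_env):

FASTQ_DIR="/path/to/dir" # folder containing your fastq.gz files
FILTERED_DIR="/path/to/filtered_dir" # empty folder for the length-filtered copies
mkdir -p "$FILTERED_DIR"
for f in "$FASTQ_DIR"/*.fastq.gz; do
  sn=$(basename "$f")
  # NanoFilt -l 1 reads fastq from stdin and discards zero-length reads
  zcat "$f" | NanoFilt -l 1 | gzip > "$FILTERED_DIR/$sn"
  # a fastq record is 4 lines, so read count = line count / 4
  before=$(( $(zcat "$f" | wc -l) / 4 ))
  after=$(( $(zcat "$FILTERED_DIR/$sn" | wc -l) / 4 ))
  echo "$sn: $before reads before filtering, $after after"
done

If the two counts differ for any sample, that sample contains zero-length reads.

Best, SM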

Jawen-D commented 1 year ago

Hello Simone,

Thanks for your response.

Did you remember to edit it or remove it prior to running the analysis with the full set of samples?

Yes, I did remove the metadata file from the previous run before rerunning my samples. I only have the fastq.gz files and the SILVA sequence and taxonomy files in the working directory before each run.

If yes, I fear the error may be due to a RAM memory issue.

I had this problem and posted an issue about it before. You recommended lowering the number of threads, and I did, but that only works with a smaller number of samples. With the 29 samples, the analysis still halts even when I use fewer threads (4 or 8).

You could try downsampling each of the 29 samples to 1000 reads and run the analysis.

Can I do this in NanoFilt? Is there a bash script I can use to downsample my samples to a fixed number of reads, or do I have to select the reads from the files manually? I need some guidance :)

please check you do not have any zero-length reads, as they may cause some issues. You can do that by filtering reads with NanoFilt with minimum length 1, and check if the files before and after the filtering match.

I will do this.

Thank you

Best, Jawen

MaestSi commented 1 year ago

Hi, you could run this script to downsample and length-filter the reads:

FASTQ_DEMULTIPLEXED_DIR="/path/to/dir" # the folder where your fastq.gz files are
FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR="/path/to/another_dir" # an empty folder, different from the one above, where your downsampled and length-filtered files will be stored
SAMPLING_DEPTH=100000 # number of reads to keep for each sample

source activate MetONTIIME_env
mkdir -p "$FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR"
for f in $(find "$FASTQ_DEMULTIPLEXED_DIR" -name "*.fastq.gz"); do
  sn=$(basename "$f")
  # subsample to SAMPLING_DEPTH reads, drop zero-length reads, recompress
  seqtk sample "$f" "$SAMPLING_DEPTH" | NanoFilt -l 1 | gzip > "$FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR/$sn"
done
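
As a quick sanity check afterwards, you can count the reads in the output files (a fastq record is 4 lines):

for f in "$FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR"/*.fastq.gz; do
  echo "$f: $(( $(zcat "$f" | wc -l) / 4 )) reads"
done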

SM

Jawen-D commented 1 year ago

Hello Simone,

I ran the script you gave, but I don't see any subsampled files in the directory I created. I'm not sure which part I did wrong.

Regards, Jawen

MaestSi commented 1 year ago

Hi, I probably made a typo; please retry with the updated script. SM

Jawen-D commented 1 year ago

Hello SM,

The subsampling worked, thank you! I am going to run the MetONTIIME analysis now and will give an update once it's done. I appreciate the assistance.

Regards, Jawen

Jawen-D commented 1 year ago

Hello SM,

I ran the analysis again using the subsampled files. Unfortunately, I still encountered the same error.

[Screenshot of the same error message, taken 2023-04-08]

May I ask for further steps?

Thanks

Regards, Jawen

MaestSi commented 1 year ago

Hi Jawen, first of all, are you specifying full paths to the files (for example to sample-metadata) when running the pipeline? Please post the command you ran here. Second, you could do a quick test: subsample 100 reads per sample, save them to a new directory, and run the pipeline on those (see the snippet below). If that works while the bigger dataset doesn't, then it's a matter of RAM.
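For example, reusing the approach from the script above (both paths are placeholders), something along these lines should do for the quick test:

FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR="/path/to/dir" # folder with your fastq.gz files
TEST_DIR="/path/to/test_dir" # empty folder for the 100-read test files
source activate MetONTIIME_env
mkdir -p "$TEST_DIR"
for f in "$FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR"/*.fastq.gz; do
  # keep only 100 reads per sample
  seqtk sample "$f" 100 | gzip > "$TEST_DIR/$(basename "$f")"
done

Simone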

Jawen-D commented 1 year ago

Hello SM,

Here's the command I used to rerun the MetONTIIME analysis using the subsampled files.

nohup /home/ihopelab/installers/MetONTIIME/MetONTIIME.sh -w /home/ihopelab/Microbiome_2023/02_trial/06_trial/ -f /home/ihopelab/Microbiome_2023/02_trial/06_trial/sample-metadata.tsv -s /home/ihopelab/Microbiome_2023/02_trial/06_trial/silva_132_97_16S_sequence.qza -t /home/ihopelab/Microbiome_2023/02_trial/06_trial/silva_132_97_16S_taxonomy.qza -n 8 -c Vsearch -m 1 -q 0.8 -i 0.85 &

I saved the subsampled files to a new folder before running them. I am currently running the batch subsampled to 100 reads per sample.

Can you please check whether the command I used is okay?

Thanks

Regards, Jawen

MaestSi commented 1 year ago

Hi, it seems ok to me! Best, SM

Jawen-D commented 1 year ago

Hello SM,

It is working now. Thank you very much for your assistance.

Regards, Jawen

MaestSi commented 1 year ago

Perfect! Best, SM