MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

Empty collapsed tables-metadata file problems #31

Closed: lgralo closed this issue 3 years ago

lgralo commented 3 years ago

Hi Simone, Thank you very much for your great work and for sharing your scripts with us! I have been trying to use MetONTIIME to analyse FASTQ files that have already been basecalled and demultiplexed with MinKNOW. First, I am unsure about my input files, because they are plain fastq files rather than fastq.gz files. For example, I have a fastq_pass directory containing 12 barcode directories (barcode numbers 1-12), and each barcode directory contains several fastq files, each with 4000 sequences. Can I use the MetONTIIME.sh script with these data?

Second, when I run the MetONTIIME.sh script I only get some empty collapsed tables. This is the command I used:

nohup ./MetONTIIME.sh /home/microbiota/Escritorio/MetONTIIME-master/fastq_pass /home/microbiota/Escritorio/MetONTIIME-master/fastq_pass/manifest.tsv sequence_sequence.qza sequence_taxonomy.qza 30 Blast 3 0.8 0.85

I have attached the nohup.out file. Could you please help me solve the error I'm getting? It seems to be an error in the metadata file. I have checked it several times, but it seems that the file is either not found or not in the correct format. (Attached: nohup.txt, manifest.txt)

Thanks for any guidance you can offer!

MaestSi commented 3 years ago

Hi, you could cd to the parent folder of fastq_pass and try running this:

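mkdir analysis

# For each barcode folder in fastq_pass, merge all of its fastq files
# into a single gzipped file analysis/BC<num>.fastq.gz ($f is left unquoted
# on purpose, so that each fastq file is passed to cat as a separate argument)
for b in $(find "fastq_pass" -maxdepth 1 | grep "barcode"); do
  bn=$(basename "$b" | sed 's/barcode//');
  f=$(find "$b" | grep "\\.fastq");
  cat $f | gzip > "analysis/BC${bn}.fastq.gz";
done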

This piece of code should create an analysis folder (at the same level as the fastq_pass folder) containing BC<num>.fastq.gz files, one for each sample. Before rerunning the pipeline, make sure you delete the previously created manifest.tsv and manifest.txt (and nohup.out) files.
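One quick way to confirm that the merge worked is to compare the number of reads in each barcode folder with the number of reads in the corresponding merged file (a FASTQ record spans four lines). The sketch below assumes the fastq_pass/barcode<num> and analysis/BC<num>.fastq.gz layout used above and uncompressed .fastq inputs. The function name check_merge is hypothetical and is not part of MetONTIIME.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MetONTIIME): compare read counts before
# and after merging. Assumes uncompressed *.fastq inputs in
# <in_dir>/barcode<num>/ and merged files named <out_dir>/BC<num>.fastq.gz.
# Returns 0 if every barcode matches, 1 otherwise.
check_merge() {
  local in_dir="$1" out_dir="$2" status=0
  local b bn in_reads out_reads
  for b in "$in_dir"/barcode*/; do
    [ -d "$b" ] || continue                  # skip if the glob matched nothing
    bn=$(basename "$b" | sed 's/barcode//')  # barcode<num> -> <num>
    # Reads across all original fastq files of this barcode (4 lines per read)
    in_reads=$(( $(cat "$b"*.fastq 2>/dev/null | wc -l) / 4 ))
    # Reads in the merged, gzipped file (0 if the file is missing)
    out_reads=$(( $(gzip -dc "$out_dir/BC${bn}.fastq.gz" 2>/dev/null | wc -l) / 4 ))
    echo "BC${bn}: ${in_reads} input reads, ${out_reads} merged reads"
    [ "$in_reads" -eq "$out_reads" ] || status=1
  done
  return "$status"
}
```

Run it from the parent folder of fastq_pass as `check_merge fastq_pass analysis`. If a sample reports only 4000 merged reads while its barcode folder contains several fastq files, only one file was concatenated.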

Then, the command for running the pipeline should be:

nohup ./MetONTIIME.sh /home/microbiota/Escritorio/MetONTIIME-master/analysis /home/microbiota/Escritorio/MetONTIIME-master/analysis/manifest.tsv sequence_sequence.qza sequence_taxonomy.qza 30 Blast 3 0.8 0.85 &

Let me know if you succeed in running the pipeline. Also, note that Blast is single-threaded in the QIIME2 implementation, so if you have a large amount of data you should consider switching to Vsearch. Simone

Edit: I made a mistake in the script; it should now be fixed.

lgralo commented 3 years ago

Hi Simone, I succeeded in creating the fastq.gz files, but when I run the pipeline with the MetONTIIME script I only obtain the following files: demux_summary.qzv, manifest.txt, rep-seqs.qza, rep-seqs.qzv, sequences.qza, table.qza, and a collapsed_feature_tables directory whose tables are empty. I am attaching the nohup.txt and manifest.txt files generated during the run. Again, there seems to be a problem with manifest.txt. Finally, what did you mean by "Edit: I made a mistake in the script, now it should be fixed"? Should I change something else? Thank you very much for your help!

lgralo commented 3 years ago

Hi Simone, I have just noticed that the code you sent me to join (cat) the fastq files in each barcode directory is not working. Only one file is compressed, and each sample has 4000 sequences, corresponding to a single fastq file. Could you please help me? Thank you very much!

MaestSi commented 3 years ago

Hi, I didn't know where your analysis folder was located on your machine. Now that I have seen the manifest, the command for running the analysis should be:

nohup ./MetONTIIME.sh /home/microbiota/Escritorio/MetONTIIME-master/Data/analysis /home/microbiota/Escritorio/MetONTIIME-master/Data/analysis/manifest.tsv sequence_sequence.qza sequence_taxonomy.qza 30 Blast 3 0.8 0.85 &

> I have just noticed that the code you sent me to join (cat) the fastq files in each barcode directory is not working. Only one file is compressed, and each sample has 4000 sequences, corresponding to a single fastq file.

That is what I meant when I wrote that I had made a mistake in the script. I corrected it a few minutes after posting it, but you had probably already run it. If you now delete all generated files, rerun the code to merge the fastq files, move them to the /home/microbiota/Escritorio/MetONTIIME-master/Data/analysis folder, and then run the MetONTIIME pipeline with the command line above, it should work. Simone
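A minimal cleanup sketch for the "delete all generated files" step follows. It assumes that the outputs of the earlier run are the ones reported in this thread (demux_summary.qzv, manifest.txt, manifest.tsv, rep-seqs.qza, rep-seqs.qzv, sequences.qza, table.qza and the collapsed_feature_tables directory). If your run produced other files, adjust the list. The function name clean_run is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove outputs of a previous MetONTIIME run from the
# analysis folder so the pipeline starts from scratch. The file list is an
# assumption based on the outputs reported in this thread; BC<num>.fastq.gz
# files are left in place (rerunning the merge overwrites them anyway).
clean_run() {
  local dir="$1"
  rm -f "$dir"/demux_summary.qzv "$dir"/manifest.txt "$dir"/manifest.tsv \
        "$dir"/rep-seqs.qza "$dir"/rep-seqs.qzv "$dir"/sequences.qza \
        "$dir"/table.qza
  rm -rf "$dir"/collapsed_feature_tables
}
```

For example, `clean_run /home/microbiota/Escritorio/MetONTIIME-master/Data/analysis`. nohup.out is written to the folder from which nohup was launched, so delete it there separately.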

lgralo commented 3 years ago

Hi Simone, Now the script for creating the .gz files works well, and so does the pipeline. Thank you very much for your help! Laura

MaestSi commented 3 years ago

Great! Ciao! Simone