MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

Error starting analysis from fastq.gz files #19

Closed: timregan closed this issue 4 years ago

timregan commented 4 years ago

Hi there, I have .fastq.gz files that have already been trimmed and filtered. I followed the instructions to import the Silva_132_release database for the 16S gene, with sequences clustered at 99% identity, and had no issues. MetONTIIME_env activates fine and QIIME seems to work OK. (I used Anaconda instead of Miniconda; is this a problem?)

In the config_MinION_mobile_lab.R file, I have entered the following:

```
########################################################################################################
# PIPELINE DIR
PIPELINE_DIR <- "/exports/eddie/scratch/tregan/MinION/MetONTIIME"
# MINICONDA DIR
MINICONDA_DIR <- "/exports/cmvm/eddie/eb/groups/bean_grp/anaconda"
# basecaller_dir
BASECALLER_DIR <- "/exports/eddie/scratch/tregan/MinION/ont-guppy-cpu/bin"
# NCBI-downloaded sequences (QIIME2 artifact)
DB <- "/exports/eddie/scratch/tregan/MinION/MetONTIIME/silva_132_99_16S_sequence.qza"
# Taxonomy of NCBI-downloaded sequences (QIIME2 artifact)
TAXONOMY <- "/exports/eddie/scratch/tregan/MinION/MetONTIIME/silva_132_99_16S_taxonomy.qza"
# sample-metadata file describing samples metadata; it is created automatically if it doesn't exist
SAMPLE_METADATA <- "/exports/eddie/scratch/tregan/MinION/MetONTIIME/metadata.tsv"
########## End of user editable region #################################################################
```

I am trying to run the following command in the MetONTIIME folder, following the example input `nohup ./MetONTIIME.sh &`:

```
nohup ./MetONTIIME.sh /exports/eddie/scratch/tregan/MinION/Metabarcoding_lib/Barcoded_Lib1/Filtered/MetONTIIME_wd /exports/eddie/scratch/tregan/MinION/MetONTIIME/metadata.tsv /exports/eddie/scratch/tregan/MinION/MetONTIIME/silva_132_99_16S_sequence.qza /exports/eddie/scratch/tregan/MinION/MetONTIIME/silva_132_99_16S_taxonomy.qza 2 Blast 3 0.7 0.7 &
```

where the arguments are, in order: /exports/eddie/scratch/tregan/MinION/Metabarcoding_lib/Barcoded_Lib1/Filtered/MetONTIIME_wd, /exports/eddie/scratch/tregan/MinION/MetONTIIME/metadata.tsv, /exports/eddie/scratch/tregan/MinION/MetONTIIME/silva_132_99_16S_sequence.qza, /exports/eddie/scratch/tregan/MinION/MetONTIIME/silva_132_99_16S_taxonomy.qza, 2, Blast, 3, 0.7, 0.7.

Attached is the entire log I get when trying to run this command. Any ideas?

Many thanks,
Tim

[nohup.out.txt](https://github.com/MaestSi/MetONTIIME/files/5149944/nohup.out.txt)
MaestSi commented 4 years ago

Hi, using Anaconda instead of Miniconda should not be a problem. The issue is a failed import of the fastq.gz files. I noticed the pipeline is trying to import the file BC32_15_L001_R1_001.fastq.gz, which (judging by its name) looks like an Illumina file. You may have a subfolder of your working directory that contains Illumina data. In that case, you should move it out of the analysis directory; otherwise the script will try to import those files as well, giving errors. Could you please send me the content of the /exports/eddie/scratch/tregan/MinION/Metabarcoding_lib/Barcoded_Lib1/Filtered/MetONTIIME_wd/manifest.txt file, and confirm whether it also contains lines that refer to "unwanted" Illumina files? Thanks, Simone
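To check a manifest for stray files like the one mentioned above, a quick scan for Illumina-style names can help. The sketch below is only an illustration: it makes no assumption about the exact MetONTIIME manifest layout (it looks at every comma- or whitespace-separated token), and the `_L001_R1_001` suffix it matches is a naming heuristic, not a check the pipeline itself performs. The function name `find_illumina_like` is hypothetical.

```python
import re
from pathlib import Path

# Heuristic: Illumina-style names end in lane/read/chunk fields such as
# "_L001_R1_001.fastq.gz". This is an assumption, not a MetONTIIME rule.
ILLUMINA_SUFFIX = re.compile(r"_L\d{3}_R[12]_\d{3}\.f(?:ast)?q(?:\.gz)?$")


def find_illumina_like(lines):
    """Return file names in `lines` that look like Illumina output.

    Each line is split on commas and whitespace, and every token is
    reduced to its base name before matching, so no particular manifest
    column layout is assumed.
    """
    hits = []
    for line in lines:
        for token in re.split(r"[,\s]+", line.strip()):
            name = Path(token).name
            if ILLUMINA_SUFFIX.search(name):
                hits.append(name)
    return hits
```

For example, `with open("manifest.txt") as fh: print(find_illumina_like(fh))` would list any Illumina-like names found in the manifest; an empty list means none matched the heuristic.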

timregan commented 4 years ago

Hi Simone,

I was hoping you might be able to shed some light on the BC32_15_L001_R1_001.fastq.gz file: it is definitely not one of mine, and I have no idea where it came from. I thought it was a default file built into the pipeline. The manifest.txt file is attached. I have checked it manually and confirmed that it only contains names and paths of files I know about and can account for (all trimmed MinION reads). There is nothing else in this directory besides these 76 .fastq.gz files, the manifest file and the "collapsed_feature_tables" directory, which contains 7 empty .tsv files.
manifest.txt

MaestSi commented 4 years ago

That's strange. I guess there may be some kind of formatting issue in the fastq.gz files. Did you obtain them with guppy_barcoder? If you want, you can send a subset of 10 reads, extracted with `zless <file.fastq.gz> | head -n40 > test.fastq`, to simone.maestri@hotmail.it, and I can try to import it. Simone
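For readers without `zless`, the same 10-read subset (40 lines, since each FASTQ record spans four lines) can be extracted in Python. This is a minimal sketch assuming standard four-line FASTQ records; `first_records` and `head_fastq_gz` are hypothetical helper names.

```python
import gzip
from itertools import islice


def first_records(lines, n_reads=10):
    """Return the first `n_reads` FASTQ records (4 lines each) from an iterable of lines."""
    return list(islice(lines, 4 * n_reads))


def head_fastq_gz(src, dst, n_reads=10):
    """Rough equivalent of `zless src | head -n40 > dst` for n_reads=10."""
    with gzip.open(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(first_records(fin, n_reads))
```

Calling `head_fastq_gz("BC01.fastq.gz", "test.fastq")` (file names illustrative) writes an uncompressed `test.fastq` with at most 10 reads.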

MaestSi commented 4 years ago

Another option would be to run the full pipeline on a small subset of data (e.g. set the options in the config file to subsample only 1 fast5 file), check whether it runs to the end, and then compare the format of the files it produces with your original ones. P.S.: use VSEARCH instead of Blast for much faster analyses. Simone

timregan commented 4 years ago

Simone - thanks so much for the help.

One problem was that I had renamed the files to match the format BC<num>.fastq.gz, but the .fastq headers still carried the original sample names. After correcting this, I found that some files still had errors at the end (they were truncated, either missing part of the quality-score line or containing something else odd). I tried subsets of files with the same result, so the problem was not limited to a couple of dodgy files. I didn't run the pipeline from the start in the first place because I used dual barcodes in the format <Barcode 1><Barcode 2>. I used HAC basecalling in Guppy on a GPU, and the program Minibar (https://github.com/calacademy-research/minibar) to demultiplex the reads and trim the adapters.
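Truncated or malformed records like those described above can be found by reading each file strictly four lines at a time and checking the record structure. This is a minimal sketch, assuming standard unwrapped four-line FASTQ (header, sequence, `+` separator, quality), which ONT basecallers typically write; `check_fastq` and `check_fastq_gz` are hypothetical helpers, not part of MetONTIIME or QIIME2.

```python
import gzip
from itertools import islice


def check_fastq(lines):
    """Return a list of (record_number, problem) tuples for a FASTQ stream.

    Records are read strictly four lines at a time; a final record with
    fewer than four lines is reported as truncated.
    """
    problems = []
    it = iter(lines)
    record = 0
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            break  # clean end of file
        record += 1
        if len(chunk) < 4:
            problems.append((record, f"truncated record ({len(chunk)} of 4 lines)"))
            break
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            problems.append((record, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((record, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((record, f"sequence length {len(seq)} != quality length {len(qual)}"))
    return problems


def check_fastq_gz(path):
    """Run check_fastq on a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return check_fastq(handle)
```

If the gzip stream itself is cut off, Python's `gzip` module raises `EOFError` while reading, which is another sign of a truncated file.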

Anyway, something went wrong with each of my files during trimming, but this has since been corrected (Edit: I think it may actually have been the filtering tool I was using rather than Minibar; perhaps a quality string starting with '@' got mixed up with a read name?). Many thanks for the helpful tips and for pointing me in the right direction 👍
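The '@' mix-up suspected above is a classic FASTQ pitfall: in Phred+33 encoding, '@' (ASCII 64) encodes quality 31, so a quality line can legitimately begin with '@'. The sketch below, with hypothetical helper names and assuming four-line records, contrasts a naive count of lines starting with '@' against treating only every fourth line as a header.

```python
def count_reads_naive(lines):
    """Wrong: counts every line starting with '@', including quality
    lines whose first character encodes Q31 ('@' = ASCII 64 = 31 + 33)."""
    return sum(1 for line in lines if line.startswith("@"))


def count_reads(lines):
    """Correct for four-line FASTQ: only lines 0, 4, 8, ... are headers."""
    return sum(1 for i, line in enumerate(lines) if i % 4 == 0 and line.startswith("@"))


records = [
    "@read1\n", "ACGT\n", "+\n", "@III\n",  # quality line starts with '@' (Q31)
    "@read2\n", "ACGT\n", "+\n", "IIII\n",
]
print(count_reads_naive(records), count_reads(records))  # 3 2
```

Any tool that locates records by searching for lines beginning with '@' can split or merge reads this way, which is why position-based parsing is the safer choice.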

MaestSi commented 4 years ago

Hi Tim, glad you found a solution! Ciao, Simone

Ksherriff commented 3 years ago

I was wondering if you could go into more detail on how this was solved, as I think I am running into the same issue. For simplicity, I am currently testing the pipeline on a single fastq.gz file. I have attached the manifest and nohup output. Here is the command for the run: `nohup ./MetONTIIME.sh PR_MCB_Files/PR_MCB_11/out_dir/ /home/usr/MetONTIIME/sample-metadata.tsv /home/usr/NB2_16sDatabase_sequence.qza /home/usr/NB2_16sDatabase_taxonomy.qza 10 Vsearch 3 0.8 0.85 &`

The files were basecalled live in MinKNOW, and then adapters and barcodes were trimmed with Porechop.

Here are the txt files: nohup_out.txt manifest.txt

Thanks

MaestSi commented 3 years ago

Hi, in the nohup.out file I found:

'PR_MCB_Files/PR_MCB_11/out_dir//manifest.txt' does not exist.

I think the issue may be due to the fact that you are not specifying the full path (either absolute or relative) to the working directory. Let me know if this solves the issue. Please delete all files created by the pipeline and the nohup.out file before rerunning. Simone
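In the nohup.out message above, the working directory was passed as a relative path with a trailing slash, which is where the doubled `//` in `out_dir//manifest.txt` comes from. The doubled slash is harmless on POSIX systems; the likely problem is the relative path itself. Passing an absolute path removes any dependence on the directory the script resolves paths from. A minimal Python sketch of the conversion follows (helper names are hypothetical); in a shell, GNU coreutils `realpath` does the same job.

```python
import os


def absolute_working_dir(path):
    """Return `path` as an absolute, normalised directory path.

    os.path.abspath resolves a relative path against the current working
    directory and normalises it, which also drops any trailing slash.
    """
    return os.path.abspath(os.path.expanduser(path))


def manifest_path(working_dir):
    """Join the manifest file name onto the absolute working directory."""
    return os.path.join(absolute_working_dir(working_dir), "manifest.txt")
```

Run from the directory the original command was launched in, `absolute_working_dir("PR_MCB_Files/PR_MCB_11/out_dir/")` gives the absolute path to pass as the first argument to MetONTIIME.sh.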

Ksherriff commented 3 years ago

That did the trick. Thanks!