HRGV / phyloFlash

phyloFlash - A pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an illumina (meta)genomic dataset.
GNU General Public License v3.0

Index creation problem with PhyloFlash – Clustering Step _ FATAL: Tool execution failed! Error was 'No such file or directory' and return code '9'. #146

Status: Closed (MeganeNoyer closed this issue 3 years ago)

MeganeNoyer commented 3 years ago

Hi, I am trying to build the phyloFlash database index, but it fails with an error during the clustering step. I saw that other people have had this problem, but the advice given in those threads did not fix it for me.

At first, I tried with the basic command line:

phyloFlash_makedb.pl --remote
[09:43:58] Checking for required tools.
[09:43:58] Using bbduk found at
[09:43:58] All required tools found.
This is phyloFlash_makedb.pl from phyloFlash.pl v3.4
[09:43:58] downloading latest univec from ncbi
[09:43:58] Connecting to ftp.ncbi.nlm.nih.gov
[09:43:59] Finding /pub/UniVec/UniVec
[09:44:00] Found UniVec (1701925 bytes)
[09:44:03] downloading latest SSU RefNR from www.arb-silva.de
[09:44:03] Connecting to ftp.arb-silva.de
[09:44:03] Finding /current/Exports/*_SSURef_N?99_tax_silva_trunc.fasta.gz
[09:44:04] Found SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz (195410064 bytes)
[09:44:05] The file you are about to download comes with a license:
[09:44:05] Do you wish to continue downloading under the conditions

[09:44:23] Verifying MD5...
[09:44:24] File ok
[09:44:24] unpacking SILVA database
[09:44:38] searching for LSU contamination in SSU RefNR
[09:44:38] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom bac --threads 64 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.bac.gff 2>tmp.barrnap_hits.bac.barrnap.out
[09:51:27] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom arch --threads 64 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.arch.gff 2>tmp.barrnap_hits.arch.barrnap.out
[09:58:46] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom euk --threads 64 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.euk.gff 2>tmp.barrnap_hits.euk.barrnap.out
[10:08:19] Removing sequences with potential LSU contamination
[10:08:19] Number of sequences to skip: 120
[10:08:25] masking low entropy regions in SSU RefNR
[10:08:25] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmask.sh overwrite=t -Xmx10g threads=64 in=./138.1//SILVA_SSU.noLSU.fasta out=./138.1//SILVA_SSU.noLSU.masked.fasta minkr=4 maxkr=8 mr=t minlen=20 minke=4 maxke=8 fastawrap=0 2>tmp.bbmask_mask_repeats.log
[10:12:27] removing UniVec contamination in SSU RefNR
[10:12:27] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbduk.sh ref=UniVec overwrite=t -Xmx10g threads=64 fastawrap=0 ktrim=row=t minlength=800 mink=11 hdist=1 in=./138.1//SILVA_SSU.noLSU.masked.fasta out=./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta stats=./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta.UniVec_contamination_stats.txt 2>tmp.bbduk_remove_univec.log
[10:14:25] Vsearch v2.5.0+ found, will index database to UDB file
[10:14:25] Indexing ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta to make UDB file ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb with Vsearch
[10:14:25] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch --threads 64 --notrunclabels --makeudb_usearch ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta --output ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb 2>tmp.vsearch_make_udb.log
[10:20:03] clustering database
[10:20:03] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch --cluster_fast ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta --id 0.99 --centroids ./138.1/SILVA_SSU.noLSU.masked.trimmed.NR99.fasta --notrunclabels --threads 64
vsearch v2.17.0_linux_x86_64, 188.2GB RAM, 64 cores
https://github.com/torognes/vsearch
Reading file ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta 100%
743189445 nt in 510215 seqs, min 800, max 3706, avg 1457
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 13%
[10:52:51] FATAL: Tool execution failed!. Error was 'No such file or directory' and return code '9'
Aborting.
[10:52:51] Saving log to file phyloFlash_log_on_error

Then I tried deleting the empty (0-byte) output file SILVA_SSU.noLSU.masked.trimmed.NR99.fasta and rerunning, so that the existing SILVA_SSU.noLSU.masked.trimmed.udb would be reused:

phyloFlash_makedb.pl --silva_file SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz --univec_file UniVec --CPUs 16 -nooverwrite
[13:58:43] Checking for required tools.
[13:58:44] Using bowtiebuild found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bowtie-build".
[13:58:44] Using bbduk found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbduk.sh".
[13:58:44] Using bbmask found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmask.sh".
[13:58:44] Using grep found at "/usr/bin/grep".
[13:58:44] Using vsearch found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch".
[13:58:44] Using barrnapHGV found at "/zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV".
[13:58:44] Using bbmap found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmap.sh".
[13:58:44] All required tools found.
This is phyloFlash_makedb.pl from phyloFlash.pl v3.4
[13:58:44] using local copy of univec: UniVec
[13:58:44] using local copy of Silva SSU RefNR: SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz
[13:58:44] unpacking SILVA database
[13:58:59] searching for LSU contamination in SSU RefNR
[13:58:59] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom bac --threads 16 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.bac.gff 2>tmp.barrnap_hits.bac.barrnap.out
[14:01:25] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom arch --threads 16 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.arch.gff 2>tmp.barrnap_hits.arch.barrnap.out
[14:04:02] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom euk --threads 16 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.euk.gff 2>tmp.barrnap_hits.euk.barrnap.out
[14:07:21] Removing sequences with potential LSU contamination
[14:07:21] Number of sequences to skip: 120
[14:07:27] masking low entropy regions in SSU RefNR
[14:07:27] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmask.sh overwrite=t -Xmx10g threads=16 in=./138.1//SILVA_SSU.noLSU.fasta out=./138.1//SILVA_SSU.noLSU.masked.fasta minkr=4 maxkr=8 mr=t minlen=20 minke=4 maxke=8 fastawrap=0 2>tmp.bbmask_mask_repeats.log
[14:09:07] removing UniVec contamination in SSU RefNR
[14:09:07] File ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta exists, not overwriting
[14:09:07] Vsearch v2.5.0+ found, will index database to UDB file
[14:09:07] Indexing ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta to make UDB file ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb with Vsearch
[14:09:07] WARNING: File ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb already exists. Not overwriting
[14:09:07] clustering database
[14:09:07] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch --cluster_fast ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta --id 0.99 --centroids ./138.1/SILVA_SSU.noLSU.masked.trimmed.NR99.fasta --notrunclabels --threads 16
vsearch v2.17.0_linux_x86_64, 188.2GB RAM, 64 cores
https://github.com/torognes/vsearch
Reading file ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta 100%
743189445 nt in 510215 seqs, min 800, max 3706, avg 1457
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 37%
[14:40:28] FATAL: Tool execution failed!. Error was 'No such file or directory' and return code '9'
Aborting.
[14:40:28] Saving log to file phyloFlash_log_on_error

Finally, I deleted the empty (0-byte) file again and ran vsearch directly:

vsearch --cluster_fast ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta --id 0.99 --centroids ./138.1/SILVA_SSU.noLSU.masked.trimmed.NR99.fasta --notrunclabels --threads 10
vsearch v2.7.0_linux_x86_64, 188.2GB RAM, 64 cores
https://github.com/torognes/vsearch
Reading file ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta 100%
743189445 nt in 510215 seqs, min 800, max 3706, avg 1457
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 45%
Killed

Can you help me solve this problem so that I can create my index? Thanks in advance

Megane

kbseah commented 3 years ago

Hello Megane, it looks like the vsearch job is being killed during the clustering step. This might be because the job is consuming too much memory or exceeding a time limit. Could you try again with fewer processors, i.e. --CPUs 8 or --CPUs 4?
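One way to read the return code '9' in the log, as a rough sketch: this assumes the number is a raw wait status of the kind Perl's `$?` holds (an assumption about how phyloFlash reports it, not something stated in this thread). In that encoding the low 7 bits are the terminating signal and the high byte is the exit code. The variable names are made up for illustration.

```shell
# Assumption: the reported code is a raw wait status (signal in the low 7 bits,
# exit code in the high byte), not a plain exit code.
status=9

signal=$(( status & 127 ))   # terminating signal, 0 if the tool exited normally
exit_code=$(( status >> 8 )) # exit code, only meaningful when signal is 0

echo "signal=${signal} exit_code=${exit_code}"

# Look up the signal name; on Linux, signal 9 is SIGKILL.
if [ "${signal}" -ne 0 ]; then
    signal_name=$(kill -l "${signal}")
    echo "terminated by signal ${signal_name}"
fi
```

Under that reading the tool was killed from outside rather than failing on its own, which would match the "Killed" message printed by the direct vsearch run.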

MeganeNoyer commented 3 years ago

Thanks for your quick reply! I tested with 8 and 4 CPUs, but I still get the same error message, both with the phyloFlash command and with the direct vsearch command. I don't understand why.

kbseah commented 3 years ago

The machine you are using should have more than enough RAM to finish the database build step. I suspect that your system admin may have placed some time or resource usage limitations. Are you running this on a shared computing cluster that uses some kind of job management system like Slurm or SGE? If so, are you submitting this as a queued batch job (e.g. with qsub on SGE) or did you request an interactive session with something like qlogin?
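A quick way to check which situation applies, as a minimal sketch: it assumes a Slurm cluster, where the `SLURM_JOB_ID` environment variable is set inside batch and interactive jobs but normally not on a login node. The `context` variable is just an illustrative name.

```shell
# Are we inside a Slurm allocation, or on a login node?
if [ -n "${SLURM_JOB_ID:-}" ]; then
    context="inside Slurm job ${SLURM_JOB_ID}"
else
    context="not inside a Slurm job (possibly a login node)"
fi
echo "${context}"

# Per-process limits applied to the current shell.
echo "max virtual memory (kB): $(ulimit -v)"
echo "max CPU time (s):        $(ulimit -t)"
```

Limits other than "unlimited", or a shell that is not inside a job, would both be consistent with a long-running process being killed by the system.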

MeganeNoyer commented 3 years ago

I'm actually on a shared cluster in the lab with Slurm as the job management system. I have just tried to create the index by running a batch job and for the moment it is running. I'll get back to you as soon as I have the result. Thanks again for your help.
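For anyone landing here with the same error, a batch submission could look roughly like the sketch below. The phyloFlash_makedb.pl options are the ones used earlier in this thread; the job name, the file name makedb_job.sh, and the CPU, memory and time values are placeholder assumptions, not the settings actually used here, and should be adapted to the local cluster.

```shell
# Write a Slurm batch script (resource values are illustrative assumptions).
cat > makedb_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=pf_makedb
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=12:00:00
#SBATCH --output=pf_makedb_%j.log

# Build the phyloFlash database from local copies of SILVA and UniVec,
# using as many CPUs as Slurm allocated to this task.
phyloFlash_makedb.pl \
    --silva_file SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz \
    --univec_file UniVec \
    --CPUs "${SLURM_CPUS_PER_TASK}"
EOF

# Check the script for shell syntax errors without running it.
bash -n makedb_job.sh && echo "makedb_job.sh: syntax ok"

# Submit on the cluster with:
#   sbatch makedb_job.sh
```

After the job finishes, something like `sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS` can show whether it ran into a time or memory limit, provided job accounting is enabled on the cluster.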

MeganeNoyer commented 3 years ago

Hi! It was indeed a problem with time and resource usage limits. Running it as a batch job, I was able to build my index and continue the analysis. Thanks again for your quick help! Megane.

kbseah commented 3 years ago

You're welcome. Glad that it worked out!