Closed MeganeNoyer closed 3 years ago
Hello Megane,
It looks like the vsearch job is being killed during the clustering step. This might be because the job is consuming too much memory or exceeding a time limit. Could you try again with fewer processors, e.g. `--CPUs 8` or `--CPUs 4`?
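A "Killed" message, or a return code of 9, usually indicates that the process received SIGKILL, which is the signal the kernel or a job scheduler typically sends when a limit is exceeded. The sketch below shows how a shell exposes this; the child process is only a stand-in for vsearch, and it is an assumption that the return code phyloFlash prints corresponds to the signal number.

```shell
# Stand-in for a tool that gets killed: a child shell sends SIGKILL to itself.
status=0
sh -c 'kill -KILL $$' || status=$?

# A shell reports death-by-signal as 128 + the signal number.
echo "exit status: $status"

if [ "$status" -gt 128 ]; then
    signum=$((status - 128))
    signame=$(kill -l "$signum")
    echo "terminated by signal $signum (SIG$signame)"
fi
```

If the status decodes to SIGKILL, the tool itself did not crash; something outside it ended the process.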
Thanks for your quick reply! I have tested with 8 and 4 CPUs, but I still get the same error message, both with the phyloFlash command and when running vsearch directly. I don't understand why.
The machine you are using should have more than enough RAM to finish the database build step. I suspect that your system admin may have placed some time or resource usage limitations. Are you running this on a shared computing cluster that uses some kind of job management system like Slurm or SGE? If so, are you submitting this as a queued batch job (e.g. with `qsub` on SGE), or did you request an interactive session with something like `qlogin`?
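One quick way to look for per-process limits on the node where the command runs is the shell's `ulimit` builtin. This is only a sketch: limits enforced by the scheduler itself may not appear here, so a result of "unlimited" does not rule out a scheduler-side limit.

```shell
# CPU time limit (seconds) for processes started from this shell.
cpu_limit=$(ulimit -t)
# Virtual memory limit (kilobytes) for processes started from this shell.
mem_limit=$(ulimit -v)

echo "CPU time limit: $cpu_limit"
echo "virtual memory limit: $mem_limit"
```

Running this in both the interactive session and a batch job makes it easy to compare the two environments.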
I'm actually on a shared lab cluster that uses Slurm as its job management system. I have just tried building the index as a batch job, and so far it is still running. I'll get back to you as soon as I have the result. Thanks again for your help.
Hi! It was indeed a problem with time and resource usage limits. I was able to build my index and continue the analysis. Thanks again for your quick help! Megane.
You're welcome. Glad that it worked out!
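For readers who hit the same error, a batch submission along the following lines is one way to run the database build under Slurm. This is a sketch, not the script actually used in this thread: the file name `makedb.sbatch`, the job name, and the CPU, memory, and time values are assumptions to adapt to your cluster, while the `phyloFlash_makedb.pl` options are the ones shown in the original report.

```shell
# Write a Slurm batch script. All resource values below are assumptions.
cat > makedb.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=pf_makedb
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00
#SBATCH --output=pf_makedb_%j.log

# Use as many threads as the CPUs granted by Slurm.
phyloFlash_makedb.pl \
    --silva_file SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz \
    --univec_file UniVec \
    --CPUs "$SLURM_CPUS_PER_TASK"
EOF

echo "wrote makedb.sbatch"
# Submit on the cluster with: sbatch makedb.sbatch
```

Requesting memory and time explicitly means the job runs under the limits you asked for rather than the defaults of an interactive session.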
Hi, I am trying to create my index with phyloFlash, but error messages appear during the clustering step. I saw that other people have had this problem, but the advice given to them did not fix it for me.
At first, I tried with the basic command line:
phyloFlash_makedb.pl --remote
[09:43:58] Checking for required tools.
[09:43:58] Using bbduk found at
[09:43:58] All required tools found.
This is phyloFlash_makedb.pl from phyloFlash.pl v3.4
[09:43:58] downloading latest univec from ncbi
[09:43:58] Connecting to ftp.ncbi.nlm.nih.gov
[09:43:59] Finding /pub/UniVec/UniVec
[09:44:00] Found UniVec (1701925 bytes)
[09:44:03] downloading latest SSU RefNR from www.arb-silva.de
[09:44:03] Connecting to ftp.arb-silva.de
[09:44:03] Finding /current/Exports/*_SSURef_N?99_tax_silva_trunc.fasta.gz
[09:44:04] Found SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz (195410064 bytes)
[09:44:05] The file you are about to download comes with a license:
[09:44:05] Do you wish to continue downloading under the conditions
[09:44:23] Verifying MD5...
[09:44:24] File ok
[09:44:24] unpacking SILVA database
[09:44:38] searching for LSU contamination in SSU RefNR
[09:44:38] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom bac --threads 64 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.bac.gff 2>tmp.barrnap_hits.bac.barrnap.out
[09:51:27] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom arch --threads 64 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.arch.gff 2>tmp.barrnap_hits.arch.barrnap.out
[09:58:46] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom euk --threads 64 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.euk.gff 2>tmp.barrnap_hits.euk.barrnap.out
[10:08:19] Removing sequences with potential LSU contamination
[10:08:19] Number of sequences to skip: 120
[10:08:25] masking low entropy regions in SSU RefNR
[10:08:25] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmask.sh overwrite=t -Xmx10g threads=64 in=./138.1//SILVA_SSU.noLSU.fasta out=./138.1//SILVA_SSU.noLSU.masked.fasta minkr=4 maxkr=8 mr=t minlen=20 minke=4 maxke=8 fastawrap=0 2>tmp.bbmask_mask_repeats.log
[10:12:27] removing UniVec contamination in SSU RefNR
[10:12:27] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbduk.sh ref=UniVec overwrite=t -Xmx10g threads=64 fastawrap=0 ktrim=row=t minlength=800 mink=11 hdist=1 in=./138.1//SILVA_SSU.noLSU.masked.fasta out=./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta stats=./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta.UniVec_contamination_stats.txt 2>tmp.bbduk_remove_univec.log
[10:14:25] Vsearch v2.5.0+ found, will index database to UDB file
[10:14:25] Indexing ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta to make UDB file ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb with Vsearch
[10:14:25] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch --threads 64 --notrunclabels --makeudb_usearch ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta --output ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb 2>tmp.vsearch_make_udb.log
[10:20:03] clustering database
[10:20:03] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch --cluster_fast ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta --id 0.99 --centroids ./138.1/SILVA_SSU.noLSU.masked.trimmed.NR99.fasta --notrunclabels --threads 64
vsearch v2.17.0_linux_x86_64, 188.2GB RAM, 64 cores
https://github.com/torognes/vsearch
Reading file ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta 100%
743189445 nt in 510215 seqs, min 800, max 3706, avg 1457
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 13%
[10:52:51] FATAL: Tool execution failed!. Error was 'No such file or directory' and return code '9'
Aborting.
[10:52:51] Saving log to file phyloFlash_log_on_error
_Then I tried removing the empty (0-byte) file SILVA_SSU.noLSU.masked.trimmed.NR99.fasta and letting the existing SILVA_SSU.noLSU.masked.trimmed.udb be reused:_
phyloFlash_makedb.pl --silva_file SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz --univec_file UniVec --CPUs 16 -nooverwrite
[13:58:43] Checking for required tools.
[13:58:44] Using bowtiebuild found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bowtie-build".
[13:58:44] Using bbduk found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbduk.sh".
[13:58:44] Using bbmask found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmask.sh".
[13:58:44] Using grep found at "/usr/bin/grep".
[13:58:44] Using vsearch found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch".
[13:58:44] Using barrnapHGV found at "/zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV".
[13:58:44] Using bbmap found at "/softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmap.sh".
[13:58:44] All required tools found.
This is phyloFlash_makedb.pl from phyloFlash.pl v3.4
[13:58:44] using local copy of univec: UniVec
[13:58:44] using local copy of Silva SSU RefNR: SILVA_138.1_SSURef_NR99_tax_silva_trunc.fasta.gz
[13:58:44] unpacking SILVA database
[13:58:59] searching for LSU contamination in SSU RefNR
[13:58:59] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom bac --threads 16 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.bac.gff 2>tmp.barrnap_hits.bac.barrnap.out
[14:01:25] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom arch --threads 16 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.arch.gff 2>tmp.barrnap_hits.arch.barrnap.out
[14:04:02] running subcommand: /zfs/softs/contrib/apps/phyloflash/3.4/barrnap-HGV/bin/barrnap_HGV --kingdom euk --threads 16 --evalue 1e-10 --gene lsu --reject 0.01 ./138.1/SILVA_SSU.fasta >tmp.barrnap_hits.euk.gff 2>tmp.barrnap_hits.euk.barrnap.out
[14:07:21] Removing sequences with potential LSU contamination
[14:07:21] Number of sequences to skip: 120
[14:07:27] masking low entropy regions in SSU RefNR
[14:07:27] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/bbmask.sh overwrite=t -Xmx10g threads=16 in=./138.1//SILVA_SSU.noLSU.fasta out=./138.1//SILVA_SSU.noLSU.masked.fasta minkr=4 maxkr=8 mr=t minlen=20 minke=4 maxke=8 fastawrap=0 2>tmp.bbmask_mask_repeats.log
[14:09:07] removing UniVec contamination in SSU RefNR
[14:09:07] File ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta exists, not overwriting
[14:09:07] Vsearch v2.5.0+ found, will index database to UDB file
[14:09:07] Indexing ./138.1//SILVA_SSU.noLSU.masked.trimmed.fasta to make UDB file ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb with Vsearch
[14:09:07] WARNING: File ./138.1//SILVA_SSU.noLSU.masked.trimmed.udb already exists. Not overwriting
[14:09:07] clustering database
[14:09:07] running subcommand: /softs/contrib/apps/anaconda/2/envs/phyloflash-3.4/bin/vsearch --cluster_fast ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta --id 0.99 --centroids ./138.1/SILVA_SSU.noLSU.masked.trimmed.NR99.fasta --notrunclabels --threads 16
vsearch v2.17.0_linux_x86_64, 188.2GB RAM, 64 cores
https://github.com/torognes/vsearch
Reading file ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta 100%
743189445 nt in 510215 seqs, min 800, max 3706, avg 1457
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 37%
[14:40:28] FATAL: Tool execution failed!. Error was 'No such file or directory' and return code '9'
Aborting.
[14:40:28] Saving log to file phyloFlash_log_on_error
Finally, I tried removing the 0-byte file again and running vsearch directly:
vsearch --cluster_fast ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta --id 0.99 --centroids ./138.1/SILVA_SSU.noLSU.masked.trimmed.NR99.fasta --notrunclabels --threads 10
vsearch v2.7.0_linux_x86_64, 188.2GB RAM, 64 cores
https://github.com/torognes/vsearch
Reading file ./138.1/SILVA_SSU.noLSU.masked.trimmed.fasta 100%
743189445 nt in 510215 seqs, min 800, max 3706, avg 1457
Masking 100%
Sorting by length 100%
Counting k-mers 100%
Clustering 45%
Killed
Can you help me solve this problem so that I can create my index? Thanks in advance.
Megane