Closed MonicaSteffi closed 1 year ago
Hi @MonicaSteffi
There are things that could be causing this:
- Your machine is running out of memory. In that case, increasing the number of splits should solve the issue (try 12 or 16). Are you running geNomad on a server or a personal computer? Do you know how much memory the machine has available?
Also, geNomad saves a log of the MMseqs2 execution at lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/mmseqs2.log. Can you paste it here?
Hi @apcamargo,
Thank you for the reply. The MMseqs2 version is 14.7e284 and the geNomad version is 1.4.0.
I also tried with more memory, but I still get the same error.
Here is the MMseqs2 log file:
`createdb lim1_1_genomad_output/contigs_annotate/contigs_proteins.faa lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db
MMseqs Version: 14.7e284 Database type 0 Shuffle input database true Createdb mode 0 Write lookup file 1 Offset of numeric ids 0 Compressed 0 Verbosity 3
Converting sequences [ Time for merging to query_db_h: 0h 0m 0s 14ms Time for merging to query_db: 0h 0m 0s 13ms Database type: Aminoacid Time for processing: 0h 0m 0s 52ms search lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/genomad_db/genomad_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/search_db/search_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp --threads 56 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 8 --split-mode 0
MMseqs Version: 14.7e284 Substitution matrix aa:blosum62.out,nucl:nucleotide.out Add backtrace false Alignment mode 2 Alignment mode 0 Allow wrapped scoring false E-value threshold 0.001 Seq. id. threshold 0 Min alignment length 0 Seq. id. mode 0 Alternative alignments 0 Coverage threshold 0.2 Coverage mode 1 Max sequence length 65535 Compositional bias 1 Compositional bias 1 Max reject 2147483647 Max accept 2147483647 Include identical seq. id. false Preload mode 0 Pseudo count a substitution:1.100,context:1.400 Pseudo count b substitution:4.100,context:5.800 Score bias 0 Realign hits false Realign score bias -0.2 Realign max seqs 2147483647 Correlation score weight 0 Gap open cost aa:11,nucl:5 Gap extension cost aa:1,nucl:2 Zdrop 40 Threads 56 Compressed 0 Verbosity 3 Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out Sensitivity 4.2 k-mer length 5 k-score seq:2147483647,prof:2147483647 Alphabet size aa:21,nucl:5 Max results per query 300 Split database 8 Split mode 0 Split memory limit 0 Diagonal scoring true Exact k-mer matching 0 Mask residues 1 Mask residues probability 0.9 Mask lower case residues 0 Minimum diagonal score 15 Selected taxa Spaced k-mers 1 Spaced k-mer pattern Local temporary path Rescore mode 0 Remove hits by seq. id. and coverage false Sort results 0 Mask profile 1 Profile E-value threshold 0.1 Global sequence weighting false Allow deletions false Filter MSA 1 Use filter only at N seqs 0 Maximum seq. id. threshold 0.9 Minimum seq. id. 
0.0 Minimum score per column -20 Minimum coverage 0 Select N most diverse seqs 1000 Pseudo count mode 0 Gap pseudo count 10 Min codons in orf 30 Max codons in length 32734 Max orf gaps 2147483647 Contig start mode 2 Contig end mode 2 Orf start mode 1 Forward frames 1,2,3 Reverse frames 1,2,3 Translation table 1 Translate orf 0 Use all table starts false Offset of numeric ids 0 Create lookup 0 Add orf stop false Overlap between sequences 0 Sequence split mode 1 Header split mode 0 Chain overlapping alignments 0 Merge query 1 Search type 0 Search iterations 1 Start sensitivity 4 Search steps 1 Exhaustive search mode false Filter results during exhaustive search 0 Strand selection 1 LCA search mode false Disk space limit 0 MPI runner Force restart with latest tmp false Remove temporary files false
Failed to execute lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp/8896424563579662339/searchtargetprofile.sh with error 13.`
This seems to happen because the filesystem where you are writing the results doesn't allow scripts to be executed (see https://github.com/soedinglab/MMseqs2/issues/534). During its execution, MMseqs2 generates and runs a couple of scripts, which fail because of this limitation.
I can try to add an option to geNomad to allow the MMseqs2 directory to be written in a separate location. In the meantime, can you try to write the results in a different place (e.g. your home directory)?
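One way to test a candidate output location before rerunning is to write a tiny script there and try to execute it. The sketch below is only an illustration and is not part of geNomad or MMseqs2; the function name `can_execute_scripts` is made up for this example. It assumes a Unix-like system with `/bin/sh`, and treats errno 13 (EACCES, the same code reported in the log above) as "scripts cannot be executed here".

```python
import errno
import os
import stat
import subprocess
import tempfile


def can_execute_scripts(directory: str) -> bool:
    """Return True if a freshly written shell script can be run from `directory`.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".sh", dir=directory)
    try:
        # Write a trivial script and close it before trying to execute it.
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        # Add the owner-executable bit, as a generated script would need.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        try:
            subprocess.run([path], check=True)
        except OSError as exc:
            # errno 13 (EACCES) is what a no-exec filesystem typically returns.
            if exc.errno == errno.EACCES:
                return False
            raise
        return True
    finally:
        os.remove(path)
```

Calling `can_execute_scripts("lim1_1_genomad_output")` and `can_execute_scripts(os.path.expanduser("~"))` and comparing the two results should show whether moving the output directory is likely to help.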
Hi,
I also have the memory issue. I get the error OSError: [Errno 28] No space left on device while running geNomad on an HPC with over 2 Tb of disk space and a 300 Gb reservation for the batch job. I have been using --split 8. What can be tried to solve the memory issue?
Btw, I also tried running geNomad at NMDC EDGE and there has also been a memory limit error, although the input file was much smaller, and there, one can't choose much when submitting a job.
Best regards,
Tatiana
Hi @deminatanja
I also have the memory issue. I have the error of OSError: [Errno 28] No space left on device, while running geNomad at HPC with over 2 Tb memory on the disk and 300 Gb reservation for the batch job run. I have been using --split 8. What can be tried to solve the memory issue?
I don't think this is a memory issue. No space left on device means you don't have enough storage space. Have you checked your disk usage?
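As a rough check, free space and free inodes (file slots) at the output location can be read from Python. This is a minimal sketch, assuming a Unix-like system; `storage_report` is a name made up for this example. Errno 28 can also be raised when a filesystem runs out of inodes while bytes are still free, and per-user or per-project quotas on a cluster may not be visible through these calls, so the cluster's own quota tool may be needed as well.

```python
import os
import shutil


def storage_report(path: str) -> dict:
    """Summarize free bytes and free inodes for the filesystem holding `path`.

    Hypothetical helper for illustration only.
    """
    usage = shutil.disk_usage(path)  # total, used and free bytes
    stats = os.statvfs(path)         # includes inode counts on Unix
    return {
        "free_bytes": usage.free,
        "total_bytes": usage.total,
        "free_inodes": stats.f_favail,   # inodes available to this user
        "total_inodes": stats.f_files,   # may be 0 on some filesystems
    }


if __name__ == "__main__":
    for key, value in storage_report(".").items():
        print(f"{key}: {value}")
```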
Btw, I also tried running geNomad at NMDC EDGE and there has also been a memory limit error, although the input file was much smaller, and there, one can't choose much when submitting a job.
Can you send me the log?
More than 2 Tb of storage space is available on the device, so that should be enough...
Here is the log from NMDC EDGE run:
Generate WDL and inputs json
submit workflow to cromwell
Cromwell job status: Running
Cromwell job status: Failed
viral.gn
Traceback (most recent call last):
File "/opt/conda/bin/genomad", line 10, in
Here is also a full log from the HPC run:
[22:14:27] Executing genomad annotate.
Traceback (most recent call last):
File "/projappl/project_2006548/genomad/bin/genomad", line 10, in
Ok. These issues seem to be distinct.
The error you got on your HPC is most likely not a memory problem. It is failing during the prodigal-gv execution step (which uses very little memory) while writing a file. It does seem that, for some reason, the process is being killed because you are out of storage. How big is the input (in number of sequences and average sequence length)?
There seems to be a problem with memory in NMDC EDGE. I'll try to get this solved as quickly as possible.
Here are some statistics about the input file:
contigs                      1275686
contigs (>= 0 bp)            4189226
contigs (>= 1000 bp)         376034
contigs (>= 5000 bp)         19849
contigs (>= 10000 bp)        4814
contigs (>= 25000 bp)        584
contigs (>= 50000 bp)        115
Largest contig               189326
Total length (>= 1000 bp)    802707590
Total length (>= 5000 bp)    184943840
Total length (>= 10000 bp)   84208373
Total length (>= 25000 bp)   24221139
Total length (>= 50000 bp)   8424764
N50                          1172
N75                          714
L50                          284205
L75                          678615
Ok. The input is pretty big, so maybe you are running out of storage when writing the outputs? What's the output of df -h?
Another option is to just split your input and run geNomad in batches to avoid this sort of problem.
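Splitting the input can be done with a few lines of Python. The sketch below is one possible approach, not something shipped with geNomad; the function name `split_fasta`, the `batch_NNNN.fna` file names, and the default batch size are all arbitrary choices made for this example.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, records_per_batch=100_000):
    """Write consecutive groups of FASTA records to separate files.

    Hypothetical helper for illustration only. Returns the batch file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    batch_paths = []
    out_handle = None
    n_records = 0
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # Start a new batch file every `records_per_batch` records.
                if n_records % records_per_batch == 0:
                    if out_handle is not None:
                        out_handle.close()
                    batch_path = out_dir / f"batch_{len(batch_paths) + 1:04d}.fna"
                    out_handle = open(batch_path, "w")
                    batch_paths.append(batch_path)
                n_records += 1
            # Lines before the first header (if any) are skipped.
            if out_handle is not None:
                out_handle.write(line)
    if out_handle is not None:
        out_handle.close()
    return batch_paths
```

Each batch file can then be passed to a separate geNomad run with its own output directory, and the per-batch results combined afterwards.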
Hi @apcamargo Thank you. It worked when I changed the output directory
Regards Monica
No problems :)
How do I get GTF or GFF3 files for the annotation, so that it can be visualized in other software?
Ok. The input is pretty big, so maybe you are running out of storage when writing the outputs? What's the output of df -h? Another option is to just split your input and run geNomad in batches to avoid this sort of problem.
The disk resources are (used/total): 351G/3.0T, 3.8M/10M files.
I was testing geNomad with a smaller input file, but ran into another error. Please see a separate issue opened here.
@MonicaSteffi You can use the script below to convert geNomad's tabular gene file to a GFF:
chmod +x convert_tabular_to_gff.py
# ./convert_tabular_to_gff.py [INPUT] [OUTPUT]
./convert_tabular_to_gff.py genomad_output/metagenome_summary/metagenome_plasmid_genes.tsv plasmid.gff
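The script attached to the original comment is not reproduced in this thread. As a rough idea of what such a conversion involves, here is a minimal sketch; it is not the author's script. It assumes the gene table is tab-separated with a header containing `gene`, `start`, `end` and `strand` columns, that `strand` is written as `1` or `-1`, and that each gene ID is the contig name followed by `_` and a gene index. The function name `tabular_to_gff` and the `geNomad` source label are choices made for this example, and attribute values are not escaped.

```python
import csv


def tabular_to_gff(tsv_path, gff_path):
    """Convert a gene table (assumed columns: gene, start, end, strand) to GFF3.

    Hypothetical helper for illustration only.
    """
    with open(tsv_path, newline="") as tsv, open(gff_path, "w") as gff:
        gff.write("##gff-version 3\n")
        for row in csv.DictReader(tsv, delimiter="\t"):
            gene_id = row["gene"]
            # Assumption: the contig name is the gene ID minus its last "_N" part.
            seqid = gene_id.rsplit("_", 1)[0]
            # Assumption: negative strand values start with "-".
            strand = "-" if row["strand"].strip().startswith("-") else "+"
            fields = [
                seqid,        # sequence ID
                "geNomad",    # source label (arbitrary)
                "CDS",        # feature type
                row["start"],
                row["end"],
                ".",          # score
                strand,
                "0",          # phase
                f"ID={gene_id}",
            ]
            gff.write("\t".join(fields) + "\n")
```

If the column names in your geNomad version differ, the dictionary keys above need to be adjusted accordingly.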
Outputting GFF files is pretty useful. I might make geNomad output GFF files by default in a future update.
Hi,
I am also getting the same error (non-zero exit status 1) since I updated to the latest version of geNomad. I tried changing the output directory, but I still get the same error. Here is the content of the log file:
[10:06:18] Executing genomad annotate.
[10:06:18] Creating the /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate directory.
[15:19:05] Proteins predicted with prodigal-gv were written to all_predicted_viral_contigs_proteins.faa.
Traceback (most recent call last):
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 131, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'createdb', PosixPath('/home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_proteins.faa'), PosixPath('/home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/query_db/query_db')]' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/umcg-afernandez/.conda/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1208, in end_to_end
ctx.invoke(
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 202, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 134, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs createdb /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_proteins.faa /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/query_db/query_db' failed.
Hi @asierFernandezP
Can you share the contents of /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/mmseqs2.log? Did it work in the previous version?
Hi,
Thanks for the quick answer! I am sorry, I deleted this file, but after rerunning it 4 times (without any changes) it worked. This problem only appeared after reinstalling geNomad using conda (24-02-2023) in order to get the --conservative option, which is not present in the previous version. But, as I said, it worked after a few tries without any modifications. I will let you know if I see this problem again.
Good to know that you didn't have any problems again, @asierFernandezP. If you are interested in using the --conservative flag, it might be worth taking a quick look here: https://portal.nersc.gov/genomad/post_classification_filtering.html#default-parameters-and-presets
I'll close this issue for now.
Dear Developer, I am trying to use geNomad to find the taxonomy of my viral contigs. Before running geNomad, I assembled the contigs with SPAdes and used them as the input. But I got the following error, which might be associated with mmseqs2.py.
genomad end-to-end --min-score 0.7 --cleanup --splits 8 spade_lim1_1_old/contigs.fasta lim1_1_genomad_output genomad_db
Any help would be appreciated.