WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

DRAM.py annotate stops at "Getting hits from pfam" #227

Closed. alyzzabc closed this issue 1 year ago.

alyzzabc commented 1 year ago

Hi,

I have been trying to annotate my dataset, but it always stops at "Getting hits from pfam," and I get this error:

Traceback (most recent call last):
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/bin/DRAM-v.py", line 153, in <module>
    args.func(**args_dict)
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_vgfs.py", line 475, in annotate_vgfs
    annotations = annotate_fastas(contig_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1013, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 921, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 829, in annotate_orfs
    annotation_list.append(run_mmseqs_profile_search(query_db, db_handler.db_locs['pfam'], tmp_dir,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 204, in run_mmseqs_profile_search
    run_process(['mmseqs', 'convertalis', query_db, pfam_profile, output_db, output_loc], verbose=verbose)
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 27, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'convertalis', 'dramv-annotate-t28/working_dir/final-viral-combined-for-dramv/tmp/gene.mmsdb', '/gxfs_work1/geomar/smomw535/db-dram/pfam.mmspro', 'dramv-annotate-t28/working_dir/final-viral-combined-for-dramv/tmp/pfam.mmsdb', 'dramv-annotate-t28/working_dir/final-viral-combined-for-dramv/tmp/pfam_output.b6']' died with <Signals.SIGABRT: 6>.

I figured it was a memory issue, so I increased the memory allocation (I am running this on Slurm), but the error persisted and the run still stopped at "Getting hits from pfam."
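(For illustration, a minimal sketch of the kind of Slurm submission meant here; the memory and CPU values and the DRAM-v flag names are assumptions, not the exact script used:)

#!/bin/bash
#SBATCH --job-name=dramv-annotate
#SBATCH --cpus-per-task=8        # fewer threads also lowers mmseqs peak memory
#SBATCH --mem=200G               # per-node memory request
#SBATCH --time=24:00:00

# input/output paths and flag names are placeholders; check DRAM-v.py annotate --help
DRAM-v.py annotate -i final-viral-combined-for-dramv.fa \
    -v viral-affi-contigs-for-dramv.tab \
    -o dramv-annotate-t8 \
    --threads 8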

I created a new conda environment for DRAM with Python 3.10 (my existing environment uses Python 3.8, and I thought that might be the issue). It still stopped at "Getting hits from pfam," and this was the error:

Traceback (most recent call last):
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/bin/DRAM-v.py", line 153, in <module>
    args.func(**args_dict)
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/annotate_vgfs.py", line 475, in annotate_vgfs
    annotations = annotate_fastas(contig_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 1013, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 921, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 829, in annotate_orfs
    annotation_list.append(run_mmseqs_profile_search(query_db, db_handler.db_locs['pfam'], tmp_dir,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 201, in run_mmseqs_profile_search
    run_process(['mmseqs', 'search', query_db, pfam_profile, output_db, tmp_dir, '-k', '5', '-s', '7', '--threads',
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/site-packages/mag_annotator/utils.py", line 27, in run_process
    return subprocess.run(command, check=check, shell=shell, stdout=subprocess.PIPE,
  File "/gxfs_home/geomar/smomw535/miniconda3/envs/DRAM-py3.10/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'search', 'dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/gene.mmsdb', '/gxfs_work1/geomar/smomw535/db-dram/pfam.mmspro', 'dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/pfam.mmsdb', 'dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/tmp', '-k', '5', '-s', '7', '--threads', '28']' returned non-zero exit status 1.

I'm stuck and do not know what to try next.

rmFlynn commented 1 year ago

Try reducing the number of threads; mmseqs uses less memory with fewer threads. You can also run the failing command on its own and see if it succeeds:

mmseqs search dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/gene.mmsdb /gxfs_work1/geomar/smomw535/db-dram/pfam.mmspro dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/pfam.mmsdb dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/tmp -k 5 -s 7 --threads 28
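For example, the same search with fewer threads and, if your mmseqs build supports it, an explicit memory cap; the --split-memory-limit flag and the 100G value here are assumptions to adapt, so check mmseqs search --help first:

mmseqs search dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/gene.mmsdb \
    /gxfs_work1/geomar/smomw535/db-dram/pfam.mmspro \
    dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/pfam.mmsdb \
    dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/tmp \
    -k 5 -s 7 --threads 8 --split-memory-limit 100G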

rmFlynn commented 1 year ago

There may be a problem with the latest version of Pfam; I am looking into it.

alyzzabc commented 1 year ago

> Try reducing the number of threads; mmseqs uses less memory with fewer threads. You can also run the failing command on its own and see if it succeeds: mmseqs search dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/gene.mmsdb /gxfs_work1/geomar/smomw535/db-dram/pfam.mmspro dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/pfam.mmsdb dramv-annotate-update-dram/working_dir/final-viral-combined-for-dramv/tmp/tmp -k 5 -s 7 --threads 28

I tried running mmseqs search on its own and it worked! I also redid the whole annotation step, and it finished in 6 hours. Thank you for your help!

Aiswarya-prasad commented 1 year ago

I have an issue at the "Getting hits from pfam" stage as well. This is what the error says:

2022-11-14 14:38:42,424 - Logging to console
2022-11-14 14:38:48,124 - The log file is created at 05_Assembly/DRAM_annotations/D1.2/annotate.log.
2022-11-14 14:38:48,124 - 1 FASTAs found
2022-11-14 14:38:48,139 - Starting the Annotation of Bins with database configuration: 
 there are no settings, the config is corrupted or too old.
2022-11-14 14:38:48,140 - Retrieved database locations and descriptions
2022-11-14 14:38:48,140 - Annotating D1.2_scaffolds
2022-11-14 14:45:05,399 - Turning genes from prodigal to mmseqs2 db
2022-11-14 14:45:09,421 - Getting hits from kofam
2022-11-14 16:43:38,454 - Getting forward best hits from peptidase
2022-11-14 16:46:44,747 - Getting reverse best hits from peptidase
2022-11-14 16:46:56,242 - Getting descriptions of hits from peptidase
2022-11-14 16:47:03,383 - Getting hits from pfam
2022-11-14 16:47:36,929 - The subcommand ['mmseqs', 'search', '05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/gene.mmsdb', '/reference/dram/pfam.mmspro', '05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/pfam.mmsdb', '05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/tmp', '-k', '5', '-s', '7', '--threads', '8'] experienced an 
error: Score of forward/backward SW differ: 2221 2222. Q: 100 T: 44721.
Start: Q: 2, T: 2. End: Q: 168, T 273

Traceback (most recent call last):
  File "/scratch/aprasad/built-envs/358e2aba62b05c15419547f98620f6d9/bin/DRAM.py", line 207, in <module>
    args.func(**args_dict)
  File "/scratch/aprasad/built-envs/358e2aba62b05c15419547f98620f6d9/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 984, in annotate_bins
    all_annotations = annotate_fastas(fasta_locs, output_dir, db_handler, logger, min_contig_size, prodigal_mode, trans_table,
  File "/scratch/aprasad/built-envs/358e2aba62b05c15419547f98620f6d9/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 916, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, logger, min_contig_size, prodigal_mode,
  File "/scratch/aprasad/built-envs/358e2aba62b05c15419547f98620f6d9/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 824, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, logger, custom_db_locs, custom_hmm_locs,
  File "/scratch/aprasad/built-envs/358e2aba62b05c15419547f98620f6d9/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 731, in annotate_orfs
    annotation_list.append(run_mmseqs_profile_search(query_db, db_handler.config['search_databases']['pfam'], tmp_dir,
  File "/scratch/aprasad/built-envs/358e2aba62b05c15419547f98620f6d9/lib/python3.10/site-packages/mag_annotator/annotate_bins.py", line 146, in run_mmseqs_profile_search
    run_process(['mmseqs', 'search', query_db, pfam_profile, output_db, tmp_dir, '-k', '5', '-s', '7', '--threads',
  File "/scratch/aprasad/built-envs/358e2aba62b05c15419547f98620f6d9/lib/python3.10/site-packages/mag_annotator/utils.py", line 71, in run_process
    raise subprocess.SubprocessError(f"The subcommand {' '.join(command)} experienced an error, see the log for more info.")
subprocess.SubprocessError: The subcommand mmseqs search 05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/gene.mmsdb /reference/dram/pfam.mmspro 05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/pfam.mmsdb 05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/tmp -k 5 -s 7 --threads 8 experienced an error, see the log for more info

rmFlynn commented 1 year ago

I just saw this happen yesterday. I don't know why yet and may need to work with the Soeding lab to fix it. In the meantime, you may want to try downgrading mmseqs in the conda environment.
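To check which MMseqs2 release the environment is actually using before downgrading (the environment name below is an assumption):

mmseqs version                 # prints the installed MMseqs2 release string
conda list -n DRAM mmseqs2     # or use the name of your own DRAM environment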

Try running the command below in the same working directory where you ran DRAM and see if it gives different results:

mmseqs search 05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/gene.mmsdb /reference/dram/pfam.mmspro 05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/pfam.mmsdb 05_Assembly/DRAM_annotations/D1.2/working_dir/D1.2_scaffolds/tmp/tmp -k 5 -s 7 --threads 8

rmFlynn commented 1 year ago

I will push an upgrade once I have it figured out myself.

rmFlynn commented 1 year ago

Run this to fix it for now; I am still looking into the problem before I push a proper fix:

conda install mmseqs2==13.45111
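(A minimal end-to-end sketch of applying that pin and re-running; the environment name, paths, and DRAM flag names below are assumptions:)

conda activate DRAM                      # or whatever your DRAM environment is called
conda install mmseqs2==13.45111          # pin back to the release recommended above
mmseqs version                           # confirm the downgrade took effect

# re-run the annotation into a fresh output directory (paths and flags are placeholders)
DRAM.py annotate -i D1.2_scaffolds.fa -o DRAM_annotations_rerun/D1.2 --threads 8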