WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Error in DRAM-setup.py with snakemake #223

Closed SilasK closed 1 year ago

SilasK commented 1 year ago

I run:

        DRAM-setup.py prepare_databases 
         --output_dir {output.dbdir} 
         --threads {threads} 
         --verbose 
         --skip_uniref 

and get the error:

No such file or directory: refseq_viral.20220909.mmsdb_h

2:04:29.500055: DRAM databases and forms downloaded
2:04:29.591560: Files moved to final destination
Traceback (most recent call last):
  File "/Software/anaconda/Anaconda3-ATLAS/ATLAS_DATABASES/conda_envs/cd76a66b0ae2df5931da816021adc616/bin/DRAM-setup.py", line 158, in <module>
    args.func(**args_dict)
  File "/Software/anaconda/Anaconda3-ATLAS/ATLAS_DATABASES/conda_envs/cd76a66b0ae2df5931da816021adc616/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 374, in prepare_databases
    db_handler.populate_description_db(output_dbs['description_db_loc'], update_config=False)
  File "/Software/anaconda/Anaconda3-ATLAS/ATLAS_DATABASES/conda_envs/cd76a66b0ae2df5931da816021adc616/lib/python3.10/site-packages/mag_annotator/database_handler.py", line 244, in populate_description_db
    self.add_descriptions_to_database(self.make_header_dict_from_mmseqs_db(self.db_locs['viral']),
  File "/Software/anaconda/Anaconda3-ATLAS/ATLAS_DATABASES/conda_envs/cd76a66b0ae2df5931da816021adc616/lib/python3.10/site-packages/mag_annotator/database_handler.py", line 155, in make_header_dict_from_mmseqs_db
    mmseqs_headers_handle = open('%s_h' % mmseqs_db, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/Software/anaconda/Anaconda3-ATLAS/ATLAS_DATABASES/Dram/refseq_viral.20220909.mmsdb_h'

Actually, I'm not interested in viruses yet, but I would like the databases to download.

rmFlynn commented 1 year ago

Odd. Sorry I missed this for so long. Any chance you could test the RC by installing from git?

rmFlynn commented 1 year ago

It could be a server problem; I will take a look. Is this part of the ATLAS pipeline?

SilasK commented 1 year ago

Yes, I run DRAM in ATLAS.

gaferguz commented 1 year ago

Hello there! I'm encountering the same type of problem when setting up databases in the current version, 1.3.5, which I recently reinstalled after deleting all previous conda environments and database folders. I get the following message after running:

$ DRAM-setup.py prepare_databases --output_dir DRAM_data --skip_uniref --threads 8 --verbose

1:19:43.883699: DRAM databases and forms downloaded
1:19:43.949896: Files moved to final destination
/home/bioinformatica/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path /home/bioinformatica/Desktop/DRAM/DRAM_data/description_db.sqlite
  warnings.warn('Database does not exist at path %s' % self.description_loc)
Traceback (most recent call last):
  File "/home/bioinformatica/anaconda3/envs/DRAM/bin/DRAM-setup.py", line 158, in <module>
    args.func(**args_dict)
  File "/home/bioinformatica/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 374, in prepare_databases
    db_handler.populate_description_db(output_dbs['description_db_loc'], update_config=False)
  File "/home/bioinformatica/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py", line 235, in populate_description_db
    self.add_descriptions_to_database(self.make_header_dict_from_mmseqs_db(self.db_locs['uniref']) ,
  File "/home/bioinformatica/anaconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py", line 155, in make_header_dict_from_mmseqs_db
    mmseqs_headers_handle = open('%s_h' % mmseqs_db, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/bioinformatica/Desktop/DRAM/DRAM_data/uniref90.20220901.mmsdb_h'

After successfully downloading all required databases, the script aborts, and the error log reports description_db.sqlite as missing, although the file is actually at that location:

(DRAM) bioinformatica@bioinformatica-MOOVE3-14:~/Desktop/DRAM$ ls -lh /home/bioinformatica/Desktop/DRAM/DRAM_data/description_db.sqlite 
-rw-r--r-- 1 bioinformatica bioinformatica 88K oct 13 11:31 /home/bioinformatica/Desktop/DRAM/DRAM_data/description_db.sqlite

I've also tried downloading all databases, including UniRef90, but the description_db.sqlite error persists. This is my config file after the setup attempt:

Processed search databases
KEGG db: None
KOfam db: /home/bioinformatica/Desktop/DRAM/DRAM_data/kofam_profiles.hmm
KOfam KO list: /home/bioinformatica/Desktop/DRAM/DRAM_data/kofam_ko_list.tsv
UniRef db: /home/bioinformatica/Desktop/DRAM/DRAM_data/uniref90.20220901.mmsdb
Pfam db: /home/bioinformatica/Desktop/DRAM/DRAM_data/pfam.mmspro
dbCAN db: /home/bioinformatica/Desktop/DRAM/DRAM_data/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /home/bioinformatica/Desktop/DRAM/DRAM_data/refseq_viral.20220902.mmsdb
MEROPS peptidase db: /home/bioinformatica/Desktop/DRAM/DRAM_data/peptidases.20220902.mmsdb
VOGDB db: /home/bioinformatica/Desktop/DRAM/DRAM_data/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /home/bioinformatica/Desktop/DRAM/DRAM_data/Pfam-A.hmm.dat.gz
dbCAN family activities: /home/bioinformatica/Desktop/DRAM/DRAM_data/CAZyDB.07292021.fam-activities.txt
VOG annotations: /home/bioinformatica/Desktop/DRAM/DRAM_data/vog_annotations_latest.tsv.gz

Description db: /home/bioinformatica/Desktop/DRAM/DRAM_data/description_db.sqlite

I've also tried the new pre-release 1.4.0rc3, which seems to work fine and completes the setup step, but it also throws a warning stating that description_db.sqlite is not found.
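
One way to check whether the file at that path is a usable SQLite database, rather than an empty or partially written one, is to open it read-only and count the rows in each table. This is only a minimal sketch using Python's standard sqlite3 module. The function name summarize_sqlite is made up for illustration and is not part of DRAM, and nothing is assumed about which tables DRAM creates.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(db_path):
    """Return {table_name: row_count} for a SQLite file, opened read-only.

    Hypothetical helper for illustration; not part of DRAM.
    """
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}")
    # mode=ro avoids silently creating an empty database at a wrong path.
    uri = f"{path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        # Table names come from sqlite_master, so quoting them is sufficient here.
        return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                for t in tables}
    finally:
        conn.close()
```

Calling summarize_sqlite on the description_db.sqlite path from the config should show whether any tables exist and whether they contain rows. An empty result would suggest the file was created but never populated.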

rmFlynn commented 1 year ago

@SilasK This may be solved once the new release is out, but I will need to look at ATLAS to find out what is going on. DRAM does not download this file; it produces it. So if it is missing, something has gone wrong.

@gaferguz The first problem is a known one. If you set up DRAM with UniRef and then try to set it up without it, the old pointer is retained. You can fix this by exporting and editing the config (see issue:152), or by importing the default config as in issue:171. For the second issue, please first try running DRAM-setup.py --update_description_db; if this does not work, we will need to look into something blocking SQL specifically.
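
The stale-pointer case can be spotted before rerunning setup by checking that every configured MMseqs2 database still has its header file on disk. This is a minimal sketch that assumes only what the tracebacks in this thread show: for each .mmsdb entry, DRAM opens the database path with _h appended. The function name and the example dictionary are hypothetical and not part of DRAM; the paths are copied from the config printed earlier in this thread.

```python
from pathlib import Path


def missing_mmseqs_headers(db_paths):
    """Return labels of configured MMseqs2 databases whose '<path>_h' file is absent.

    db_paths maps a label to a path or None. Hypothetical helper; not part of DRAM.
    """
    missing = []
    for name, loc in db_paths.items():
        if loc is None or not str(loc).endswith(".mmsdb"):
            continue  # unset entry, or not an MMseqs2 sequence database
        if not Path(f"{loc}_h").is_file():
            missing.append(name)
    return missing


# Example entries copied from the config shown in this thread.
config = {
    "kegg": None,
    "uniref": "/home/bioinformatica/Desktop/DRAM/DRAM_data/uniref90.20220901.mmsdb",
    "viral": "/home/bioinformatica/Desktop/DRAM/DRAM_data/refseq_viral.20220902.mmsdb",
}
print(missing_mmseqs_headers(config))
```

Any label it reports is a database that is configured but whose header file is not on disk, which would be consistent with a pointer left over from an earlier setup.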