WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Error when setting up databases #298

Open microbial-cookie opened 9 months ago

microbial-cookie commented 9 months ago

Hi DRAM developers,

This is an amazing tool, and it will help me annotate the functions of my MAGs. However, I ran into a problem while setting up the databases. Could you help me solve it? Thanks!

2023-09-18 03:42:49,148 - Downloading module_step_form
2023-09-18 03:42:49,713 - Downloading function_heatmap_form
2023-09-18 03:42:50,180 - Downloading amg_database
2023-09-18 03:42:50,454 - Downloading etc_module_database
2023-09-18 03:42:50,683 - All raw data files were downloaded successfully
2023-09-18 03:42:50,684 - Processing kofam_hmm
2023-09-18 03:54:33,232 - KOfam database processed
2023-09-18 03:54:33,743 - Moved kofam_hmm to final destination, configuration updated
2023-09-18 03:54:33,743 - Processing kofam_ko_list
2023-09-18 03:54:33,837 - KOfam ko list processed
2023-09-18 03:54:33,843 - Moved kofam_ko_list to final destination, configuration updated
2023-09-18 03:54:33,843 - Processing pfam
2023-09-18 05:17:41,090 - PFAM database processed
2023-09-18 05:17:41,256 - Moved pfam to final destination, configuration updated
2023-09-18 05:17:41,262 - Moved pfam_hmm to final destination, configuration updated
2023-09-18 05:17:41,262 - Processing dbcan
2023-09-18 05:17:44,779 - dbCAN database processed
2023-09-18 05:17:44,787 - Moved dbcan to final destination, configuration updated
2023-09-18 05:17:44,792 - Moved dbcan_fam_activities to final destination, configuration updated
2023-09-18 05:17:44,797 - Moved dbcan_subfam_ec to final destination, configuration updated
2023-09-18 05:17:44,798 - Processing vogdb
2023-09-18 05:23:42,771 - VOGdb database processed
2023-09-18 05:23:42,868 - Moved vogdb to final destination, configuration updated
2023-09-18 05:23:42,877 - Moved vog_annotations to final destination, configuration updated
2023-09-18 05:23:42,878 - Processing viral
2023-09-18 05:23:44,537 - The subcommand ['mmseqs', 'createdb', 'DRAM_data1/database_files/viral.merged.protein.faa.gz', 'DRAM_data1/refseq_viral.20230918.mmsdb'] experienced an error: Fasta entry 117637 is invalid

Traceback (most recent call last):
  File "/usr2/people/ruiwenhu/miniconda3/envs/DRAM/bin/DRAM-setup.py", line 184, in <module>
    args.func(**args_dict)
  File "/usr2/people/ruiwenhu/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 555, in prepare_databases
    processed_locs = process_functions[i](locs[i], output_dir, LOGGER,
  File "/usr2/people/ruiwenhu/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 297, in process_viral
    make_mmseqs_db(merged_viral_faas, refseq_viral_mmseqs_db, logger, create_index=True, threads=threads, verbose=verbose)
  File "/usr2/people/ruiwenhu/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 95, in make_mmseqs_db
    run_process(['mmseqs', 'createdb', fasta_loc, output_loc], logger, verbose=verbose)
  File "/usr2/people/ruiwenhu/miniconda3/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 71, in run_process
    raise subprocess.SubprocessError(f"The subcommand {' '.join(command)} experienced an error, see the log for more info.")
subprocess.SubprocessError: The subcommand mmseqs createdb DRAM_data1/database_files/viral.merged.protein.faa.gz DRAM_data1/refseq_viral.20230918.mmsdb experienced an error, see

Best, Ruiwen

BioRRW commented 9 months ago

Apologies for the late reply. Can you provide sequence #117637 in the "viral.merged.protein.faa.gz" file? There have been issues with mmseqs in the past where the fasta headers are incorrectly formatted: https://github.com/soedinglab/MMseqs2/issues/446
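
One way to pull out the requested entry is sketched below. This is a minimal illustration and not part of DRAM or MMseqs2: the helper name nth_fasta_entry is hypothetical, and it assumes entries are counted from 1 in file order, which may not match how mmseqs numbers them, so it is worth looking at the neighbouring entries as well. If the archive itself is damaged, reading it will raise an error before the entry is reached.

```python
import gzip
from typing import Optional, Tuple


def nth_fasta_entry(path: str, n: int) -> Optional[Tuple[str, str]]:
    """Return (header, sequence) for the n-th FASTA entry, counting from 1.

    Returns None if the file holds fewer than n entries.
    Hypothetical helper for illustration; the 1-based count is an assumption.
    """
    count = 0
    header = None
    seq_lines = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if header is not None:
                    break  # the entry after the target has started
                count += 1
                if count == n:
                    header = line
            elif header is not None:
                seq_lines.append(line)
    if header is None:
        return None
    return header, "".join(seq_lines)


# Example call (path and entry number taken from the log in this issue):
# print(nth_fasta_entry("DRAM_data1/database_files/viral.merged.protein.faa.gz", 117637))
```

Printing the header together with the sequence makes it easier to spot an empty or malformed header line.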

microbial-cookie commented 9 months ago

Hi BioRRW,

I have checked the file "viral.merged.protein.faa.gz" (see the attached screenshots).

What do I need to do to solve this problem? Thanks.

microbial-cookie commented 9 months ago

Hi, I tried the command "zcat viral.merged.protein.faa.gz | grep -A 1 '^>117637$'" to check the file, and it reported: gzip: viral.merged.protein.faa.gz: invalid compressed data--format violated.

BioRRW commented 9 months ago

Thank you for providing this information. The output "invalid compressed data--format violated" points to a problem with your viral.merged.protein.faa.gz file. As the file name suggests, I recommend re-merging the files. Make sure you do not run "cat viral1.faa.gz viral2.faa.gz > viral.merged.faa.gz", because cat operates on the raw compressed bytes rather than on the decompressed sequences. Instead, use zcat, as you did to print the contents of the file, or decompress the files, merge them, and gzip the result again. It is also worth checking that the files you merged to create viral.merged.protein.faa.gz are themselves valid gzip files. You can do this by viewing them with zcat or by using gzip's test option: gzip -t [filename.gz].

Hope this helps and keep us posted of your progress.
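
The check-then-merge route described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DRAM's own merge code: the function names is_valid_gzip and merge_gzipped are hypothetical, and the input file names in the usage comment are placeholders. The first function plays the role of "gzip -t" by reading the whole archive; the second decompresses each input and writes a single freshly compressed output.

```python
import gzip
import zlib
from typing import Iterable


def is_valid_gzip(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or checksum, EOFError a truncated file,
        # zlib.error corrupted compressed data.
        return False
    return True


def merge_gzipped(inputs: Iterable[str], output: str,
                  chunk_size: int = 1 << 20) -> None:
    """Decompress each input and write one newly compressed output file."""
    with gzip.open(output, "wb") as out:
        for path in inputs:
            last = b"\n"
            with gzip.open(path, "rb") as src:
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                # Keep the next file's first header on its own line.
                out.write(b"\n")


# Example usage (placeholder file names):
# parts = ["viral.1.protein.faa.gz", "viral.2.protein.faa.gz"]
# if all(is_valid_gzip(p) for p in parts):
#     merge_gzipped(parts, "viral.merged.protein.faa.gz")
```

Validating every input first means a damaged download is caught before it ends up inside the merged file.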

microbial-cookie commented 9 months ago

Hi BioRRW, I checked my data in the database and found that there is no file named "viral.2.protein.faa.gz"; there is only one file, "viral.1.protein.faa.gz". So how do I re-merge a single file? Or do I need to re-download "viral.1.protein.faa.gz" and "viral.2.protein.faa.gz"? What is the command to re-download these files? Thank you very much!
