WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

The expected output files are missing from the annotation folder. #307

Open 778055611 opened 11 months ago

778055611 commented 11 months ago

Hello DRAM core team, thank you for developing such a useful tool; it has helped me a lot. When I use it to identify AMGs in my own vOTUs, the expected output files do not appear in the output folder (annotation). The folder contains only a log file and a working_dir folder. My command is the one from your example, with the input file changed to mine:

DRAM.py annotate -i ~/result/cdhit/all-sample/new.vOTU.fa -o annotation

This is the structure of my annotation directory:

tree
.
├── annotate.log
└── working_dir
    ├── custom_dbs
    └── new.vOTU
        └── tmp
            ├── filtered_fasta.fa
            ├── gene_kegg_hits.b6
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.0
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.1
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.2
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.3
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.4
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.5
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.6
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.7
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.8
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.9
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.dbtype
            ├── gene_kegg.minbitscore60.tophit.swapped.mmsdb.index
            ├── gene_kegg.mmsdb.0
            ├── gene_kegg.mmsdb.1
            ├── gene_kegg.mmsdb.2
            ├── gene_kegg.mmsdb.3
            ├── gene_kegg.mmsdb.4
            ├── gene_kegg.mmsdb.5
            ├── gene_kegg.mmsdb.6
            ├── gene_kegg.mmsdb.7
            ├── gene_kegg.mmsdb.8
            ├── gene_kegg.mmsdb.9
            ├── gene_kegg.mmsdb.dbtype
            ├── gene_kegg.mmsdb.index
            ├── gene_kegg.tophit.minbitscore60.mmsdb.0
            ├── gene_kegg.tophit.minbitscore60.mmsdb.1
            ├── gene_kegg.tophit.minbitscore60.mmsdb.2
            ├── gene_kegg.tophit.minbitscore60.mmsdb.3
            ├── gene_kegg.tophit.minbitscore60.mmsdb.4
            ├── gene_kegg.tophit.minbitscore60.mmsdb.5
            ├── gene_kegg.tophit.minbitscore60.mmsdb.6
            ├── gene_kegg.tophit.minbitscore60.mmsdb.7
            ├── gene_kegg.tophit.minbitscore60.mmsdb.8
            ├── gene_kegg.tophit.minbitscore60.mmsdb.9
            ├── gene_kegg.tophit.minbitscore60.mmsdb.dbtype
            ├── gene_kegg.tophit.minbitscore60.mmsdb.index
            ├── gene_kegg.tophit.mmsdb.0
            ├── gene_kegg.tophit.mmsdb.1
            ├── gene_kegg.tophit.mmsdb.10
            ├── gene_kegg.tophit.mmsdb.11
            ├── gene_kegg.tophit.mmsdb.12
            ├── gene_kegg.tophit.mmsdb.13
            ├── gene_kegg.tophit.mmsdb.14
            ├── gene_kegg.tophit.mmsdb.15
            ├── gene_kegg.tophit.mmsdb.16
            ├── gene_kegg.tophit.mmsdb.17
            ├── gene_kegg.tophit.mmsdb.18
            ├── gene_kegg.tophit.mmsdb.19
            ├── gene_kegg.tophit.mmsdb.2
            ├── gene_kegg.tophit.mmsdb.20
            ├── gene_kegg.tophit.mmsdb.21
            ├── gene_kegg.tophit.mmsdb.22
            ├── gene_kegg.tophit.mmsdb.23
            ├── gene_kegg.tophit.mmsdb.24
            ├── gene_kegg.tophit.mmsdb.25
            ├── gene_kegg.tophit.mmsdb.26
            ├── gene_kegg.tophit.mmsdb.27
            ├── gene_kegg.tophit.mmsdb.28
            ├── gene_kegg.tophit.mmsdb.29
            ├── gene_kegg.tophit.mmsdb.3
            ├── gene_kegg.tophit.mmsdb.30
            ├── gene_kegg.tophit.mmsdb.31
            ├── gene_kegg.tophit.mmsdb.32
            ├── gene_kegg.tophit.mmsdb.33
            ├── gene_kegg.tophit.mmsdb.34
            ├── gene_kegg.tophit.mmsdb.35
            ├── gene_kegg.tophit.mmsdb.36
            ├── gene_kegg.tophit.mmsdb.37
            ├── gene_kegg.tophit.mmsdb.38
            ├── gene_kegg.tophit.mmsdb.39
            ├── gene_kegg.tophit.mmsdb.4
            ├── gene_kegg.tophit.mmsdb.40
            ├── gene_kegg.tophit.mmsdb.41
            ├── gene_kegg.tophit.mmsdb.42
            ├── gene_kegg.tophit.mmsdb.43
            ├── gene_kegg.tophit.mmsdb.44
            ├── gene_kegg.tophit.mmsdb.45
            ├── gene_kegg.tophit.mmsdb.46
            ├── gene_kegg.tophit.mmsdb.47
            ├── gene_kegg.tophit.mmsdb.48
            ├── gene_kegg.tophit.mmsdb.49
            ├── gene_kegg.tophit.mmsdb.5
            ├── gene_kegg.tophit.mmsdb.50
            ├── gene_kegg.tophit.mmsdb.51
            ├── gene_kegg.tophit.mmsdb.52
            ├── gene_kegg.tophit.mmsdb.53
            ├── gene_kegg.tophit.mmsdb.54
            ├── gene_kegg.tophit.mmsdb.55
            ├── gene_kegg.tophit.mmsdb.56
            ├── gene_kegg.tophit.mmsdb.57
            ├── gene_kegg.tophit.mmsdb.58
            ├── gene_kegg.tophit.mmsdb.59
            ├── gene_kegg.tophit.mmsdb.6
            ├── gene_kegg.tophit.mmsdb.60
            ├── gene_kegg.tophit.mmsdb.61
            ├── gene_kegg.tophit.mmsdb.62
            ├── gene_kegg.tophit.mmsdb.63
            ├── gene_kegg.tophit.mmsdb.64
            ├── gene_kegg.tophit.mmsdb.65
            ├── gene_kegg.tophit.mmsdb.66
            ├── gene_kegg.tophit.mmsdb.67
            ├── gene_kegg.tophit.mmsdb.68
            ├── gene_kegg.tophit.mmsdb.69
            ├── gene_kegg.tophit.mmsdb.7
            ├── gene_kegg.tophit.mmsdb.70
            ├── gene_kegg.tophit.mmsdb.71
            ├── gene_kegg.tophit.mmsdb.8
            ├── gene_kegg.tophit.mmsdb.9
            ├── gene_kegg.tophit.mmsdb.dbtype
            ├── gene_kegg.tophit.mmsdb.index
            ├── gene.mmsdb
            ├── gene.mmsdb.dbtype
            ├── gene.mmsdb_h
            ├── gene.mmsdb_h.dbtype
            ├── gene.mmsdb_h.index
            ├── gene.mmsdb.idx
            ├── gene.mmsdb.idx.dbtype
            ├── gene.mmsdb.idx.index
            ├── gene.mmsdb.index
            ├── gene.mmsdb.lookup
            ├── gene.mmsdb.source
            ├── genes.faa
            ├── genes.fna
            ├── genes.gff
            ├── kegg.filt.mmsdb
            ├── kegg.filt.mmsdb.dbtype
            ├── kegg.filt.mmsdb_h
            ├── kegg.filt.mmsdb_h.dbtype
            ├── kegg.filt.mmsdb_h.index
            ├── kegg.filt.mmsdb.index
            ├── kegg.filt.mmsdb.lookup -> /share/luoxiao2/result/DRAM/new.database/kegg.20230929.mmsdb.lookup
            ├── kegg.filt.mmsdb.source -> /share/luoxiao2/result/DRAM/new.database/kegg.20230929.mmsdb.source
            ├── kegg_gene_hits.b6
            ├── kegg_gene.mmsdb.0
            ├── kegg_gene.mmsdb.1
            ├── kegg_gene.mmsdb.2
            ├── kegg_gene.mmsdb.3
            ├── kegg_gene.mmsdb.4
            ├── kegg_gene.mmsdb.5
            ├── kegg_gene.mmsdb.6
            ├── kegg_gene.mmsdb.7
            ├── kegg_gene.mmsdb.8
            ├── kegg_gene.mmsdb.9
            ├── kegg_gene.mmsdb.dbtype
            ├── kegg_gene.mmsdb.index
            ├── kegg_gene.tophit.minbitscore350.mmsdb.0
            ├── kegg_gene.tophit.minbitscore350.mmsdb.1
            ├── kegg_gene.tophit.minbitscore350.mmsdb.2
            ├── kegg_gene.tophit.minbitscore350.mmsdb.3
            ├── kegg_gene.tophit.minbitscore350.mmsdb.4
            ├── kegg_gene.tophit.minbitscore350.mmsdb.5
            ├── kegg_gene.tophit.minbitscore350.mmsdb.6
            ├── kegg_gene.tophit.minbitscore350.mmsdb.7
            ├── kegg_gene.tophit.minbitscore350.mmsdb.8
            ├── kegg_gene.tophit.minbitscore350.mmsdb.9
            ├── kegg_gene.tophit.minbitscore350.mmsdb.dbtype
            ├── kegg_gene.tophit.minbitscore350.mmsdb.index
            ├── kegg_gene.tophit.mmsdb.0
            ├── kegg_gene.tophit.mmsdb.1
            ├── kegg_gene.tophit.mmsdb.10
            ├── kegg_gene.tophit.mmsdb.11
            ├── kegg_gene.tophit.mmsdb.12
            ├── kegg_gene.tophit.mmsdb.13
            ├── kegg_gene.tophit.mmsdb.14
            ├── kegg_gene.tophit.mmsdb.15
            ├── kegg_gene.tophit.mmsdb.16
            ├── kegg_gene.tophit.mmsdb.17
            ├── kegg_gene.tophit.mmsdb.18
            ├── kegg_gene.tophit.mmsdb.19
            ├── kegg_gene.tophit.mmsdb.2
            ├── kegg_gene.tophit.mmsdb.20
            ├── kegg_gene.tophit.mmsdb.21
            ├── kegg_gene.tophit.mmsdb.22
            ├── kegg_gene.tophit.mmsdb.23
            ├── kegg_gene.tophit.mmsdb.24
            ├── kegg_gene.tophit.mmsdb.25
            ├── kegg_gene.tophit.mmsdb.26
            ├── kegg_gene.tophit.mmsdb.27
            ├── kegg_gene.tophit.mmsdb.28
            ├── kegg_gene.tophit.mmsdb.29
            ├── kegg_gene.tophit.mmsdb.3
            ├── kegg_gene.tophit.mmsdb.30
            ├── kegg_gene.tophit.mmsdb.31
            ├── kegg_gene.tophit.mmsdb.32
            ├── kegg_gene.tophit.mmsdb.33
            ├── kegg_gene.tophit.mmsdb.34
            ├── kegg_gene.tophit.mmsdb.35
            ├── kegg_gene.tophit.mmsdb.36
            ├── kegg_gene.tophit.mmsdb.37
            ├── kegg_gene.tophit.mmsdb.38
            ├── kegg_gene.tophit.mmsdb.39
            ├── kegg_gene.tophit.mmsdb.4
            ├── kegg_gene.tophit.mmsdb.40
            ├── kegg_gene.tophit.mmsdb.41
            ├── kegg_gene.tophit.mmsdb.42
            ├── kegg_gene.tophit.mmsdb.43
            ├── kegg_gene.tophit.mmsdb.44
            ├── kegg_gene.tophit.mmsdb.45
            ├── kegg_gene.tophit.mmsdb.46
            ├── kegg_gene.tophit.mmsdb.47
            ├── kegg_gene.tophit.mmsdb.48
            ├── kegg_gene.tophit.mmsdb.49
            ├── kegg_gene.tophit.mmsdb.5
            ├── kegg_gene.tophit.mmsdb.50
            ├── kegg_gene.tophit.mmsdb.51
            ├── kegg_gene.tophit.mmsdb.52
            ├── kegg_gene.tophit.mmsdb.53
            ├── kegg_gene.tophit.mmsdb.54
            ├── kegg_gene.tophit.mmsdb.55
            ├── kegg_gene.tophit.mmsdb.56
            ├── kegg_gene.tophit.mmsdb.57
            ├── kegg_gene.tophit.mmsdb.58
            ├── kegg_gene.tophit.mmsdb.59
            ├── kegg_gene.tophit.mmsdb.6
            ├── kegg_gene.tophit.mmsdb.60
            ├── kegg_gene.tophit.mmsdb.61
            ├── kegg_gene.tophit.mmsdb.62
            ├── kegg_gene.tophit.mmsdb.63
            ├── kegg_gene.tophit.mmsdb.64
            ├── kegg_gene.tophit.mmsdb.65
            ├── kegg_gene.tophit.mmsdb.66
            ├── kegg_gene.tophit.mmsdb.67
            ├── kegg_gene.tophit.mmsdb.68
            ├── kegg_gene.tophit.mmsdb.69
            ├── kegg_gene.tophit.mmsdb.7
            ├── kegg_gene.tophit.mmsdb.70
            ├── kegg_gene.tophit.mmsdb.71
            ├── kegg_gene.tophit.mmsdb.8
            ├── kegg_gene.tophit.mmsdb.9
            ├── kegg_gene.tophit.mmsdb.dbtype
            ├── kegg_gene.tophit.mmsdb.index
            └── tmp
                ├── 14528798917776548369
                │   ├── blastp.sh
                │   ├── pref_0
                │   ├── pref_0.dbtype
                │   └── pref_0.index
                ├── 5708427434418611504
                │   ├── blastp.sh
                │   ├── pref_0.0
                │   ├── pref_0.1
                │   ├── pref_0.2
                │   ├── pref_0.3
                │   ├── pref_0.4
                │   ├── pref_0.5
                │   ├── pref_0.6
                │   ├── pref_0.7
                │   ├── pref_0.8
                │   ├── pref_0.9
                │   ├── pref_0.dbtype
                │   └── pref_0.index
                ├── 7442439913333053131
                │   └── createindex.sh
                └── latest -> 5708427434418611504

The following is the content of my log file:

cat annotate.log
2023-10-09 10:59:24,275 - The log file is created at annotation/annotate.log.
2023-10-09 10:59:24,275 - 1 FASTAs found
2023-10-09 10:59:24,285 - Starting the Annotation of Bins with database configuration:

KEGG db:
Description_Db_Updated: Unknown, or Never
Citation: M. Kanehisa, M. Furumichi, Y. Sato, M. Ishiguro-Watanabe, and M. Tanabe, "Kegg: integrating viruses and cellular organisms," Nucleic acids research, vol. 49, no. D1, pp. D545–D551, 2021.
KOfam db:
Citation: T. Aramaki, R. Blanc-Mathieu, H. Endo, K. Ohkubo, M. Kanehisa, S. Goto, and H. Ogata, "Kofamkoala: Kegg ortholog assignment based on profile hmm and adaptive score threshold," Bioinformatics, vol. 36, no. 7, pp. 2251–2252, 2020.
KOfam KO list:
Citation: T. Aramaki, R. Blanc-Mathieu, H. Endo, K. Ohkubo, M. Kanehisa, S. Goto, and H. Ogata, "Kofamkoala: Kegg ortholog assignment based on profile hmm and adaptive score threshold," Bioinformatics, vol. 36, no. 7, pp. 2251–2252, 2020.
Pfam db:
Citation: J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G. A. Salazar, E. L. Sonnhammer, S. C. Tosatto, L. Paladin, S. Raj, L. J. Richardson et al., "Pfam: The protein families database in 2021," Nucleic acids research, vol. 49, no. D1, pp. D412–D419, 2021.
dbCAN db:
Citation: Y. Yin, X. Mao, J. Yang, X. Chen, F. Mao, and Y. Xu, "dbcan: a web resource for automated carbohydrate-active enzyme annotation," Nucleic acids research, vol. 40, no. W1, pp. W445–W451, 2012.
Version: 11
MEROPS peptidase db:
Description_Db_Updated: Unknown, or Never
Citation: N. D. Rawlings, A. J. Barrett, P. D. Thomas, X. Huang, A. Bateman, and R. D. Finn, "The merops database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the panther database," Nucleic acids research, vol. 46, no. D1, pp. D624–D632, 2018.

Pfam hmm dat:
Description_Db_Updated: Unknown, or Never
Citation: J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G. A. Salazar, E. L. Sonnhammer, S. C. Tosatto, L. Paladin, S. Raj, L. J. Richardson et al., "Pfam: The protein families database in 2021," Nucleic acids research, vol. 49, no. D1, pp. D412–D419, 2021.
dbCAN family activities:
Citation: Y. Yin, X. Mao, J. Yang, X. Chen, F. Mao, and Y. Xu, "dbcan: a web resource for automated carbohydrate-active enzyme annotation," Nucleic acids research, vol. 40, no. W1, pp. W445–W451, 2012.
Version: 11
Upload_Date: 08062022
Download Time: 10/07/2023, 21:26:58
Origin: Downloaded by DRAM
dbCAN subfamily EC numbers:
Citation: Y. Yin, X. Mao, J. Yang, X. Chen, F. Mao, and Y. Xu, "dbcan: a web resource for automated carbohydrate-active enzyme annotation," Nucleic acids research, vol. 40, no. W1, pp. W445–W451, 2012.
Version: 11
Upload_Date: 08062022
Download Time: 10/07/2023, 21:27:18
Origin: Downloaded by DRAM
VOG annotations:
Description_Db_Updated: Unknown, or Never
Citation: J. Thannesberger, H.-J. Hellinger, I. Klymiuk, M.-T. Kastner, F. J. Rieder, M. Schneider, S. Fister, T. Lion, K. Kosulin, J. Laengle et al., "Viruses comprise an extensive pool of mobile genetic elements in eukaryote cell cultures and human clinical samples," The FASEB Journal, vol. 31, no. 5, pp. 1987–2000, 2017.
Version: latest
Download Time: 10/07/2023, 21:27:24
Origin: Downloaded by DRAM

Genome summary form:
Branch: master
Download Time: 10/07/2023, 21:27:30
Origin: Downloaded by DRAM
Module step form:
Branch: master
Download Time: 10/07/2023, 21:27:36
Origin: Downloaded by DRAM
ETC module database:
Branch: master
Download Time: 10/07/2023, 21:27:45
Origin: Downloaded by DRAM
Function heatmap form:
Branch: master
Download Time: 10/07/2023, 21:27:37
Origin: Downloaded by DRAM
AMG database:
Branch: master
Download Time: 10/07/2023, 21:27:43
Origin: Downloaded by DRAM
2023-10-09 10:59:24,286 - Retrieved database locations and descriptions
2023-10-09 10:59:24,286 - Annotating new.vOTU
2023-10-09 11:53:22,819 - Turning genes from prodigal to mmseqs2 db
2023-10-09 11:53:35,475 - Getting forward best hits from kegg
2023-10-10 02:54:33,878 - Getting reverse best hits from kegg
2023-10-10 03:13:21,534 - Getting descriptions of hits from kegg

I also wonder whether the database setup failed in the first place. The log file from the database setup is as follows:

cat database_processing.log
2023-10-07 21:26:54,113 - Starting the process of downloading data
2023-10-07 21:26:54,114 - The gene_ko_link_loc argument was not used to specify a downloaded gene_ko_link file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-10-07 21:26:54,114 - Database preparation started
2023-10-07 21:26:54,114 - Copying /share/luoxiao2/DRAM/database_files/Pfam-A.hmm.dat.gz to output_dir
2023-10-07 21:26:54,187 - Downloading dbcan_fam_activities
2023-10-07 21:26:54,188 - Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam-activities.txt
2023-10-07 21:26:58,417 - Downloading dbcan_subfam_ec
2023-10-07 21:26:58,418 - Downloading dbCAN sub-family encumber from : https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam.subfam.ec.txt
2023-10-07 21:27:18,631 - Downloading vog_annotations
2023-10-07 21:27:24,814 - Downloading genome_summary_form
2023-10-07 21:27:30,027 - Downloading module_step_form
2023-10-07 21:27:36,372 - Downloading function_heatmap_form
2023-10-07 21:27:37,901 - Downloading amg_database
2023-10-07 21:27:43,064 - Downloading etc_module_database
2023-10-07 21:27:45,535 - All raw data files were downloaded successfully
2023-10-07 21:27:45,535 - Processing kegg
2023-10-07 22:34:27,412 - KEGG database processed
2023-10-07 22:34:30,628 - Moved kegg to final destination, configuration updated
2023-10-07 22:34:30,628 - Processing kofam_hmm
2023-10-07 22:38:28,225 - KOfam database processed
2023-10-07 22:38:28,249 - Moved kofam_hmm to final destination, configuration updated
2023-10-07 22:38:28,249 - Processing kofam_ko_list
2023-10-07 22:38:28,468 - KOfam ko list processed
2023-10-07 22:38:28,525 - Moved kofam_ko_list to final destination, configuration updated
2023-10-07 22:38:28,525 - Processing uniref
2023-10-08 00:12:59,235 - UniRef database processed
2023-10-08 00:12:59,483 - Moved uniref to final destination, configuration updated
2023-10-08 00:12:59,483 - Processing pfam
2023-10-08 00:55:10,251 - PFAM database processed
2023-10-08 00:55:10,324 - Moved pfam to final destination, configuration updated
2023-10-08 00:55:10,559 - Moved pfam_hmm to final destination, configuration updated
2023-10-08 00:55:10,559 - Processing dbcan
2023-10-08 00:55:27,174 - dbCAN database processed
2023-10-08 00:55:27,206 - Moved dbcan to final destination, configuration updated
2023-10-08 00:55:27,206 - Processing viral
2023-10-08 00:55:40,784 - RefSeq viral database processed
2023-10-08 00:55:40,827 - Moved viral to final destination, configuration updated
2023-10-08 00:55:40,828 - Processing peptidase
2023-10-08 00:56:00,390 - MEROPS database processed
2023-10-08 00:56:00,404 - Moved peptidase to final destination, configuration updated
2023-10-08 00:56:00,404 - Processing vogdb
2023-10-08 00:57:38,433 - VOGdb database processed
2023-10-08 00:57:38,458 - Moved vogdb to final destination, configuration updated
2023-10-08 00:57:38,459 - Moved dbcan_fam_activities to final destination, configuration updated
2023-10-08 00:57:38,460 - Moved dbcan_subfam_ec to final destination, configuration updated
2023-10-08 00:57:38,461 - Moved vog_annotations to final destination, configuration updated
2023-10-08 00:57:38,461 - Moved genome_summary_form to final destination, configuration updated
2023-10-08 00:57:38,462 - Moved module_step_form to final destination, configuration updated
2023-10-08 00:57:38,463 - Moved function_heatmap_form to final destination, configuration updated
2023-10-08 00:57:38,464 - Moved amg_database to final destination, configuration updated
2023-10-08 00:57:38,465 - Moved etc_module_database to final destination, configuration updated
2023-10-08 00:57:38,465 - Populating the description db, this may take some time
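
Both logs use the same "timestamp - message" layout, so the last logged step and the time spent in each step can be read off mechanically. The following is a minimal sketch, assuming only that layout; the function names are hypothetical and are not part of DRAM, and milliseconds are ignored for simplicity.

```python
import re
from datetime import datetime

# Matches lines such as "2023-10-09 10:59:24,275 - 1 FASTAs found".
# Lines without a leading timestamp (e.g. the database descriptions) are skipped.
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) - (.*)$")


def parse_steps(log_text):
    """Return (timestamp, message) pairs for every timestamped log line."""
    steps = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            steps.append((stamp, match.group(3)))
    return steps


def step_durations(steps):
    """Pair each message with the time elapsed until the next logged line."""
    return [
        (message, later - earlier)
        for (earlier, message), (later, _) in zip(steps, steps[1:])
    ]


if __name__ == "__main__":
    # Three lines copied from annotate.log above.
    sample = """\
2023-10-09 11:53:35,475 - Getting forward best hits from kegg
2023-10-10 02:54:33,878 - Getting reverse best hits from kegg
2023-10-10 03:13:21,534 - Getting descriptions of hits from kegg
"""
    steps = parse_steps(sample)
    for message, elapsed in step_durations(steps):
        print(f"{elapsed}  {message}")
    # The last message has no successor, so its duration is unknown;
    # it is the step during which logging stopped.
    print("Last logged step:", steps[-1][1])
```

Run against the full files, this would show that annotate.log stops at "Getting descriptions of hits from kegg" and database_processing.log stops at "Populating the description db, this may take some time", with no later message in either.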

Neither log reports an error, but neither confirms that its step finished successfully. Most importantly, the output files I expected are missing from my annotation folder. Thank you again for reading; I look forward to your reply!
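
For reference, the check I am doing by hand can be sketched as below. The list of expected file names is an assumption on my part (annotations.tsv plus the gene files that currently exist only under working_dir); it may differ between DRAM versions, and the helper name is hypothetical.

```python
from pathlib import Path

# Assumption: files a completed `DRAM.py annotate` run is expected to leave
# at the top level of the output folder. Adjust for your DRAM version.
EXPECTED_OUTPUTS = ("annotations.tsv", "genes.faa", "genes.fna", "genes.gff")


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are absent or empty in output_dir."""
    root = Path(output_dir)
    missing = []
    for name in expected:
        path = root / name
        # Treat a zero-byte file as missing, since it carries no results.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_outputs("annotation")
    if absent:
        print("Missing or empty:", ", ".join(absent))
    else:
        print("All expected output files are present.")
```

For the directory tree shown above, every name in the assumed list would be reported as missing, because the top level holds only annotate.log and working_dir.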