AnantharamanLab / METABOLIC

A scalable high-throughput metabolic and biogeochemical functional trait profiler

Install problem #115

Open LiZhihua1982 opened 1 year ago

LiZhihua1982 commented 1 year ago

(METABOLIC-v4) lizhihua@lizhihua-T640:/media/lizhihua/software/METABOLIC-v4/METABOLIC$ ./run_to_setup.sh
tar (child): Accessory_scripts.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove 'Accessory_scripts.tgz': No such file or directory
tar (child): METABOLIC_hmm_db.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove 'METABOLIC_hmm_db.tgz': No such file or directory
tar (child): METABOLIC_template_and_database.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove 'METABOLIC_template_and_database.tgz': No such file or directory
tar (child): Motif.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove 'Motif.tgz': No such file or directory
mkdir: cannot create directory ‘kofam_database’: File exists

ChaoLab commented 1 year ago

Hi, did you run this script on a cluster? It seems that some tgz files were not found in the right place (for example, Accessory_scripts.tgz, METABOLIC_hmm_db.tgz, etc.). This can happen on clusters, since some PATH settings need to be curated manually. I suggest running each command in run_to_setup.sh one at a time to track the outcome step by step.
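A quick way to narrow this down is to confirm that every archive the setup script tries to unpack is actually present before tar runs. The sketch below is a minimal Python check and is not part of METABOLIC; the archive names are copied from the tar errors above, while the function name `find_missing_archives` and the idea of running it in the METABOLIC directory are assumptions made for illustration.

```python
from pathlib import Path

# Archive names as they appear in the tar errors reported in this issue.
EXPECTED_ARCHIVES = [
    "Accessory_scripts.tgz",
    "METABOLIC_hmm_db.tgz",
    "METABOLIC_template_and_database.tgz",
    "Motif.tgz",
]


def find_missing_archives(metabolic_dir, expected=EXPECTED_ARCHIVES):
    """Return the expected archive names that are absent from metabolic_dir."""
    base = Path(metabolic_dir)
    return [name for name in expected if not (base / name).is_file()]


# Check the current working directory and report what is missing.
for name in find_missing_archives("."):
    print(f"missing: {name}")
```

If any name is printed, the download step that should have produced that archive is the likely place to look, rather than the tar commands themselves.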

LiZhihua1982 commented 1 year ago

Hi, thank you very much! I have tried that. I think the problem is the command "gdown --quiet https://drive.google.com/uc?id=1JQJpw_elM4IyGo_BIfioy8XnmqgoN-Iw": I cannot download the test file from Google-hosted sites.

ChaoLab commented 1 year ago

It is a test file hosted on Google Drive. If it is difficult for you to download, you can use other files (your own datasets or a subset of them) to run a test.

LiZhihua1982 commented 1 year ago

Thank you! If we use the GTDB-Tk database from https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_data.tar.gz, the run reports the error below: it cannot find PF01868.17.hmm. Which database version should we use? Thanks!

(METABOLIC-v4) lizhihua@lizhihua-T640:/media/lizhihua/software/METABOLIC-v4/METABOLIC$ perl METABOLIC-C.pl -t 40 -m-cutoff 0.75 -in-gn Genome_files_L11 -kofam-db full -r L11_path.txt -o METABOLIC_out_L11-C
[2022-11-15 14:38:31] The Prodigal annotation is running...
[2022-11-15 14:42:15] The Prodigal annotation is finished
[2022-11-15 14:42:16] The hmmsearch is running with 40 cpu threads...
[2022-11-15 14:48:57] The hmmsearch is finished
[2022-11-15 14:49:00] Generating each hmm faa collection...
[2022-11-15 14:49:00] Each hmm faa collection has been made
[2022-11-15 14:49:00] The KEGG module result is calculating...
[2022-11-15 14:49:38] The KEGG identifier (KO id) result is calculating...
[2022-11-15 14:49:39] The KEGG identifier (KO id) seaching result is finished
[2022-11-15 14:49:39] Searching CAZymes by dbCAN2...
[2022-11-15 15:13:21] dbCAN2 searching is done
[2022-11-15 15:13:21] Searching MEROPS peptidase...
[2022-11-15 15:15:11] MEROPS peptidase searching is done
[2022-11-15 15:15:12] METABOLIC table has been generated
[2022-11-15 15:15:12] Drawing element cycling diagrams...
[2022-11-15 15:32:02] Drawing element cycling diagrams finished
[2022-11-15 15:32:02] Drawing metabolic handoff diagrams...
[2022-11-15 15:32:05] Drawing metabolic handoff diagrams finished
[2022-11-15 15:32:05] Drawing energy flow chart...
[2022-11-15 15:32:05] INFO: GTDB-Tk v1.6.0
[2022-11-15 15:32:05] INFO: gtdbtk classify_wf --cpus 40 -x fasta --genome_dir Genome_files_L11 --out_dir METABOLIC_out_L11-C/intermediate_files/gtdbtk_Genome_files
[2022-11-15 15:32:05] INFO: Using GTDB-Tk reference data version r207: /media/lizhihua/software/GTDBTK_DB/release207/
[2022-11-15 15:32:05] INFO: Identifying markers in 1 genomes with 40 threads.
[2022-11-15 15:32:05] TASK: Running Prodigal V2.6.3 to identify genes.
[2022-11-15 15:34:16] INFO: Completed 1 genome in 2.19 minutes (2.19 minutes/genome).
[2022-11-15 15:34:16] TASK: Identifying TIGRFAM protein families.
[2022-11-15 15:34:33] INFO: Completed 1 genome in 16.45 seconds (16.45 seconds/genome).
[2022-11-15 15:34:33] TASK: Identifying Pfam protein families.
[2022-11-15 15:34:35] INFO: Completed 1 genome in 1.94 seconds (1.94 seconds/genome).
[2022-11-15 15:34:35] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2022-11-15 15:34:35] TASK: Summarising identified marker genes.
[2022-11-15 15:34:35] INFO: Completed 1 genome in 0.38 seconds (2.63 genomes/second).
[2022-11-15 15:34:35] INFO: Done.
[2022-11-15 15:34:35] ERROR: Uncontrolled exit resulting from an unexpected error.

================================================================================
EXCEPTION: FileNotFoundError
MESSAGE: [Errno 2] No such file or directory: '/media/lizhihua/software/GTDBTK_DB/release207/markers/pfam/individual_hmms/PF01868.17.hmm'

Traceback (most recent call last):
  File "/home/lizhihua/miniconda3/envs/METABOLIC-v4/lib/python3.8/site-packages/gtdbtk/__main__.py", line 95, in main
    gt_parser.parse_options(args)
  File "/home/lizhihua/miniconda3/envs/METABOLIC-v4/lib/python3.8/site-packages/gtdbtk/main.py", line 735, in parse_options
    self.align(options)
  File "/home/lizhihua/miniconda3/envs/METABOLIC-v4/lib/python3.8/site-packages/gtdbtk/main.py", line 291, in align
    markers.align(options.identify_dir,
  File "/home/lizhihua/miniconda3/envs/METABOLIC-v4/lib/python3.8/site-packages/gtdbtk/markers.py", line 455, in align
    ar122_marker_info_file = MarkerInfoFileAR122(out_dir, prefix)
  File "/home/lizhihua/miniconda3/envs/METABOLIC-v4/lib/python3.8/site-packages/gtdbtk/io/marker_info.py", line 74, in __init__
    super().__init__(path, AR122_MARKERS)
  File "/home/lizhihua/miniconda3/envs/METABOLIC-v4/lib/python3.8/site-packages/gtdbtk/io/marker_info.py", line 33, in __init__
    self.markers = self._parse_markers(markers)
  File "/home/lizhihua/miniconda3/envs/METABOLIC-v4/lib/python3.8/site-packages/gtdbtk/io/marker_info.py", line 43, in _parse_markers
    with open(marker_path) as fh:
FileNotFoundError: [Errno 2] No such file or directory: '/media/lizhihua/software/GTDBTK_DB/release207/markers/pfam/individual_hmms/PF01868.17.hmm'

Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1514.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1514.

ChaoLab commented 1 year ago

Hi, according to GTDB-Tk's requirements, the software version must be high enough to match the latest database. Have you checked this (see the last table on this page: https://ecogenomics.github.io/GTDBTk/installing/index.html)?

LiZhihua1982 commented 1 year ago

Thank you! METABOLIC ships with GTDB-Tk v1.6.0, so I will use GTDB release R202, as listed at https://ecogenomics.github.io/GTDBTk/installing/index.html. One more question: would it also be OK to upgrade GTDB-Tk from v1.6.0 instead of changing the GTDB database? Thank you!

ChaoLab commented 1 year ago

You can use a newer GTDB-Tk version as long as it matches the database that you will use.
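Both pieces of information needed for that check are already printed at the top of the GTDB-Tk log. The sketch below is a hedged Python illustration that pulls the tool version and the reference data release out of log text in the format shown in this thread. The function names and the `TESTED_PAIRS` set are hypothetical; the set contains only the one combination that ran to completion here and should be extended from the compatibility table in the GTDB-Tk installation documentation.

```python
import re

# Patterns match the two INFO lines GTDB-Tk prints in the logs quoted above.
VERSION_RE = re.compile(r"INFO: GTDB-Tk v(\d+)\.(\d+)\.(\d+)")
RELEASE_RE = re.compile(r"INFO: Using GTDB-Tk reference data version (r\d+)")


def read_versions(log_text):
    """Extract (tool_version, db_release) from GTDB-Tk log text.

    Either element is None when the corresponding line is absent.
    """
    version = VERSION_RE.search(log_text)
    release = RELEASE_RE.search(log_text)
    tool = tuple(int(part) for part in version.groups()) if version else None
    return tool, (release.group(1) if release else None)


def is_known_good(tool, release, tested_pairs):
    """True if this (tool version, database release) pair is in tested_pairs."""
    return (tool, release) in tested_pairs


# Assumption: only the pair confirmed in this thread is listed.
TESTED_PAIRS = {((1, 6, 0), "r202")}

log = (
    "[2022-11-15 15:32:05] INFO: GTDB-Tk v1.6.0\n"
    "[2022-11-15 15:32:05] INFO: Using GTDB-Tk reference data version r207: "
    "/media/lizhihua/software/GTDBTK_DB/release207/\n"
)
tool, release = read_versions(log)
print(tool, release, is_known_good(tool, release, TESTED_PAIRS))
```

Keeping the allowed pairs in a plain set means the check stays correct only as long as that set mirrors the official table, so it is best treated as a reminder rather than a source of truth.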

LiZhihua1982 commented 1 year ago

The whole procedure now runs without errors. I get the files "MW-score_result.txt" and "MW-score_result_table_input.txt"; however, I cannot obtain the MW-score figure shown below. [image] Would you help me? Thanks!

(METABOLIC-v4) lizhihua@lizhihua-T640:/media/lizhihua/software/METABOLIC-v4/METABOLIC$ perl METABOLIC-C.pl -t 40 -m-cutoff 0.75 -in-gn Genome_files_L11 -kofam-db full -r L11_path.txt -o METABOLIC_out_L11-C
[2022-11-16 14:28:32] The Prodigal annotation is running...
[2022-11-16 14:32:19] The Prodigal annotation is finished
[2022-11-16 14:32:20] The hmmsearch is running with 40 cpu threads...
[2022-11-16 14:39:05] The hmmsearch is finished
[2022-11-16 14:39:08] Generating each hmm faa collection...
[2022-11-16 14:39:09] Each hmm faa collection has been made
[2022-11-16 14:39:09] The KEGG module result is calculating...
[2022-11-16 14:39:48] The KEGG identifier (KO id) result is calculating...
[2022-11-16 14:39:48] The KEGG identifier (KO id) seaching result is finished
[2022-11-16 14:39:48] Searching CAZymes by dbCAN2...
[2022-11-16 15:04:11] dbCAN2 searching is done
[2022-11-16 15:04:11] Searching MEROPS peptidase...
[2022-11-16 15:06:02] MEROPS peptidase searching is done
[2022-11-16 15:06:03] METABOLIC table has been generated
[2022-11-16 15:06:03] Drawing element cycling diagrams...
[2022-11-16 15:23:02] Drawing element cycling diagrams finished
[2022-11-16 15:23:02] Drawing metabolic handoff diagrams...
[2022-11-16 15:23:05] Drawing metabolic handoff diagrams finished
[2022-11-16 15:23:05] Drawing energy flow chart...
[2022-11-16 15:23:05] INFO: GTDB-Tk v1.6.0
[2022-11-16 15:23:05] INFO: gtdbtk classify_wf --cpus 40 -x fasta --genome_dir Genome_files_L11 --out_dir METABOLIC_out_L11-C/intermediate_files/gtdbtk_Genome_files
[2022-11-16 15:23:05] INFO: Using GTDB-Tk reference data version r202: /media/lizhihua/software/GTDBTK_DB/release202/
[2022-11-16 15:23:05] INFO: Identifying markers in 1 genomes with 40 threads.
[2022-11-16 15:23:05] TASK: Running Prodigal V2.6.3 to identify genes.
[2022-11-16 15:25:16] INFO: Completed 1 genome in 2.19 minutes (2.19 minutes/genome).
[2022-11-16 15:25:16] TASK: Identifying TIGRFAM protein families.
[2022-11-16 15:25:37] INFO: Completed 1 genome in 20.51 seconds (20.51 seconds/genome).
[2022-11-16 15:25:37] TASK: Identifying Pfam protein families.
[2022-11-16 15:25:41] INFO: Completed 1 genome in 4.29 seconds (4.29 seconds/genome).
[2022-11-16 15:25:41] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2022-11-16 15:25:41] TASK: Summarising identified marker genes.
[2022-11-16 15:25:41] INFO: Completed 1 genome in 0.38 seconds (2.62 genomes/second).
[2022-11-16 15:25:41] INFO: Done.
[2022-11-16 15:25:41] INFO: Aligning markers in 1 genomes with 40 CPUs.
[2022-11-16 15:25:42] INFO: Processing 1 genomes identified as archaeal.
[2022-11-16 15:25:42] INFO: Read concatenated alignment for 2,339 GTDB genomes.
[2022-11-16 15:25:42] TASK: Generating concatenated alignment for each marker.
[2022-11-16 15:25:44] INFO: Completed 1 genome in 0.84 seconds (1.19 genomes/second).
[2022-11-16 15:25:44] TASK: Aligning 37 identified markers using hmmalign 3.1b2 (February 2015).
[2022-11-16 15:25:45] INFO: Completed 37 markers in 0.27 seconds (136.88 markers/second).
[2022-11-16 15:25:45] TASK: Masking columns of archaeal multiple sequence alignment using canonical mask.
[2022-11-16 15:25:48] INFO: Completed 2,340 sequences in 3.00 seconds (780.42 sequences/second).
[2022-11-16 15:25:48] INFO: Masked archaeal alignment from 32,754 to 5,124 AAs.
[2022-11-16 15:25:48] INFO: 0 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA.
[2022-11-16 15:25:48] INFO: Creating concatenated alignment for 2,340 archaeal GTDB and user genomes.
[2022-11-16 15:25:48] INFO: Creating concatenated alignment for 1 archaeal user genomes.
[2022-11-16 15:25:48] INFO: Done.
[2022-11-16 15:25:48] TASK: Placing 1 archaeal genomes into reference tree with pplacer using 40 CPUs (be patient).
[2022-11-16 15:25:48] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2022-11-16 15:26:20] INFO: Calculating RED values based on reference tree.
[2022-11-16 15:26:20] TASK: Traversing tree to determine classification method.
[2022-11-16 15:26:20] INFO: Completed 1 genome in 0.00 seconds (13,066.37 genomes/second).
[2022-11-16 15:26:20] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2022-11-16 15:26:24] INFO: Completed 2 comparisons in 4.15 seconds (2.07 seconds/comparison).
[2022-11-16 15:26:24] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2022-11-16 15:26:24] INFO: Done.
[2022-11-16 15:26:29] Drawing energy flow chart finished
[2022-11-16 15:26:29] Calculating MW-score ...
[2022-11-16 15:26:29] Calculating MW-score is done
METABOLIC-C was done, the total running time: 00:57:57 (hh:mm:ss)

LiZhihua1982 commented 1 year ago

Hi, I am still having trouble!

[2022-11-16 17:49:35] INFO: GTDB-Tk v1.6.0
[2022-11-16 17:49:35] INFO: gtdbtk classify_wf --cpus 40 -x fasta --genome_dir Genome_files_L11 --out_dir METABOLIC_out_L11-C/intermediate_files/gtdbtk_Genome_files
[2022-11-16 17:49:35] INFO: Using GTDB-Tk reference data version r202: /media/lizhihua/software/GTDBTK_DB/release202/
[2022-11-16 17:49:35] INFO: Identifying markers in 8 genomes with 40 threads.
[2022-11-16 17:49:35] TASK: Running Prodigal V2.6.3 to identify genes.
[2022-11-16 17:51:23] INFO: Completed 8 genomes in 1.80 minutes (4.45 genomes/minute).
[2022-11-16 17:51:23] TASK: Identifying TIGRFAM protein families.
[2022-11-16 17:51:32] INFO: Completed 8 genomes in 9.04 seconds (1.13 seconds/genome).
[2022-11-16 17:51:32] TASK: Identifying Pfam protein families.
[2022-11-16 17:51:34] INFO: Completed 8 genomes in 1.94 seconds (4.12 genomes/second).
[2022-11-16 17:51:34] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2022-11-16 17:51:34] TASK: Summarising identified marker genes.
[2022-11-16 17:51:35] INFO: Completed 8 genomes in 0.18 seconds (44.40 genomes/second).
[2022-11-16 17:51:35] INFO: Done.
[2022-11-16 17:51:35] INFO: Aligning markers in 8 genomes with 40 CPUs.
[2022-11-16 17:51:35] INFO: Processing 7 genomes identified as bacterial.
[2022-11-16 17:51:40] INFO: Read concatenated alignment for 45,555 GTDB genomes.
[2022-11-16 17:51:40] TASK: Generating concatenated alignment for each marker.
[2022-11-16 17:51:46] INFO: Completed 7 genomes in 0.02 seconds (336.56 genomes/second).
[2022-11-16 17:51:46] TASK: Aligning 113 identified markers using hmmalign 3.1b2 (February 2015).
[2022-11-16 17:51:53] INFO: Completed 113 markers in 0.75 seconds (151.21 markers/second).
[2022-11-16 17:51:53] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2022-11-16 17:53:04] INFO: Completed 45,561 sequences in 1.18 minutes (38,481.00 sequences/minute).
[2022-11-16 17:53:04] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2022-11-16 17:53:04] INFO: 1 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2022-11-16 17:53:04] INFO: Creating concatenated alignment for 45,560 bacterial GTDB and user genomes.
[2022-11-16 17:53:04] INFO: Creating concatenated alignment for 5 bacterial user genomes.
[2022-11-16 17:53:04] INFO: Processing 1 genomes identified as archaeal.
[2022-11-16 17:53:05] INFO: Read concatenated alignment for 2,339 GTDB genomes.
[2022-11-16 17:53:05] TASK: Generating concatenated alignment for each marker.
[2022-11-16 17:53:09] INFO: Completed 1 genome in 0.14 seconds (6.92 genomes/second).
[2022-11-16 17:53:09] TASK: Aligning 66 identified markers using hmmalign 3.1b2 (February 2015).
[2022-11-16 17:53:14] INFO: Completed 66 markers in 0.41 seconds (159.20 markers/second).
[2022-11-16 17:53:15] TASK: Masking columns of archaeal multiple sequence alignment using canonical mask.
[2022-11-16 17:53:18] INFO: Completed 2,340 sequences in 3.06 seconds (764.10 sequences/second).
[2022-11-16 17:53:18] INFO: Masked archaeal alignment from 32,754 to 5,124 AAs.
[2022-11-16 17:53:18] INFO: 0 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA.
[2022-11-16 17:53:18] INFO: Creating concatenated alignment for 2,340 archaeal GTDB and user genomes.
[2022-11-16 17:53:18] INFO: Creating concatenated alignment for 1 archaeal user genomes.
[2022-11-16 17:53:18] INFO: Done.
[2022-11-16 17:53:18] TASK: Placing 1 archaeal genomes into reference tree with pplacer using 40 CPUs (be patient).
[2022-11-16 17:53:18] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2022-11-16 17:54:08] INFO: Calculating RED values based on reference tree.
[2022-11-16 17:54:08] TASK: Traversing tree to determine classification method.
[2022-11-16 17:54:09] INFO: Completed 1 genome in 0.00 seconds (11,881.88 genomes/second).
[2022-11-16 17:54:09] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2022-11-16 17:54:10] INFO: Completed 2 comparisons in 1.29 seconds (1.55 comparisons/second).
[2022-11-16 17:54:10] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2022-11-16 17:54:10] TASK: Placing 5 bacterial genomes into reference tree with pplacer using 40 CPUs (be patient).
[2022-11-16 17:54:10] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2022-11-16 18:34:42] INFO: Calculating RED values based on reference tree.
[2022-11-16 18:34:53] TASK: Traversing tree to determine classification method.
[2022-11-16 18:34:53] INFO: Completed 5 genomes in 0.00 seconds (3,491.76 genomes/second).
[2022-11-16 18:34:54] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2022-11-16 18:34:59] INFO: Completed 250 comparisons in 5.21 seconds (47.98 comparisons/second).
[2022-11-16 18:34:59] INFO: 4 genome(s) have been classified using FastANI and pplacer.
[2022-11-16 18:34:59] INFO: Done.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1514.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1514.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1514.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1514.
[2022-11-16 18:35:04] Drawing energy flow chart finished
[2022-11-16 18:35:04] Calculating MW-score ...
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1655.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1655.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1655.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1655.
[2022-11-16 18:35:05] Calculating MW-score is done
METABOLIC-C was done, the total running time: 01:17:13 (hh:mm:ss)

ChaoLab commented 1 year ago

Hi, it is hard to tell at which step METABOLIC went wrong. Does the depth file (a txt file with a name starting with "Allgene***") exist?

LiZhihua1982 commented 1 year ago

1512 my $cat = $Bin2Cat{$gn}[0];
1513 my $gn_n_pth = "$gn\t$pth"; $Hash_gn_n_pth{$gn_n_pth} = 1;
1514 $Total_R_community_coverage{$gn_n_pth} = "$cat\t$pth\t$gn_cov_percentage";

$cat has been initialized; however, the run still reports an error:

[2022-11-17 09:46:44] INFO: Done.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1514.

1652 my $cat = $Bin2Cat{$gn}[$tax_code];
1653 my $gn_n_pth = "$gn\t$pth";
1654 $MW_score_community_coverage{$gn_n_pth} = "$cat\t$pth\t$gn_cov_percentage";

Likewise, $cat has been initialized here; however, the run still reports an error:

[2022-11-17 09:46:49] Drawing energy flow chart finished
[2022-11-17 09:46:49] Calculating MW-score ...
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1654.
Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line 1654.
[2022-11-17 09:46:50] Calculating MW-score is done

LiZhihua1982 commented 1 year ago

Does the depth file (a txt file with the name of "Allgene***") exist?

Yes, it exists.

LiZhihua1982 commented 1 year ago

I got "MW-score_result_table_input.txt" and "MW-score_result.txt" in the MW-score_result directory; however, I do not know how to draw the MW-score figure. Would you help me?

ChaoLab commented 1 year ago

The figure is simply made in Excel. You will need to color each cell with a gradient palette.
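For anyone who prefers a scripted alternative to Excel, the same idea (shade each cell by its value) can be sketched in a few lines of Python. This is only an illustration, not how the published figure was produced: it assumes the result file is tab-separated with a header row and row labels in the first column, which should be checked against the real "MW-score_result.txt". The function names and the demo table are hypothetical.

```python
import csv
import html
import io


def gradient_hex(value, low, high):
    """Map a value in [low, high] to a white-to-red hex colour."""
    fraction = 0.0 if high == low else (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))
    fade = round(255 * (1.0 - fraction))  # green and blue fade as value grows
    return f"#ff{fade:02x}{fade:02x}"


def as_number(cell):
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(cell)
    except ValueError:
        return None


def table_to_html(tsv_text):
    """Render a tab-separated table as HTML, shading the numeric cells."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    numbers = [n for row in body for n in map(as_number, row[1:]) if n is not None]
    low, high = (min(numbers), max(numbers)) if numbers else (0.0, 0.0)

    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in header) + "</tr>")
    for row in body:
        cells = [f"<td>{html.escape(row[0])}</td>"]
        for cell in row[1:]:
            number = as_number(cell)
            style = ""
            if number is not None:
                style = f' style="background:{gradient_hex(number, low, high)}"'
            cells.append(f"<td{style}>{html.escape(cell)}</td>")
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical two-row table, used only to show the call.
demo = "Function\tGroupA\tGroupB\nstep1\t0\t5\nstep2\t10\t2.5\n"
print(table_to_html(demo))
```

Saving the printed HTML to a file and opening it in a browser gives a colored table comparable to conditional formatting in Excel; a single global minimum and maximum is used here, whereas a per-column scale may suit some tables better.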

Cun-Li commented 1 year ago

I encountered the same error, 'Use of uninitialized value $cat in concatenation (.) or string at METABOLIC-C.pl line **'. The output contained phylum classifications only for bacteria, not for archaea. I guessed there might be a problem reading the file or matching the regular expressions. After I changed 'gtdbtk.ar.summary.tsv' to 'gtdbtk.ar53.summary.tsv' in the METABOLIC-C.pl script, the error no longer appeared.
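The underlying pattern is a lookup table that silently misses some genomes when a summary file name does not match. A hedged Python sketch of a more tolerant reader is shown below; it is not the METABOLIC implementation. It assumes the GTDB-Tk summary files are tab-separated with `user_genome` and `classification` columns, and it globs for `gtdbtk.*.summary.tsv` so the archaeal marker-set suffix (for example ar53) does not need to be hard-coded. Both function names are hypothetical.

```python
import csv
from pathlib import Path


def load_classifications(gtdbtk_dir):
    """Map genome name -> GTDB classification from every summary file found."""
    taxonomy = {}
    for summary in sorted(Path(gtdbtk_dir).rglob("gtdbtk.*.summary.tsv")):
        with open(summary, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                taxonomy[row["user_genome"]] = row["classification"]
    return taxonomy


def unclassified(genomes, taxonomy):
    """Return genomes with no taxonomy entry, i.e. the ones whose lookup
    would come back undefined later in the pipeline."""
    return [genome for genome in genomes if genome not in taxonomy]
```

Calling `unclassified(my_genomes, load_classifications(out_dir))` before any downstream calculation would list the genomes that lack a taxonomy entry, which is the condition that surfaces as the "uninitialized value $cat" warning in this thread.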

ChaoLab commented 1 year ago

I have updated the script to deal with the new archaeal tree markers; please see this issue: https://github.com/AnantharamanLab/METABOLIC/issues/116