Open ZhengXiaoxuan11542 opened 10 months ago
Can you run GTDB-Tk on its own under the METABOLIC conda env to see whether any problems show up? From the error message, it seems that GTDB-Tk breaks due to a memory shortage.
Yes, as you mentioned, this is due to insufficient memory within GTDB-Tk, which interrupts the pplacer step. However, even though I specified --pplacer_cpus and --scratch_dir in the command, I still encountered errors. I also ran the same command with the GTDB-Tk/v207 database outside of the METABOLIC env, and it ran smoothly and generated the expected output.
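One thing that may be worth ruling out before rerunning is whether the directory passed to --scratch_dir exists. The following is only a minimal sketch under that assumption (it is not a confirmed cause of the failure): the helper name `build_classify_cmd` is hypothetical, and the flags are the ones used in the command in this thread.

```python
from pathlib import Path


def build_classify_cmd(genome_dir, out_dir, scratch_dir, cpus=1):
    """Create the scratch directory if needed and return the gtdbtk argv list.

    Hypothetical helper for illustration only; it uses just the flags that
    appear in the command posted in this thread.
    """
    # Assumption: a missing scratch directory could cause trouble, so create it.
    Path(scratch_dir).mkdir(parents=True, exist_ok=True)
    return [
        "gtdbtk", "classify_wf",
        "--cpus", str(cpus),
        "-x", "fasta",
        "--genome_dir", str(genome_dir),
        "--skip_ani_screen",
        "--pplacer_cpus", str(cpus),
        "--scratch_dir", str(scratch_dir),
        "--out_dir", str(out_dir),
    ]


# Example use (paths are placeholders):
#   import subprocess
#   cmd = build_classify_cmd("bins/", "gtdbtk_out/", "scratch/pplacer")
#   subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with long paths.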
(METABOLIC_v4.0) [xiaoxuan@tc6001 ~]$ gtdbtk classify_wf --cpus 1 -x fasta --genome_dir /public/home/xiaoxuan/bxdata/10.bin/single/b73l-1/BIN_REFINEMENT/metawrap_50_10_bins --skip_ani_screen --pplacer_cpus 1 --scratch_dir /public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/pplacer --out_dir /public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/intermediate_files/gtdbtk_Genome_files
[2024-01-03 18:59:39] INFO: GTDB-Tk v2.3.2
[2024-01-03 18:59:39] INFO: gtdbtk classify_wf --cpus 1 -x fasta --genome_dir /public/home/xiaoxuan/bxdata/10.bin/single/b73l-1/BIN_REFINEMENT/metawrap_50_10_bins --skip_ani_screen --pplacer_cpus 1 --scratch_dir /public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/pplacer --out_dir /public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/intermediate_files/gtdbtk_Genome_files
[2024-01-03 18:59:39] INFO: Using GTDB-Tk reference data version r207: /public/home/xiaoxuan/database/gtdbtk/release207/
[2024-01-03 18:59:39] INFO: Identifying markers in 4 genomes with 1 threads.
[2024-01-03 18:59:39] TASK: Running Prodigal V2.6.3 to identify genes.
[2024-01-03 18:59:39] INFO: Completed 4 genomes in 0.02 seconds (189.02 genomes/second).
[2024-01-03 18:59:39] WARNING: Prodigal skipped 4 genomes due to pre-existing data, see warnings.log
[2024-01-03 18:59:39] TASK: Identifying TIGRFAM protein families.
[2024-01-03 18:59:39] INFO: Completed 4 genomes in 0.00 seconds (878.76 genomes/second).
[2024-01-03 18:59:39] WARNING: TIGRFAM skipped 4 genomes due to pre-existing data, see warnings.log
[2024-01-03 18:59:39] TASK: Identifying Pfam protein families.
[2024-01-03 18:59:39] INFO: Completed 4 genomes in 0.00 seconds (1,543.16 genomes/second).
[2024-01-03 18:59:39] WARNING: Pfam skipped 4 genomes due to pre-existing data, see warnings.log
[2024-01-03 18:59:39] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2024-01-03 18:59:39] TASK: Summarising identified marker genes.
[2024-01-03 18:59:39] INFO: Completed 4 genomes in 0.07 seconds (55.59 genomes/second).
[2024-01-03 18:59:39] INFO: Done.
[2024-01-03 18:59:39] INFO: Aligning markers in 4 genomes with 1 CPUs.
[2024-01-03 18:59:39] INFO: Processing 4 genomes identified as bacterial.
[2024-01-03 18:59:48] INFO: Read concatenated alignment for 62,291 GTDB genomes.
[2024-01-03 18:59:48] TASK: Generating concatenated alignment for each marker.
[2024-01-03 18:59:48] INFO: Completed 4 genomes in 0.04 seconds (111.86 genomes/second).
[2024-01-03 18:59:48] TASK: Aligning 108 identified markers using hmmalign 3.1b2 (February 2015).
[2024-01-03 19:00:08] INFO: Completed 108 markers in 19.90 seconds (5.43 markers/second).
[2024-01-03 19:00:08] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2024-01-03 19:01:44] INFO: Completed 62,295 sequences in 1.59 minutes (39,166.12 sequences/minute).
[2024-01-03 19:01:44] INFO: Masked bacterial alignment from 41,084 to 5,036 AAs.
[2024-01-03 19:01:44] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2024-01-03 19:01:44] INFO: Creating concatenated alignment for 62,295 bacterial GTDB and user genomes.
[2024-01-03 19:02:04] INFO: Creating concatenated alignment for 4 bacterial user genomes.
[2024-01-03 19:02:05] INFO: Done.
[2024-01-03 19:02:05] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.
[2024-01-03 19:02:05] TASK: Placing 4 bacterial genomes into backbone reference tree with pplacer using 1 CPUs (be patient).
[2024-01-03 19:02:05] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
==> Step 1 of 9: Starting pplacer.
Uncaught exception: Sys_error("/public/home/xiaoxuan/database/gtdbtk/release207/split/backbone/pplacer/gtdbtk_package_backbone.refpkg: No such file or directory")
Fatal error: exception Sys_error("/public/home/xiaoxuan/database/gtdbtk/release207/split/backbone/pplacer/gtdbtk_package_backbone.refpkg: No such file or directory")
==> Running pplacer v1.1.alpha19-0-g807f6f3 analysis on /public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/intermediate_files/gtdbtk_Genome_files/align/gtdbtk.bac120.user_msa.fasta.gz....
Process Process-10:
Traceback (most recent call last):
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 124, in _worker
    raise PplacerException('An error was encountered while '
gtdbtk.exceptions.PplacerException: An error was encountered while running pplacer, check the log file: /public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/intermediate_files/gtdbtk_Genome_files/classify/intermediate_results/pplacer/pplacer.backbone.bac120.out
[2024-01-03 19:02:06] ERROR: Uncontrolled exit resulting from an unexpected error.
================================================================================
EXCEPTION: FileNotFoundError
MESSAGE: [Errno 2] No such file or directory: '/public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/pplacer/gtdbtk.pplacer.scratch'
Traceback (most recent call last):
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 92, in run
    raise PplacerException(
gtdbtk.exceptions.PplacerException: An error was encountered while running pplacer.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/main.py", line 102, in main
    gt_parser.parse_options(args)
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/main.py", line 1186, in parse_options
    self.classify(options,all_classified_ani= all_classified_ani)
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/main.py", line 587, in classify
    reports = classify.run(genomes=genomes,
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/classify.py", line 564, in run
    high_classify_tree = self.place_genomes(user_msa_file,
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/classify.py", line 270, in place_genomes
    pplacer.run(self.pplacer_cpus, 'wag', pplacer_ref_pkg, pplacer_json_out,
  File "/public/home/xiaoxuan/miniconda3/envs/METABOLIC_v4.0/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 100, in run
    os.remove(mmap_file)
FileNotFoundError: [Errno 2] No such file or directory: '/public/home/xiaoxuan/bxdata/10.bin_gene/b73l-1_matabolic_c/pplacer/gtdbtk.pplacer.scratch
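The output above reports two different missing paths: the backbone refpkg that pplacer tried to open, and the scratch file that the traceback shows being removed afterwards with `os.remove(mmap_file)`. As a rough aid for reading logs like this, here is a small sketch that pulls every path reported as "No such file or directory" out of a pasted log and checks whether it exists now. The function names are hypothetical, and the two regular expressions only cover the two message shapes seen in this thread.

```python
import re
from pathlib import Path

# Two message shapes appear in the log in this thread:
#   Sys_error("<path>: No such file or directory")   (printed by pplacer)
#   [Errno 2] No such file or directory: '<path>'    (raised by Python)
_PATTERNS = [
    re.compile(r'Sys_error\("([^"]+?): No such file or directory"\)'),
    re.compile(r"No such file or directory: '([^']+)'"),
]


def extract_missing_paths(log_text):
    """Return unique paths reported as missing, in order of first appearance."""
    hits = []
    for pattern in _PATTERNS:
        for match in pattern.finditer(log_text):
            hits.append((match.start(), match.group(1)))
    ordered = []
    for _, path in sorted(hits):
        if path not in ordered:
            ordered.append(path)
    return ordered


def report(log_text):
    """Print, for each reported path, whether it and its parent exist now."""
    for path in extract_missing_paths(log_text):
        p = Path(path)
        status = "exists now" if p.exists() else "missing"
        parent = "parent exists" if p.parent.is_dir() else "parent missing"
        print(f"{status} ({parent}): {path}")
```

Calling `report(open("METABOLIC_log.log").read())` on the machine where the run failed would show which of the reported paths is actually absent. The path that appears first in the log is usually the more informative one to check.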
Hello, thank you for maintaining such a great tool. Everything goes well when I run METABOLIC-C.pl until I encounter an error in the GTDB-Tk step.
This results in all three PDF figures under the path ~/b73l-1_matabolic_c/METABOLIC_Figures/ being blank (there might be other issues I haven't noticed). I specified the database version as 207_v2; is the error connected to that? Additionally, I used the 'gtdbtk check_install --db_version 207' command to check the database, and it reported no issues. I have attached my log file for your reference (METABOLIC_log.log). I hope you can help me resolve this issue!