Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

de_novo_wf error #433

Closed: Beb26 closed this issue 1 year ago

Beb26 commented 2 years ago

Dear GTDB-Tk team,

I am using gtdbtk de_novo_wf to analyze a set of metagenome-assembled genomes (MAGs), and I want to obtain a tree containing only my MAGs. I use the following command:

gtdbtk de_novo_wf --genome_dir ../MAGs_nucleotidiques/MAG_fasta/ --outgroup_taxon d__Bacteria --bacteria --out_dir de_novo_output --skip_gtdb_refs --custom_taxonomy_file ~/work/GTDBTK/classify_output/taxo_MAG --cpus 16

While it was running, I got this error message:

[2022-09-14 11:23:59] INFO: Masked bacterial alignment from 41,084 to 5,036 AAs.
[2022-09-14 11:23:59] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2022-09-14 11:23:59] INFO: Creating concatenated alignment for 174 bacterial user genomes.
[2022-09-14 11:23:59] INFO: Done.
[2022-09-14 11:23:59] INFO: Inferring FastTree (WAG, SH support values) using a maximum of 16 CPUs.
[2022-09-14 11:25:40] INFO: FastTree version: precision
[2022-09-14 11:25:40] INFO: Done.
[2022-09-14 11:25:40] INFO: Reading GTDB taxonomy for representative genomes.
[2022-09-14 11:25:41] INFO: Reading custom taxonomy file.
[2022-09-14 11:25:41] INFO: Read custom taxonomy for 175 genomes.
[2022-09-14 11:25:41] INFO: Reassigned taxonomy for 0 GTDB representative genomes.
[2022-09-14 11:25:41] INFO: Read taxonomy for 65,878 genomes.
[2022-09-14 11:25:41] INFO: Identifying genomes from the specified outgroup: d__Bacteria
[2022-09-14 11:25:41] INFO: Identified 1 outgroup taxa in the tree.
[2022-09-14 11:25:41] INFO: Identified 173 ingroup taxa in the tree.
[2022-09-14 11:25:41] INFO: Outgroup is monophyletic.
[2022-09-14 11:25:41] INFO: Rerooting tree.
[2022-09-14 11:25:41] INFO: Rerooted tree written to: de_novo_output/infer/intermediate_results/gtdbtk.bac120.rooted.tree
[2022-09-14 11:25:41] INFO: Done.
[2022-09-14 11:25:41] INFO: Reading GTDB taxonomy for representative genomes.
[2022-09-14 11:25:41] INFO: Reading custom taxonomy file.
[2022-09-14 11:25:41] INFO: Read custom taxonomy for 175 genomes.
[2022-09-14 11:25:41] INFO: Reassigned taxonomy for 0 GTDB representative genomes.
[2022-09-14 11:25:41] INFO: Read taxonomy for 65,878 genomes.
[2022-09-14 11:25:41] INFO: Reading tree.
[2022-09-14 11:25:41] INFO: Removing any previous internal node labels.
[2022-09-14 11:25:41] INFO: Calculating F-measure statistic for each taxa.
[2022-09-14 11:25:41] INFO: Calculating taxa within each lineage.
[2022-09-14 11:25:41] INFO: Processing 1 taxa at Domain rank.
[2022-09-14 11:25:41] INFO: Processing 1 taxa at Phylum rank.
[2022-09-14 11:25:41] INFO: Processing 1 taxa at Class rank.
[2022-09-14 11:25:41] INFO: Processing 1 taxa at Order rank.
[2022-09-14 11:25:41] INFO: Processing 1 taxa at Family rank.
[2022-09-14 11:25:41] INFO: Processing 1 taxa at Genus rank.
[2022-09-14 11:25:41] INFO: Processing 0 taxa at Species rank.
[2022-09-14 11:25:41] WARNING: There are 6 taxa with multiple placements of equal quality.
[2022-09-14 11:25:41] WARNING: These were resolved by placing the label at the most terminal position.
[2022-09-14 11:25:41] WARNING: Ideally, taxonomic assignment of all genomes should be established before tree decoration.
[2022-09-14 11:25:41] INFO: Placing labels on tree.
[2022-09-14 11:25:41] INFO: Writing out statistics for taxa.
[2022-09-14 11:25:41] INFO: Writing out inferred taxonomy for each genome.
[2022-09-14 11:25:41] ERROR: Uncontrolled exit resulting from an unexpected error.

EXCEPTION: IndexError MESSAGE: list index out of range


Traceback (most recent call last):
  File "/usr/local/bioinfo/src/Miniconda/Miniconda3/envs/gtdbtk-v2.1.1_env/lib/python3.8/site-packages/gtdbtk/main.py", line 98, in main
    gt_parser.parse_options(args)
  File "/usr/local/bioinfo/src/Miniconda/Miniconda3/envs/gtdbtk-v2.1.1_env/lib/python3.8/site-packages/gtdbtk/main.py", line 793, in parse_options
    self.decorate(options)
  File "/usr/local/bioinfo/src/Miniconda/Miniconda3/envs/gtdbtk-v2.1.1_env/lib/python3.8/site-packages/gtdbtk/main.py", line 557, in decorate
    d.run(options.input_tree,
  File "/usr/local/bioinfo/src/Miniconda/Miniconda3/envs/gtdbtk-v2.1.1_env/lib/python3.8/site-packages/gtdbtk/decorate.py", line 379, in run
    self._write_taxonomy(tree, out_taxonomy)
  File "/usr/local/bioinfo/src/Miniconda/Miniconda3/envs/gtdbtk-v2.1.1_env/lib/python3.8/site-packages/gtdbtk/decorate.py", line 314, in _write_taxonomy
    taxa = self._leaf_taxa(leaf)
  File "/usr/local/bioinfo/src/Miniconda/Miniconda3/envs/gtdbtk-v2.1.1_env/lib/python3.8/site-packages/gtdbtk/decorate.py", line 295, in _leaf_taxa
    last_rank = ordered_taxa[-1][0:3]
IndexError: list index out of range

Could you please help me figure out what the problem is? Thank you very much.

Bertrand

aaronmussig commented 1 year ago

Hi Bertrand,

This is because the outgroup has been set to a domain (d__Bacteria). Since we place genomes into a domain-level tree, it doesn't make much sense to use the domain as the outgroup, which is why the program doesn't expect it.
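
The last traceback frame fails on ordered_taxa[-1][0:3]. Slicing a string never raises, and indexing with [-1] raises IndexError only when the list is empty, so ordered_taxa must have been empty for at least one leaf. The minimal Python sketch below only reproduces that failure mode and shows a guarded alternative; last_rank_prefix is a hypothetical name, not GTDB-Tk's code, and it does not explain why the list ended up empty.

```python
from typing import List, Optional


def last_rank_prefix(ordered_taxa: List[str]) -> Optional[str]:
    """Hypothetical guarded version of the failing expression.

    Returns the 3-character rank prefix (e.g. 'g__') of the last taxon,
    or None when no taxa were collected for the leaf.
    """
    if not ordered_taxa:
        # ordered_taxa[-1] would raise IndexError on an empty list.
        return None
    return ordered_taxa[-1][0:3]


# The unguarded expression from the traceback, applied to an empty list:
empty: List[str] = []
try:
    empty[-1][0:3]
except IndexError as err:
    print(f"IndexError: {err}")  # IndexError: list index out of range

print(last_rank_prefix(["d__Bacteria", "p__Firmicutes", "g__Bacillus"]))  # g__
print(last_rank_prefix([]))  # None
```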

If you run it again with an outgroup taxon at a lower rank (e.g. a phylum), it will work.
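
To see which phyla are available to pass to --outgroup_taxon, one option is to count the phyla in the custom taxonomy file. The sketch below assumes that file has one genome per line as two tab-separated columns, a genome ID and a GTDB-style semicolon-separated lineage (d__...;p__...;...); that format, the function name count_phyla, and the MAG IDs in the example are all assumptions for illustration, not part of GTDB-Tk.

```python
from collections import Counter
from typing import Dict, Iterable


def count_phyla(lines: Iterable[str]) -> Dict[str, int]:
    """Count genomes per phylum in an (assumed) GTDB-style taxonomy file.

    Assumed line format: '<genome_id>\t<d__...;p__...;c__...;...>'.
    Lines without a tab, or without a named phylum (bare 'p__'), are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue
        _genome_id, lineage = line.split("\t", 1)
        for taxon in lineage.split(";"):
            taxon = taxon.strip()
            if taxon.startswith("p__") and len(taxon) > 3:
                counts[taxon] += 1
                break
    return dict(counts)


# Hypothetical example lines; for a real file use: with open(path) as fh: count_phyla(fh)
example = [
    "MAG_001\td__Bacteria;p__Firmicutes;c__Bacilli",
    "MAG_002\td__Bacteria;p__Firmicutes;c__Bacilli",
    "MAG_003\td__Bacteria;p__Bacteroidota;c__Bacteroidia",
]
for phylum, n in sorted(count_phyla(example).items(), key=lambda kv: -kv[1]):
    print(phylum, n)
```

Choosing which of those phyla makes a sensible outgroup for a given set of MAGs is a biological decision that this count does not make for you.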

The next version will restrict the accepted input so that a domain can no longer be given as the outgroup: https://github.com/Ecogenomics/GTDBTk/pull/437
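
The pull request is not reproduced here. As a rough illustration of the kind of input check it describes, here is a hypothetical validator that rejects domain-rank taxa using the GTDB rank prefixes (d__, p__, c__, o__, f__, g__, s__). The function and constant names are assumptions; this is not GTDB-Tk's actual implementation or argument-parsing code.

```python
GTDB_RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def validate_outgroup_taxon(taxon: str) -> str:
    """Hypothetical check: accept a GTDB-style taxon below the domain rank."""
    taxon = taxon.strip()
    # str.startswith accepts a tuple of prefixes.
    if not taxon.startswith(GTDB_RANK_PREFIXES):
        raise ValueError(f"not a GTDB-style taxon: {taxon!r}")
    if taxon.startswith("d__"):
        raise ValueError(f"outgroup cannot be a domain: {taxon!r}")
    return taxon


for candidate in ("p__Firmicutes", "d__Bacteria"):
    try:
        print("accepted:", validate_outgroup_taxon(candidate))
    except ValueError as err:
        print("rejected:", err)
```

Because it raises ValueError, a function like this could also serve as an argparse type= callable, which turns the error into a usage message at startup instead of a crash during tree decoration.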

Cheers, Aaron