geronimp / graftM

GraftM - Rapid community profiles from metagenomes
http://geronimp.github.io/graftM/
GNU General Public License v3.0

graftM create failed to root the tree #266

Closed x86lu closed 3 years ago

x86lu commented 4 years ago

I am trying to build my own rpoB gene graftM package. However, graftM create always fails when rooting the tree, as shown below:

WP_133593821.1 's__aggregans' with multiple parents g__Thermosphaera and g__Permianibacter
WP_168719185.1 's__marina' with multiple parents g__Persephonella and g__Thermosulfurimonas
WP_087861507.1 's__fermentans' with multiple parents g__Dyadobacter and g__Brevefilum
05/14/2020 08:45:12 PM INFO: Creating reference package
05/14/2020 08:45:12 PM INFO: Attempting to run taxit create with rerooting capabilities
05/14/2020 08:45:14 PM ERROR: taxit create failed to run in a small amount of time suggesting that rerooting was unsuccessful. Unfortunately this tree will need to be rerooted manually yourself using a tree editor such as ARB or FigTree. Once you have a rerooted newick format tree, rerun graftm create specifying the new tree with --rerooted_tree. The tree file to be rerooted is 'graftm_create_tree.rpoB_80_Final.tree'
05/14/2020 08:45:14 PM ERROR: When rerunning, please use the following flags for the command line to account for the fact that some sequences may have been removed during the deduplication process.

graftM create --taxtastic_taxonomy graftm_create_taxonomy.rpoB_80_Final.csv --taxtastic_seqinfo graftm_create_seqinfo.rpoB_80_Final.csv --alignment graftm_create_alignment.rpoB_80_Final.faa --rerooted_tree

(plus other relevant arguments).

Any idea of how to solve this issue? Thanks!

geronimp commented 4 years ago

Hey x86lu,

Thanks for using graftM! When graftM tries to reroot a tree, it allows a fixed amount of time (60 seconds) before timing out. The failure is likely because the software we use (taxit create) didn't find a sensible place to reroot the tree, so you may have to reroot it manually. This can also happen when non-homologous sequences are included in the input sequences, so I'd suggest you check for that as well. In our experience, when you allow it to continue running, it keeps going indefinitely, hence the introduction of a timeout. I would suggest manually rerooting the tree using software such as ARB or FigTree, and rerunning graftM with the suggested command it provides.

Thanks, Joel
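The timeout behaviour described above can be sketched roughly as follows. This is not graftM's actual code: `run_with_timeout` is a hypothetical helper, and the child commands are placeholders standing in for `taxit create`. It only illustrates the general pattern of giving an external program a fixed time budget and treating an overrun as a failure.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and report whether it exited with status 0 within the time budget.

    Hypothetical helper for illustration only; not part of graftM or taxtastic.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        # The command never finished, e.g. a rerooting step that runs indefinitely.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder commands standing in for an external tool.
    quick = [sys.executable, "-c", "print('done')"]
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(quick, timeout_seconds=60))
    print(run_with_timeout(slow, timeout_seconds=1))
```

A command that exits quickly with an error is also reported as a failure here, which may be closer to what the logs in this thread show, since they fail within a couple of seconds rather than after the full timeout.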

btolar1 commented 3 years ago

I am actually having the same issue while using a pre-constructed phylogenetic tree. What exactly does graftM require when "rerooting" a tree? I've tried manually defining an outgroup, and using the "reroot" button in FigTree, with no change even when resubmitting as suggested with --rerooted_tree. Is there a minimum number of sequences required? Do you have to include a VERY different sequence (another organism entirely) for it to work?

04/07/2021 02:46:25 PM WARNING: Program found with multiple commands: taxit FastTreeMP hmmalign mafft. Arbitrarily selecting FastTreeMP
04/07/2021 02:46:25 PM INFO: Building gpkg for Thaum_amoB.gpkg
04/07/2021 02:46:25 PM INFO: Building seqinfo and taxonomy file from input taxonomy
04/07/2021 02:46:25 PM INFO: Checking for duplicate sequence names
04/07/2021 02:46:25 PM INFO: Building HMM from alignment
04/07/2021 02:46:25 PM INFO: Filtered 0 short sequences from the alignment
04/07/2021 02:46:25 PM INFO: 25 sequences remaining
04/07/2021 02:46:25 PM INFO: Checking for incorrect or fragmented reads
04/07/2021 02:46:29 PM INFO: Building HMM from alignment
04/07/2021 02:46:29 PM INFO: Filtered 0 short sequences from the alignment
04/07/2021 02:46:29 PM INFO: 25 sequences remaining
04/07/2021 02:46:29 PM INFO: Deduplicating sequences
04/07/2021 02:46:29 PM INFO: Removed 3 sequences as duplicates, leaving 22 non-identical sequences
04/07/2021 02:46:29 PM INFO: Using input unrooted tree
04/07/2021 02:46:29 PM INFO: Removing duplicates from tree
04/07/2021 02:46:29 PM INFO: Generating log file
04/07/2021 02:46:35 PM INFO: Building seqinfo and taxonomy file from input taxonomy
04/07/2021 02:46:35 PM INFO: Creating reference package
04/07/2021 02:46:35 PM INFO: Attempting to run taxit create with rerooting capabilities
04/07/2021 02:46:36 PM ERROR: taxit create failed to run in a small amount of time suggesting that rerooting was unsuccessful. Unfortunately this tree will need to be rerooted manually yourself using a tree editor such as ARB or FigTree. Once you have a rerooted newick format tree, rerun graftm create specifying the new tree with --rerooted_tree. The tree file to be rerooted is 'graftm_create_tree.Thaum_amoB.tree'
04/07/2021 02:46:36 PM ERROR: When rerunning, please use the following flags for the command line to account for the fact that some sequences may have been removed during the deduplication process.

graftM create --taxtastic_taxonomy graftm_create_taxonomy.Thaum_amoB.csv --taxtastic_seqinfo graftm_create_seqinfo.Thaum_amoB.csv --alignment graftm_create_alignment.Thaum_amoB.faa --rerooted_tree

(plus other relevant arguments).
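On the question of what counts as a "rooted" tree: a common Newick convention is that a rooted binary tree has exactly two subtrees at the top level, whereas an unrooted tree is written with three or more. The sketch below checks that convention on a Newick string. Whether graftM or taxit applies this exact test is an assumption, not something confirmed in this thread, and `root_child_count` and `looks_rooted` are hypothetical helper names. Bracketed Newick comments are not handled.

```python
def root_child_count(newick):
    """Count the subtrees attached directly to the root of a Newick string.

    Hypothetical helper for illustration; handles single-quoted labels but
    not square-bracket comments.
    """
    text = newick.strip()
    if not text.startswith("("):
        return 0  # a bare leaf has no root children
    depth = 0
    children = 1
    in_quote = False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote  # a doubled '' toggles twice, so the state is preserved
        elif in_quote:
            continue  # ignore structural characters inside quoted labels
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break  # closed the outermost group; the rest is root label/length
        elif ch == "," and depth == 1:
            children += 1  # a comma at depth 1 separates two root children
    return children


def looks_rooted(newick):
    """Return True if the root is bifurcating (the usual rooted convention)."""
    return root_child_count(newick) == 2


if __name__ == "__main__":
    print(looks_rooted("((A:0.1,B:0.2):0.3,(C:0.1,D:0.2):0.4);"))
    print(looks_rooted("(A:0.1,B:0.2,(C:0.1,D:0.2):0.4);"))
```

A check like this only inspects the shape of the file; it says nothing about whether the chosen root is biologically sensible.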

wwood commented 3 years ago

Hi @btolar1 this definitely seems to be a bug, because of this line:

04/07/2021 02:46:29 PM INFO: Using input unrooted tree

It should be using your input rooted tree.

Are you able to send the input files and command used, so we can get to the bottom of it please? Thanks.

btolar1 commented 3 years ago

Hi @wwood - sure, here's the command and input files (I think they are small enough they should attach okay). I started with a minimal gene set to test the process; eventually I'd hope to expand this to a lot more gene representatives but thought it should work with a decent diversity to start. I believe the log file was captured enough in what I pasted above but let me know if the full one would be helpful. graftM_amoB.zip

graftM create --sequences Thaum_amoB.fasta --taxonomy Thaum_amoB_taxonomy.txt --alignment Thaum_amoB_Alignment.fasta --tree Thaum_amoB_tree.newick --log amoB.log

wwood commented 3 years ago

Hmm, do you get the same error when you use --rerooted_tree instead of --tree as you have in your command? When doing so, I don't observe the issue - i.e. it doesn't try to reroot the tree.
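For reference, the change suggested here amounts to swapping one flag in the command posted earlier in the thread. The sketch below rebuilds that command in Python; the file names and flags are copied from the thread, while `use_rerooted_tree` is a hypothetical helper written only for this illustration.

```python
import shlex


def use_rerooted_tree(argv):
    """Return a copy of argv with --tree replaced by --rerooted_tree.

    Hypothetical helper; it only rewrites the argument list and does not run graftM.
    """
    return ["--rerooted_tree" if arg == "--tree" else arg for arg in argv]


# The command as posted earlier in this thread.
original = [
    "graftM", "create",
    "--sequences", "Thaum_amoB.fasta",
    "--taxonomy", "Thaum_amoB_taxonomy.txt",
    "--alignment", "Thaum_amoB_Alignment.fasta",
    "--tree", "Thaum_amoB_tree.newick",
    "--log", "amoB.log",
]

updated = use_rerooted_tree(original)
print(shlex.join(updated))
```

The printed command can be pasted into a shell, provided the tree file is already rooted.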

btolar1 commented 3 years ago

Okay that seems to have worked - so will use "--rerooted_tree" when I supply my own from now on!

I did get a different error upon resubmission, after the "Creating reference package" line, that referenced a missing package (psycopg2-binary) and required the installation of three additional dependencies. Is this an additional required package (related to taxtastic in the error message) or something I missed in an update to my Conda environment?

But yes, thanks so much, that seems to have fixed the error and I now have a ".gpkg" folder!

wwood commented 3 years ago

Good to hear. I actually had trouble along the same lines with conda - been a while since I set up a new environment. Which 3 packages did you install? The bioconda recipes should be fixed - the taxtastic one is off, I think.

Thanks.

btolar1 commented 3 years ago

The three it required (didn't auto-install because, pip) were: fastalite, jinja2, and PyYAML (all easily available with "pip install" fortunately). Thanks for all your help!
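A quick way to see which of the packages mentioned in this thread are missing from an environment is sketched below. The package names come from the comments above; the mapping from pip package name to import name (for example PyYAML imports as `yaml`, and psycopg2-binary as `psycopg2`) and the helper `missing_packages` are additions made for this illustration. Whether this list is complete for any given taxtastic version is not established here.

```python
import importlib.util

# pip package name -> module name used at import time
REQUIRED = {
    "psycopg2-binary": "psycopg2",
    "fastalite": "fastalite",
    "jinja2": "jinja2",
    "PyYAML": "yaml",
}


def missing_packages(required):
    """Return the pip package names whose modules cannot be found.

    Hypothetical helper; it only checks that each module is locatable,
    not that it imports cleanly.
    """
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing; try: pip install " + " ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it with the same Python interpreter that graftM uses, otherwise the result reflects a different environment.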

wwood commented 3 years ago

OK, good to know. Thanks.