Ah, tricky! This looks like it may be a problem with how ete3 handles quoted internal node names. I'll open an issue on the ete3 repository to ask for clarification.
In the meantime, a hack that works is to strip 'unusual' characters out of the internal node labels. Many characters, such as colons and parentheses, are legal inside quoted Newick labels, but Newick parsers often don't handle them well. See: https://github.com/OpenTreeOfLife/ot-base/issues/10
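To make the "parsers often don't handle them well" point concrete, here is a toy comparison in Python. Both tokenizers and the small example tree are hypothetical and illustrative only; this is not how ete3 or dendropy actually parse Newick. A tokenizer that splits on Newick delimiters without honoring single quotes tears the quoted label apart at its parentheses, while one that tracks quoting keeps the label as a single token.

```python
import re

# Characters that delimit Newick structure outside of quoted labels.
DELIMITERS = "(),:;"

def naive_tokens(newick: str) -> list[str]:
    """Split on Newick delimiters, ignoring quotes (deliberately naive)."""
    return [t for t in re.split(r"([(),:;])", newick) if t.strip()]

def quote_aware_tokens(newick: str) -> list[str]:
    """Split on Newick delimiters, but keep single-quoted labels intact."""
    tokens, buf, in_quote, i = [], "", False, 0
    while i < len(newick):
        ch = newick[i]
        if ch == "'":
            # Inside a quoted label, a doubled '' is an escaped apostrophe.
            if in_quote and i + 1 < len(newick) and newick[i + 1] == "'":
                buf += "''"
                i += 2
                continue
            in_quote = not in_quote
            buf += ch
        elif ch in DELIMITERS and not in_quote:
            # Structural delimiter: flush the pending label, then emit it.
            if buf.strip():
                tokens.append(buf.strip())
            tokens.append(ch)
            buf = ""
        else:
            buf += ch
        i += 1
    if buf.strip():
        tokens.append(buf.strip())
    return tokens

# Hypothetical toy tree with one quoted internal node label.
tree = "((A,B)'Malacothrix (genus in Opisthokonta) ott600707',C);"
print(naive_tokens(tree))        # label is split at "(" and ")"
print(quote_aware_tokens(tree))  # label survives as one token
```

The naive version sees three opening parentheses instead of two, which is exactly the kind of structural confusion that quoted labels containing delimiters can cause in a parser that doesn't fully implement quoting.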
You can pull a script to replace these characters with '_' from: https://github.com/OpenTreeOfLife/python-opentree/blob/itol_annot/examples/standardize_labels.py
pip install opentree
pip install dendropy
python standardize_labels.py -i subtree-ottol-801601-Vertebrata.tre -o vertebrata_standardized.tre
This replaces the label 'Malacothrix (genus in Opisthokonta) ott600707' with 'Malacothrix _genus in Opisthokonta_ ott600707'. The reason for this seemingly silly label is that there is also a plant genus named Malacothrix! https://en.wikipedia.org/wiki/Malacothrix_(plant)
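If you'd rather see what the relabeling amounts to, a rough stand-in is a regex pass over the quoted labels in the Newick text. This is a minimal sketch, not the code in standardize_labels.py: the UNUSUAL character set is an assumption, and the sketch keeps the surrounding quotes because the cleaned labels still contain spaces. It reproduces the Malacothrix relabeling above, but I haven't checked that ete3 accepts this exact output.

```python
import re

# Hypothetical set of characters to replace; the real
# standardize_labels.py may use a different set.
UNUSUAL = set("()[]{}:;,'\"")

def sanitize_label(label: str) -> str:
    """Replace each 'unusual' character in a label with '_'."""
    return "".join("_" if ch in UNUSUAL else ch for ch in label)

# A single-quoted Newick label: '...' where '' is an escaped apostrophe.
QUOTED_LABEL = re.compile(r"'((?:[^']|'')*)'")

def sanitize_newick(newick: str) -> str:
    """Rewrite every quoted label in a Newick string, keeping the quotes."""
    def fix(match: re.Match) -> str:
        inner = match.group(1).replace("''", "'")  # unescape apostrophes
        return "'" + sanitize_label(inner) + "'"
    return QUOTED_LABEL.sub(fix, newick)

print(sanitize_label("Malacothrix (genus in Opisthokonta) ott600707"))
print(sanitize_newick("((A,B)'Malacothrix (genus in Opisthokonta) ott600707',C);"))
```

To apply it to a downloaded tree, read the whole .tre file as a string, pass it through sanitize_newick, and write the result to a new file.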
ete3 reads the standardized output tree (vertebrata_standardized.tre) without errors. Hope that helps!
I am trying to parse the Newick file for Vertebrata, downloaded from the Open Tree of Life server, using the ete3 Python package:
and getting the following error:
I've seen this mentioned in this old GitHub issue, but the discussion there doesn't really resolve the problem.
Any idea why this is happening and how it could be resolved?
Thanks!