Open davidhwyllie opened 7 years ago
So sorry, the code got interpreted as markdown. Here it is, correctly formatted:
#!/usr/bin/env python3
# ete3 version is 3.0.0b35; running on Windows server 2012; Python 3.5.2.
import ete3
newickString="(s1:2.5e-08,582:2.18e-07,s2:2.5e-08,s3:0.000441758)14;"; # this is iqTree output;
# read it into 'standard' ete3 tree
t=ete3.Tree(newickString)
print(t) # no problems
# read it using explicit format
t=ete3.Tree(newickString, format=1)
print(t) # no problems
# sometimes it might be nice to export to phyloxml.
# -- from http://etetoolkit.org/support/ --
#Building a PhyloXML document out of ETE tree instances is not covered in the documentation, but it possible anyways.
#You were on the right track :) You just need to pass the tree structure as a newick string to the PhyloXML constructor.
#The following line worked for me in your example:
#phylo = phyloxml.PhyloxmlTree(newick=spec_tree.write())
# read it to phyloxml
project=ete3.Phyloxml() # create project.
phylo=ete3.phyloxml.PhyloxmlTree(newick=newickString)
project.add_phylogeny(phylo)
xmlString=project.export() # non explanatory error message: write argument must be str, not bytes.
project=ete3.Phyloxml() # create project.
phylo=ete3.phyloxml.PhyloxmlTree(newick=newickString, format=1)
project.add_phylogeny(phylo)
xmlString=project.export() # succeeds
Actually, even though .export() succeeds without error using the 'format=1' option, the output is not correct. The content of some of the tags (e.g. b'14') appears to be the result of printing a python 3 'bytes' object, which is consistent with the error message generated on some occasions, e.g.
<phy:name>b's1'</phy:name>
as in
<phy:Phyloxml xmlns:phy="http://www.phyloxml.org/1.10/phyloxml.xsd">
<phy:phylogeny>
<phy:clade>
<phy:name>b'14'</phy:name>
<phy:branch_length>0.000000e+00</phy:branch_length>
<phy:clade branch_length_attr=b'"2.5e-08"'>
<phy:name>b's1'</phy:name>
<phy:branch_length>2.500000e-08</phy:branch_length>
</phy:clade>
<phy:clade branch_length_attr=b'"2.18e-07"'>
<phy:name>b'582'</phy:name>
<phy:branch_length>2.180000e-07</phy:branch_length>
</phy:clade>
<phy:clade branch_length_attr=b'"2.5e-08"'>
<phy:name>b's2'</phy:name>
<phy:branch_length>2.500000e-08</phy:branch_length>
</phy:clade>
<phy:clade branch_length_attr=b'"0.000441758"'>
<phy:name>b's3'</phy:name>
<phy:branch_length>4.417580e-04</phy:branch_length>
</phy:clade>
</phy:clade>
</phy:phylogeny>
</phy:Phyloxml>
If we instead use the 'format=0' option, as in the code below:
#!/usr/bin/env python3
# ete3 version is 3.0.0b35; running on Windows server 2012; Python 3.5.2.
import ete3
newickString="(s1:2.5e-08,582:2.18e-07,s2:2.5e-08,s3:0.000441758)14;"; # this is iqTree output;
# read it into 'standard' ete3 tree
t=ete3.Tree(newickString)
print(t) # no problems
# read it using explicit format
t=ete3.Tree(newickString, format=1)
print(t) # no problems
# sometimes it might be nice to export to phyloxml.
# -- from http://etetoolkit.org/support/ --
#Building a PhyloXML document out of ETE tree instances is not covered in the documentation, but it possible anyways.
#You were on the right track :) You just need to pass the tree structure as a newick string to the PhyloXML constructor.
#The following line worked for me in your example:
#phylo = phyloxml.PhyloxmlTree(newick=spec_tree.write())
# read it to phyloxml
project=ete3.Phyloxml() # create project.
#phylo=ete3.phyloxml.PhyloxmlTree(newick=newickString)
#project.add_phylogeny(phylo)
#xmlString=project.export() # non explanatory error message: write argument must be str, not bytes.
project=ete3.Phyloxml() # create project.
phylo=ete3.phyloxml.PhyloxmlTree(newick=newickString, format=0)
project.add_phylogeny(phylo)
xmlString=project.export() # succeeds
then we get the traceback below (note that the export starts, but then raises an error):
<phy:Phyloxml xmlns:phy="http://www.phyloxml.org/1.10/phyloxml.xsd">
<phy:phylogeny>
<phy:clade>
<phy:name>b''</phy:name>
<phy:branch_length>0.000000e+00</phy:branch_length>
<phy:confidence type=b'"branch_support"'>Traceback (most recent call last):
  File "ete3PhyloXMLOutputTest.py", line 34, in <module>
    xmlString=project.export() # succeeds
  File "C:\python352\lib\site-packages\ete3\phyloxml\__init__.py", line 65, in export
    return super(Phyloxml, self).export(outfile=outfile, level=level, namespacedef_=namespace)
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml.py", line 423, in export
    self.exportChildren(outfile, level + 1, namespace_, name_)
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml.py", line 432, in exportChildren
    phylogeny_.export(outfile, level, namespace_, name_='phylogeny')
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml_tree.py", line 148, in export
    self.phyloxml_phylogeny.export(outfile=outfile, level=level, name_=name_, namespacedef_=namespacedef_)
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml.py", line 562, in export
    self.exportChildren(outfile, level + 1, namespace_, name_)
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml.py", line 595, in exportChildren
    self.clade.export(outfile, level, namespace_, name_='clade')
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml.py", line 901, in export
    self.exportChildren(outfile, level + 1, namespace_, name_)
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml.py", line 921, in exportChildren
    confidence_.export(outfile, level, namespace_, name_='confidence')
  File "C:\python352\lib\site-packages\ete3\phyloxml\_phyloxml.py", line 3008, in export
    outfile.write(str(self.valueOf_).encode(ExternalEncoding))
TypeError: write() argument must be str, not bytes
This error can be prevented by replacing line 3008 with the below:
## outfile.write(str(self.valueOf_).encode(ExternalEncoding))
if not type(self.valueOf_)==float:
    outfile.write(str(self.valueOf_).encode(ExternalEncoding))
else:
    outfile.write(str(self.valueOf_))
However, the display of other elements still has the formatting issue.
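For what it is worth, the TypeError itself can be reproduced without ete3. The sketch below is only an illustration under assumptions: `io.StringIO` stands in for the text-mode `sys.stdout` that export() writes to, `"utf-8"` stands in for `ExternalEncoding`, and `write_value` is a hypothetical helper, not part of ete3. The exact wording of the error message may differ between stream types.

```python
import io


def write_value(outfile, value):
    """Hypothetical helper: write any value to a text-mode stream as text."""
    outfile.write(str(value))


# Stands in for sys.stdout, which is a text stream in Python 3.
stream = io.StringIO()

error = None
try:
    # Same pattern as line 3008: encode first, then write the resulting bytes.
    stream.write(str(0.0).encode("utf-8"))
except TypeError as exc:
    # Text streams in Python 3 accept str only, so the write is rejected.
    error = exc

# Writing the text directly works for a float, as in the patched branch.
write_value(stream, 0.0)
```

Skipping the `.encode()` step only helps for values that are already str or numbers. A value that is itself a bytes object would still be written as its repr, which matches the remaining formatting issue.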
Thanks for reporting this, @davidhwyllie. It definitely sounds like a compatibility problem with Py3. Could you run the export command using python2 as a workaround?
Thank you.
I think the issue is that outfile.write() ends up writing the repr of the value passed to it, which in python 2 is a str, but which I think in python 3 is a bytes object.
# python 3
>>> x=b'ABC'
>>> print(x)
b'ABC'
## the fix is to substitute x.decode()
>>> print(x.decode())
ABC
By default 2.7 is generating a string object, which renders correctly without the .decode(), but in 3.x it is generating a bytes object, for which the repr is b'thing' not thing.
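A small Python 3 sketch of the same effect, using the tag and attribute names from the XML output above. How the generated code actually builds these strings is an assumption here; the snippet only shows that formatting a bytes object into a str produces the b'...' form seen in the export.

```python
# Python 3 only: formatting bytes into a str template uses the bytes repr.
name = b"s1"
broken_tag = "<phy:name>%s</phy:name>" % name
fixed_tag = "<phy:name>%s</phy:name>" % name.decode("utf-8")

# The attribute case: the value is quoted first and then encoded,
# so the double quotes end up inside the bytes repr.
attr_value = '"2.5e-08"'.encode("utf-8")
broken_attr = "branch_length_attr=%s" % attr_value
fixed_attr = "branch_length_attr=%s" % attr_value.decode("utf-8")

print(broken_tag)   # <phy:name>b's1'</phy:name>
print(fixed_tag)    # <phy:name>s1</phy:name>
print(broken_attr)  # branch_length_attr=b'"2.5e-08"'
print(fixed_attr)   # branch_length_attr="2.5e-08"
```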
As I understand it from studying your great code, each phyloxml element is mapped to a class, and each class has an .export() method which often calls relevant related methods e.g. .exportAttributes().
There are quite a lot of outfile.write() calls (> 100) but only a few export self.value, which I think is the issue. Is it possible that just appending a .decode() call to the relevant parameters would make this work in both 2.7 and 3.x?
## tested on python 2.7.6
>>> x=b'ABC'
>>> print(x.decode())
ABC
>>> x='ABC'
>>> print(x.decode())
ABC
>>> x=u'ABC'
>>> print(x.decode())
ABC
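One caveat with appending .decode() everywhere: the calls above work on 2.7 because str has a .decode() method there, but a python 3 str does not, so an unconditional .decode() would raise AttributeError on values that are already text. A minimal sketch of a guard that should behave the same on both versions is below; `to_text` is a hypothetical name, not an existing ete3 function, and the utf-8 default is an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text, decoding only bytes objects."""
    if isinstance(value, bytes):
        # Python 3 bytes (and Python 2 str, where bytes is an alias of str).
        return value.decode(encoding)
    # Already text, or a number such as a branch length.
    return str(value)


print(to_text(b"ABC"))    # ABC
print(to_text("ABC"))     # ABC
print(to_text(2.5e-08))   # 2.5e-08
```

With something like this, each affected call could become outfile.write(to_text(...)) instead of encoding before the write.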
What is your advice? I would really like to get this going on 3.x.
Is there an existing test suite for the phyloxml module? If so, could you tell me how to run it? I will make a fork and attempt to fix this. If not, if I write tests using unittest is this OK, or do you want a different framework?
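As a starting point, here is a rough sketch of the kind of unittest check I have in mind. It does not call ete3 at all: `leaked_bytes_literals` and the regular expression are hypothetical helpers of my own, and the sample strings are taken from the export output above. A real test would feed it the text produced by project.export().

```python
import re
import unittest

# Matches b'...' or b"..." directly after '>' (element text) or '=' (attribute).
LEAKED_BYTES = re.compile(r"""[>=]b(['"])(.*?)\1""")


def leaked_bytes_literals(xml_text):
    """Hypothetical helper: return the contents of bytes reprs found in xml_text."""
    return [match.group(2) for match in LEAKED_BYTES.finditer(xml_text)]


BROKEN_SAMPLE = (
    "<phy:clade branch_length_attr=b'\"2.5e-08\"'>\n"
    "<phy:name>b's1'</phy:name>\n"
)
CLEAN_SAMPLE = (
    '<phy:clade branch_length_attr="2.5e-08">\n'
    "<phy:name>s1</phy:name>\n"
)


class LeakedBytesTest(unittest.TestCase):
    def test_broken_output_is_detected(self):
        self.assertEqual(leaked_bytes_literals(BROKEN_SAMPLE), ['"2.5e-08"', "s1"])

    def test_clean_output_passes(self):
        self.assertEqual(leaked_bytes_literals(CLEAN_SAMPLE), [])
```

This can be run with python -m unittest against the file it is saved in.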
Hi @davidhwyllie, thanks for your interest! The phyloXML tests are really basic: https://github.com/etetoolkit/ete/blob/master/ete3/test/test_xml_parsers.py#L10
The main reason is that the PhyloXML parser itself is not well supported. The parsing code was automatically generated using generateDS based on the phyloXML schema. I did not have enough experience with phyloXML (and XML in general), so I could not create a proper parser providing good integration with ETE's Tree instances. Any improvement on that front is more than welcome.
Hi
I have been trying a javascript tree viewer https://phyd3.bits.vib.be/view.php?id=91162629d258a876ee994e9233b2ad87&f=xml which accepts phyloXML as input. As part of this I have been trying to generate phyloXML from ete3, as our existing visualisations are static and are produced by ete3.
I have discovered that the ete3 phyloxml project.export() method can produce an unexpected error 'write argument must be str, not bytes' depending on the way that the tree is imported.
I imagine this may be python3 specific, but have not checked. Would you be able to look at this and offer advice as to how to prevent this? I have a workaround (documented below), so there is no great rush.
Code to reproduce this issue is below.
#!/usr/bin/env python3
# ete3 version is 3.0.0b35; running on Windows server 2012; Python 3.5.2.
import ete3
newickString="(s1:2.5e-08,582:2.18e-07,s2:2.5e-08,s3:0.000441758)14;"; # this is iqTree output;
# read it into 'standard' ete3 tree
t=ete3.Tree(newickString)
print(t) # no problems
# read it using explicit format
t=ete3.Tree(newickString, format=1)
print(t) # no problems
# sometimes it might be nice to export to phyloxml.
# -- from http://etetoolkit.org/support/ --
#Building a PhyloXML document out of ETE tree instances is not covered in the documentation, but it possible anyways.
#You were on the right track :) You just need to pass the tree structure as a newick string to the PhyloXML constructor.
#The following line worked for me in your example:
#phylo = phyloxml.PhyloxmlTree(newick=spec_tree.write())
# read it to phyloxml
project=ete3.Phyloxml() # create project.
phylo=ete3.phyloxml.PhyloxmlTree(newick=newickString)
project.add_phylogeny(phylo)
xmlString=project.export() # non explanatory error message: write argument must be str, not bytes.
project=ete3.Phyloxml() # create project.
phylo=ete3.phyloxml.PhyloxmlTree(newick=newickString, format=1)
project.add_phylogeny(phylo)
xmlString=project.export() # succeeds