Open davidhwyllie opened 7 years ago
@davidhwyllie Can you paste a traceback of the actual error? Depending on the content I'm tempted to report this as an upstream (i.e. python) problem.
Also which version of Python 3?
Hi
at the moment I am having trouble producing a unit test to reproduce this. It isn't as simple as the 'strings longer than 220 chars' described above. When I have a reproducible, failing example I will write further.
In the meantime, I have attached a unit test which passes which loads small and large files from strings and files on windows using unpatched ete3.
It produces a deprecation warning related to reading files in 'rU' mode, but this is a python 3 issue not a windows one.
Thanks David, can you implement those unitests on your github branch and the Pull Request? It is easy to review and comment that way.
On 4 March 2017 at 22:31, David Wyllie notifications@github.com wrote:
Hi
at the moment I am having trouble producing a unit test to reproduce this. It isn't as simple as the 'strings longer than 220 chars' described above. When I have a reproducible, failing example I will write further.
In the meantime, I have attached a unit test which passes which loads small and large files from strings and files on windows using unpatched ete3.
It produces a deprecation warning related to reading files in 'rU' mode, but this is a python 3 issue not a windows one.
ete3_issue258.zip https://github.com/etetoolkit/ete/files/818947/ete3_issue258.zip
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/etetoolkit/ete/issues/258#issuecomment-284184154, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJ_Sk7j3RuDd_S7T-v0QmLUyY4hmFRJks5ridgsgaJpZM4MEfEV .
Hi
Thanks. Sorry for the delay in replying.
This issue can be reproduced on the following system using the latest release of ete3.
System details OS: Windows-7-6.1.7601-SP1 Python version: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] Ete3: 3.1.1
How to reproduce A self contained unit test is attached
E:\dwyllie\GitHub\SPIRALTREE\src>python issue258.py
Running testIssue258
system: Windows-7-6.1.7601-SP1
python version: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017,
16:02:32) [MSC v.1900 64 bit (AMD64)]
ete3 version: 3.1.1
Loading small tree
There are 3 nodes
Loading large tree
E
======================================================================
ERROR: runTest (__main__.testIssue258)
----------------------------------------------------------------------
Traceback (most recent call last):
File "issue258.py", line 37, in runTest
largeTree=Tree(largeTreeText)
File "C:\Program Files\Python35\lib\site-packages\ete3_dev_git\ete3\coretype\t
ree.py", line 211, in __init__
quoted_names=quoted_node_names)
File "C:\Program Files\Python35\lib\site-packages\ete3_dev_git\ete3\parser\new
ick.py", line 234, in read_newick
if os.path.exists(newick):
File "C:\Program Files\Python35\lib\genericpath.py", line 19, in exists
os.stat(path)
ValueError: stat: path too long for Windows
----------------------------------------------------------------------
Ran 1 test in 0.019s
FAILED (errors=1)
E:\dwyllie\GitHub\SPIRALTREE\src>
How to fix
Modify the read_newick function as shown
def read_newick(newick, root_node=None, format=0, quoted_names=False):
""" Reads a newick tree from either a string or a file, and returns
an ETE tree structure.
A previously existent node object can be passed as the root of the
tree, which means that all its new children will belong to the same
class as the root(This allows to work with custom TreeNode
objects).
You can also take advantage from this behaviour to concatenate
several tree structures.
"""
if root_node is None:
from ..coretype.tree import TreeNode
root_node = TreeNode()
if isinstance(newick, six.string_types):
# we duck-type the string newick.
# if it is a path, as identified by stat, we assume it is a file
# otherwise, we assume it is a string.
# On windows, the maximal permitted path length is 220 and therefore
# passing large strings raises a ValueError from stat,
# which states 'String is too long for Windows'
# as a workaround, one can test whether newick is too long to be a
# windows string.
if len(newick)<220: # then it could be either a path or a tree
if os.path.exists(newick):
if newick.endswith('.gz'):
import gzip
nw = gzip.open(newick).read()
else:
nw = open(newick, 'rU').read()
else:
nw = newick
else:
nw = newick
matcher = compile_matchers(formatcode=format)
nw = nw.strip()
if not nw.startswith('(') and nw.endswith(';'):
return _read_node_data(nw[:-1], root_node, "single", matcher, format)
elif not nw.startswith('(') or not nw.endswith(';'):
raise NewickError('Unexisting tree file or Malformed newick tree structure.')
else:
return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
else:
raise NewickError("'newick' argument must be either a filename or a newick string.")
Post fix:
E:\dwyllie\GitHub\SPIRALTREE\src>python issue258.py
60430
Reproducing issue #258
Platform: Windows-7-6.1.7601-SP1
python version: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 6
4 bit (AMD64)]
ete3 version: 3.1.1
Loading small tree
There are 3 nodes
Loading large tree
There are 1026 nodes
.
----------------------------------------------------------------------
Ran 1 test in 0.068s
OK
E:\dwyllie\GitHub\SPIRALTREE\src>
I don't think I have permission to create a PR (sorry, I've not done this before) but I have committed my local change to a copy of your repository [here] (https://github.com/davidhwyllie/ETE3_ISSUE258)
If you would like me to do something different, please let me know.
Thank you for your assistance. I will submit a pull request as advised. Issue reproduced and resolution tested on the following system: Windows-10-10.0.17134-SP0 3.6.6rc1 (v3.6.6rc1:1015e38be4, Jun 12 2018, 08:38:06) [MSC v.1900 64 bit (AMD64)]
Unit test and commit fixing issue https://github.com/davidhwyllie/ete/commit/1752154f361f57144ce4b09f3993f6b8374b2993
Hi
A minor issue; in the Newick parser read_newick function (parsers/newick.py, line 220) called when reading newick files, as described here: http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#reading-newick-trees
does a kind of 'duck typing' approach to deciding whether the string passed is a file path or a newick string. On windows this fails if the string is >220bytes because os.stat returns an error from the operating system.
I guess there's various ways of dealing with this, & I've attached a modified module below which implements one such.
Best wishes David