Closed fedarko closed 2 years ago
Parsing behavior of malformed newick strings is undefined... that said, it would be nice if it errored out rather than hanging
In [1]: import skbio
In [2]: skbio.TreeNode.read(["(a,b);"])
Out[2]: <TreeNode, name: unnamed, internal node count: 0, tips count: 2>
In [3]: skbio.TreeNode.read(["(a,b)"])
---------------------------------------------------------------------------
NewickFormatError Traceback (most recent call last)
<ipython-input-3-15e871ec9967> in <module>
----> 1 skbio.TreeNode.read(["(a,b)"])
~/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/skbio/io/registry.py in read(cls, file, format, **kwargs)
650 @classonlymethod
651 def read(cls, file, format=None, **kwargs):
--> 652 return registry.read(file, into=cls, format=format, **kwargs)
653
654 imports = registry._import_paths(read_formats)
~/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/skbio/io/registry.py in read(self, file, format, into, verify, **kwargs)
511 return (x for x in [])
512 else:
--> 513 return self._read_ret(file, format, into, verify, kwargs)
514
515 def _read_ret(self, file, fmt, into, verify, kwargs):
~/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/skbio/io/registry.py in _read_ret(self, file, fmt, into, verify, kwargs)
518 reader, kwargs = self._init_reader(file, fmt, into, verify, kwargs,
519 io_kwargs)
--> 520 return reader(file, **kwargs)
521
522 def _read_gen(self, file, fmt, into, verify, kwargs):
~/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/skbio/io/registry.py in wrapped_reader(file, encoding, newline, **kwargs)
996 # is cheaper than insert
997 kwargs.update(zip(file_keys, fhs[:-1]))
--> 998 return reader_function(fhs[-1], **kwargs)
999 else:
1000
~/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/skbio/io/format/newick.py in _newick_to_tree_node(fh, convert_underscores)
324 last_token = token
325
--> 326 raise NewickFormatError("Could not parse file as newick."
327 " `(Parenthesis)`, `'single-quotes'`,"
328 " `[comments]` may be unbalanced, or tree may be"
NewickFormatError: Could not parse file as newick. `(Parenthesis)`, `'single-quotes'`, `[comments]` may be unbalanced, or tree may be missing its root.
Recreating the problem
I had iow 0.1.2 installed, and this is using python 3.6.7.
Not sure why exactly this is happening, but it was pretty confusing to experience :) This problem does impact Empress (QIIME 2 allows you to import the semicolon-less tree above as a
Phylogeny[Rooted]
artifact, and running Empress on this artifact also hangs indefinitely), so it'd be nice to address it here (either by allowing these sorts of Newick strings to be parsed, or by just failing with an error message explaining the problem).References
Example tree used here was based on fourth example tree shown in https://github.com/igraph/igraph/issues/879#issuecomment-144861328.