biocore / improved-octo-waddle

Balanced parentheses succinct data structure in Python
BSD 3-Clause "New" or "Revised" License
6 stars 7 forks source link

Malformed skbio.TreeNode object has broken copy function #54

Open mortonjt opened 1 year ago

mortonjt commented 1 year ago

This is a weird issue.

from bp import to_skbio_treenode, parse_newick
tree = parse_newick(open('16S_taxa.biom.nwk').read())
tree.copy()

yields the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 sktree.copy()

File ~/miniconda3/envs/dynomics/lib/python3.9/site-packages/skbio/tree/_tree.py:514, in TreeNode.copy(self)
    511             result.__dict__[key] = deepcopy(node_to_copy.__dict__[key])
    512     return result
--> 514 root = __copy_node(self)
    515 nodes_stack = [[root, self, len(self.children)]]
    517 while nodes_stack:
    518     # check the top node, any children left unvisited?

File ~/miniconda3/envs/dynomics/lib/python3.9/site-packages/skbio/tree/_tree.py:511, in TreeNode.copy.<locals>.__copy_node(node_to_copy)
    509 for key in node_to_copy.__dict__:
    510     if key not in efc:
--> 511         result.__dict__[key] = deepcopy(node_to_copy.__dict__[key])
    512 return result

File ~/miniconda3/envs/dynomics/lib/python3.9/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File ~/miniconda3/envs/dynomics/lib/python3.9/copy.py:237, in _deepcopy_method(x, memo)
    236 def _deepcopy_method(x, memo): # Copy instance methods
--> 237     return type(x)(x.__func__, deepcopy(x.__self__, memo))

File ~/miniconda3/envs/dynomics/lib/python3.9/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)

TypeError: copy() takes 1 positional argument but 2 were given

But running tree = skbio.TreeNode.read('16S_taxa.biom.nwk').copy() doesn't return such an error, hinting at the skbio.TreeNode conversion in this package.

Now, this does not occur in iow==1.0.3 suggesting that this is a recent error.

wasade commented 1 year ago

Ugh, good spot. It's most likely due to either https://github.com/biocore/improved-octo-waddle/commit/ecdf009aea7e03ac1ea38860b39da1b3c1688e2a or https://github.com/biocore/improved-octo-waddle/commit/67178735853deff4269050cbaf1b5359e7b73fa6. We had a difficult time making copy work correctly in tax2tree for carrying over edge numbers as needed for placement data (see https://github.com/biocore/tax2tree/blob/master/t2t/util.py#L18).

In the bigger picture, I think that the TreeNode.copy method is not ideal, and is both confusing and surprising in its operation. I remember digging into this error when exploring this before, and I think what's going on is the wrong branch is triggered in deepcopy although the code referenced here I think has gone through some changes in recent versions of Python.

I think fixing this should be done upstream in skbio. As a work around, one possibility for a true deepcopy sans any attributes would be to serialize / deserialize:

def copy(t):
    return skbio.TreeNode.parse([str(t)])
wasade commented 1 year ago

...and skbio ideally would have the ability to track edge numbers