Get node variables in tree.walk()

LeonardoEmili commented 2 years ago

Hi @goodmami, thanks for the great library, I was wondering if there is a particular reason for not returning also the variables used to identify nodes in the graph in the method tree.walk(). I see that this information is available throughout the parser and it could be useful to add it to values returned by the parser.

goodmami commented 2 years ago

Hi, and thanks!

The Tree.walk() method iterates over branches and not nodes. It was created to assist with JAMR-style tree alignments (see #74, #91), and at the time there was no real need for the variables. It's not hard to use the path to recover them, though:

>>> import penman
>>> def path_var(path, node):
...     var, branches = node
...     for step in path[:-1]:
...         var, branches = branches[step][1]
...     return var
... 
>>> t = penman.parse('(c / chase-01 :ARG0 (d / dog) :ARG1 (c2 / cat :mod (b / black)))')
>>> for path, branch in t.walk():
...     print(path, path_var(path, t.node), branch)
... 
(0,) c ('/', 'chase-01')
(1,) c (':ARG0', ('d', [('/', 'dog')]))
(1, 0) d ('/', 'dog')
(2,) c (':ARG1', ('c2', [('/', 'cat'), (':mod', ('b', [('/', 'black')]))]))
(2, 0) c2 ('/', 'cat')
(2, 1) c2 (':mod', ('b', [('/', 'black')]))
(2, 1, 0) b ('/', 'black')

I would rather not modify Tree.walk() to return the variable because that would break backward compatibility. What about a separate function that iterates over nodes instead of branches (e.g., tree.node_walk())?

Or if all you need is the list of variables used in the tree, tree.nodes() can help:

>>> [var for var, _ in t.nodes()]
['c', 'd', 'c2', 'b']

LeonardoEmili commented 2 years ago

Thanks for the prompt reply! It makes sense, indeed such a function would be a useful addition. For the moment, I obtained the same output as using the above snippet by combining the iteration on nodes you provided in #74 (i.e. checking if a branch is atomic and rescaling its path to be 0-indexed) and extracting the node's var from here. Do you consider adding it as a function to the library?

goodmami commented 2 years ago

Do you consider adding it as a function to the library?

I'd be happy to review a PR for a new Tree method. I'd be more inclined to accept it if either (a) the functionality is sufficiently different than what exists, and/or (b) there is a clear use case providing justification. If it is exactly the same as Tree.walk() but it only differs by including the variable at each step, then I might prefer augmenting Tree.walk() with a parameter like variables=False|True such that the variable is included when the parameter is True:

>>> t = penman.parse('(c / chase-01 :ARG0 (d / dog) :ARG1 (c2 / cat :mod (b / black)))')
>>> for path, var, branch in t.walk(variables=True):
...     ...

In general, though, I don't like changing the shape of return values. The alternative of yielding a node at each step would be sufficiently different from Tree.walk(). E.g.:

>>> for path, node in t.node_walk():
...     print(path, node)
...
() ('c', [('/', 'chase-01'), (':ARG0', ('d', [('/', 'dog')])), (':ARG1', ('c2', [('/', 'cat'), (':mod', ('b', [('/', 'black')]))]))])
(1,) ('d', [('/', 'dog')])
(2,) ('c2', [('/', 'cat'), (':mod', ('b', [('/', 'black')]))])
(2, 1) ('b', [('/', 'black')])

But this is quite similar to Tree.nodes():

>>> for node in t.nodes():
...     print(node)
... 
('c', [('/', 'chase-01'), (':ARG0', ('d', [('/', 'dog')])), (':ARG1', ('c2', [('/', 'cat'), (':mod', ('b', [('/', 'black')]))]))])
('d', [('/', 'dog')])
('c2', [('/', 'cat'), (':mod', ('b', [('/', 'black')]))])
('b', [('/', 'black')])

Does the node-walk behavior above match your current needs? Or perhaps you can say more about why you need the variable in the branch walk?

LeonardoEmili commented 2 years ago

Does the node-walk behavior above match your current needs? Or perhaps you can say more about why you need the variable in the branch walk?

Not exactly, the use case I am interested in would need to show all terminal nodes with the associated variables. In these days I can't work on a PR to show you what I mean, I'll probably do it next week and update this thread.

goodmami commented 2 years ago

the use case I am interested in would need to show all terminal nodes with the associated variables.

And Tree.nodes() (example in my previous comment) does not work for you? It basically does the walk over nodes but does not specify the path it took to get there. If you mean "terminal nodes" meaning those without outgoing branches, then you can check if the branch list of each node has length 1 (the concept only). If not, then I would need an example of the kind of output you expect to receive.

goodmami commented 2 years ago

@LeonardoEmili I'm going to close this issue since it seems the question was answered. If you want to provide the function you have described, please put it in a PR that links to this issue for context.

goodmami / penman

Get node variables in tree.walk() #104