goodmami / penman

PENMAN notation (e.g. AMR) in Python
https://penman.readthedocs.io/
MIT License

How to extract predicate nodes from amr graph #115

Closed tingchihc closed 1 year ago

tingchihc commented 1 year ago

Hi, I have a question about penman. Is there a function or API to extract the predicate nodes from an AMR graph? In this example, the predicate nodes are (z0, wonder-01), (z2, say-01), (z6, see-01). How can I extract these nodes from the AMR graph?

thanks,

goodmami commented 1 year ago

These are "instance" triples, which are mentioned in the API docs, the Using Penman as a Python Library guide, and the API demo notebook. If you want just the concept labels and node identifiers, you can get those from the triples:

>>> import penman
>>> g = penman.decode("(f / fly-01 :ARG0 (b / bird))")
>>> g.instances()
[Instance(source='f', role=':instance', target='fly-01'), Instance(source='b', role=':instance', target='bird')]
>>> for instance in g.instances():
...     print(instance.source, instance.target)
... 
f fly-01
b bird

tingchihc commented 1 year ago

Thanks for your reply. I know that g.instances() returns all the instances in the AMR graph. However, I only want the predicate nodes, e.g. (f / fly-01). If there is no function to extract the predicate nodes, what rule determines whether a node is a predicate? I can apply that rule myself to find them.

goodmami commented 1 year ago

Ok, I think I get it. However, "predicate" is an ambiguous term. If you mean a syntactic predicate (e.g., the main verb, or perhaps all verbs), then note that AMR does not formally distinguish concepts by syntactic categories like 'verb' or 'noun', so you won't have much luck (you can look for verb-like concepts from PropBank that have a -XY suffix, like fly-01, but this fails for concepts that aren't in PropBank). If you mean logical predicates, i.e., relations that take arguments, then you can look for those instances whose variable is the source of edges:

>>> import penman
>>> g = penman.decode('(f / fly-01 :ARG0 (b / bird))')
>>> sources = set(edge.source for edge in g.edges())
>>> [inst for inst in g.instances() if inst.source in sources]
[Instance(source='f', role=':instance', target='fly-01')]
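
For the syntactic reading, the -XY suffix heuristic mentioned above could be sketched roughly as follows. This is only an illustration, not part of penman: the helper name `frame_like_instances` and the regular expression are assumptions, and the function works on plain (source, role, target) tuples, which is the shape of the items in `g.triples`, so it can be tried without any graph at hand.

```python
import re

# Assumption: a "verb-like" concept ends in a hyphen plus two digits (fly-01).
FRAME_SUFFIX = re.compile(r"-\d\d$")

def frame_like_instances(triples):
    """Return (variable, concept) pairs whose concept has a -XY suffix.

    Hypothetical helper; `triples` is an iterable of
    (source, role, target) tuples such as `g.triples`.
    """
    return [
        (source, target)
        for source, role, target in triples
        if role == ":instance"
        and target is not None          # a node may lack a concept
        and FRAME_SUFFIX.search(target)
    ]

# Triples for (f / fly-01 :ARG0 (b / bird :location (o / outside)))
triples = [
    ("f", ":instance", "fly-01"),
    ("f", ":ARG0", "b"),
    ("b", ":instance", "bird"),
    ("b", ":location", "o"),
    ("o", ":instance", "outside"),
]
print(frame_like_instances(triples))
```

As noted, this misses any predicate-like concept that carries no sense suffix, so treat it as a rough filter only.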

However, you'll find that things you think are not predicates, like perhaps bird, can take arguments (at least superficially):

>>> g = penman.decode('(f / fly-01 :ARG0 (b / bird :location (o / outside)))')
>>> sources = set(edge.source for edge in g.edges())
>>> [inst for inst in g.instances() if inst.source in sources]
[Instance(source='f', role=':instance', target='fly-01'), Instance(source='b', role=':instance', target='bird')]

You can do a few things to reduce this. One is to only look for core arguments like :ARG0, :ARG1, etc. You might also try reifying edges like :location so the source becomes a target. Since this relies on AMR-specific interpretations, you'll need to use the AMR model when you decode and transform:

>>> from penman.models import amr
>>> from penman.transform import reify_edges
>>> g = penman.decode('(f / fly-01 :ARG0 (b / bird :location (o / outside)))', model=amr.model)
>>> g2 = reify_edges(g, amr.model)
>>> print(penman.encode(g2))
(f / fly-01
   :ARG0 (b / bird
            :ARG1-of (_ / be-located-at-91
                        :ARG2 (o / outside))))
>>> sources2 = set(edge.source for edge in g2.edges())
>>> [inst for inst in g2.instances() if inst.source in sources2]
[Instance(source='f', role=':instance', target='fly-01'), Instance(source='_', role=':instance', target='be-located-at-91')]
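
The other suggestion, keeping only nodes that are the source of a core argument such as :ARG0 or :ARG1, might look something like the sketch below. The helper name `core_predicates` and the role pattern are assumptions made for illustration; the function takes plain (source, role, target) tuples like those in `g.triples`, and it assumes inverted roles such as :ARG0-of have already been normalized to their forward form.

```python
import re

# Assumption: core argument roles look like :ARG0, :ARG1, ...
CORE_ROLE = re.compile(r"^:ARG\d+$")

def core_predicates(triples):
    """Return (variable, concept) pairs for nodes with a core-argument edge.

    Hypothetical helper; `triples` is a list of
    (source, role, target) tuples such as `g.triples`.
    """
    # Every node has an :instance triple, so its sources are the variables.
    variables = {src for src, role, _ in triples if role == ":instance"}
    # Keep sources of :ARGn triples whose target is another node (an edge).
    sources = {
        src
        for src, role, tgt in triples
        if CORE_ROLE.match(role) and tgt in variables
    }
    return [
        (src, tgt)
        for src, role, tgt in triples
        if role == ":instance" and src in sources
    ]

# Triples for (f / fly-01 :ARG0 (b / bird :location (o / outside)))
triples = [
    ("f", ":instance", "fly-01"),
    ("f", ":ARG0", "b"),
    ("b", ":instance", "bird"),
    ("b", ":location", "o"),
    ("o", ":instance", "outside"),
]
print(core_predicates(triples))
```

Here bird is dropped without reifying anything, because :location is not a core role. The two approaches can also be combined by running the filter on the triples of the reified graph.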

Does this help?

tingchihc commented 1 year ago

Thanks for your help. I got it.