JosephGatto closed 3 years ago
If the source data has in-situ surface alignments, then yes, although I admit it's not the most straightforward. Here is an example graph with the tokenized sentence in the metadata:
>>> import penman
>>> g = penman.decode('''
... # ::snt The cat slept .
... (s / sleep-01~3
...    :ARG0 (c / cat~2))
... ''')
The in-situ alignments are things like ~3 at the end of the concept. The penman.surface module has some functions to help with this, and the sentence is available in the metadata attribute of the graph:
>>> from penman import surface
>>> surface.alignments(g)
{('s', ':instance', 'sleep-01'): Alignment((3,)), ('c', ':instance', 'cat'): Alignment((2,))}
>>> g.metadata['snt']
'The cat slept .'
I don't (yet?) have a function to get the tokens automatically, but you can use the API to do it manually:
>>> tokens = g.metadata['snt'].split()
>>> alignments = surface.alignments(g)
>>> for triple in g.instances():
...     if triple in alignments:
...         indices = alignments[triple].indices
...     else:
...         indices = []
...     print(triple.source, '--', [tokens[i-1] for i in indices])
...
s -- ['slept']
c -- ['cat']
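If you need this lookup in more than one place, the loop above could be wrapped in a small helper. This is only a sketch under some assumptions: `aligned_tokens` and its `base` parameter are hypothetical names, not part of penman, and the demo uses plain tuples that mirror the example graph so that it runs without penman installed.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def aligned_tokens(
    triples: Iterable[Triple],
    alignments: Dict[Triple, Sequence[int]],
    tokens: Sequence[str],
    base: int = 1,
) -> Dict[Triple, List[str]]:
    """Map each triple to the tokens its alignment indices point at.

    `base` is the index of the first token (assumed to be 1, as in the
    example above). Indices that fall outside the sentence are skipped.
    """
    result: Dict[Triple, List[str]] = {}
    for triple in triples:
        key = tuple(triple)
        # Triples without an alignment get an empty token list.
        indices = alignments.get(key, ())
        positions = [i - base for i in indices]
        result[key] = [tokens[p] for p in positions if 0 <= p < len(tokens)]
    return result


# Plain-tuple stand-ins for the example graph (hypothetical demo data).
tokens = 'The cat slept .'.split()
instances = [('s', ':instance', 'sleep-01'), ('c', ':instance', 'cat')]
alignments = {
    ('s', ':instance', 'sleep-01'): (3,),
    ('c', ':instance', 'cat'): (2,),
}

for (source, _, _), words in aligned_tokens(instances, alignments, tokens).items():
    print(source, '--', words)
```

With a real graph, the same helper should accept `g.instances()` as `triples` and `{t: a.indices for t, a in surface.alignments(g).items()}` as `alignments`. Passing `base=0` would cover data whose indices start at zero.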
Some notes:
- The alignment indices in this example are 1-based, so the code uses tokens[i-1] to get the correct token.
- Alignments can also appear on roles (:ARG0~1), constants (:polarity -~3), and re-entrant variables (:ARG0 b~5), so depending on your needs you may need to adjust the above code.
- Alignments given in the metadata (# ::alignments 3-1 1-1.1) are not interpreted like in-situ alignments, as there are multiple styles of such annotation.
Wow, thank you for the amazing response, this is exactly what I needed. Appreciate your time!!
Glad it helped!
(Ideally this kind of information would make it into the documentation, but for now these issue comments will have to do.)
Is there a way to identify which word in an input sentence a node/variable refers to?