Linking node back to input sentence.

JosephGatto commented 3 years ago

Is there a way to identify which word in an input sentence a node/variable refers to?

goodmami commented 3 years ago

If the source data has in-situ surface alignments, then yes, although I admit it's not the most straightforward. Here is an example graph with the tokenized sentence in the metadata:

>>> import penman
>>> g = penman.decode('''
... # ::snt The cat slept .
... (s / sleep-01~3
...    :ARG0 (c / cat~2))
... ''')

The in-situ alignments are things like ~3 at the end of the concept. The penman.surface module has some functions to help with this, and the sentence is available in the metadata attribute of the graph:

>>> from penman import surface
>>> surface.alignments(g)
{('s', ':instance', 'sleep-01'): Alignment((3,)), ('c', ':instance', 'cat'): Alignment((2,))}
>>> g.metadata['snt']
'The cat slept .'

I don't (yet?) have a function to get the tokens automatically, but you can use the API to do it manually:

>>> tokens = g.metadata['snt'].split()
>>> alignments = surface.alignments(g)
>>> for triple in g.instances():
...   if triple in alignments:
...     indices = alignments[triple].indices
...   else:
...     indices = []
...   print(triple.source, '--', [tokens[i-1] for i in indices])
... 
s -- ['slept']
c -- ['cat']
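
The loop above could be wrapped in a small helper. This is only a sketch and not part of penman: `tokens_by_node` and its `base` parameter are hypothetical names. It takes plain tuples and dicts so it runs without penman installed; with penman you would presumably pass `g.instances()` for the instances and `{t: a.indices for t, a in surface.alignments(g).items()}` for the indices.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def tokens_by_node(
    instances: Iterable[Triple],
    indices_by_triple: Dict[Triple, Sequence[int]],
    tokens: Sequence[str],
    base: int = 1,
) -> Dict[str, List[str]]:
    """Map each node variable to the tokens its concept is aligned to.

    Hypothetical helper: `base` is 1 for 1-based alignments, 0 for 0-based.
    """
    result = {}
    for triple in instances:
        variable = triple[0]
        # Unaligned instances get an empty list, as in the loop above.
        indices = indices_by_triple.get(tuple(triple), ())
        result[variable] = [tokens[i - base] for i in indices]
    return result


# Plain-data stand-ins for the graph in the example above.
tokens = 'The cat slept .'.split()
instances = [('s', ':instance', 'sleep-01'), ('c', ':instance', 'cat')]
indices = {
    ('s', ':instance', 'sleep-01'): (3,),
    ('c', ':instance', 'cat'): (2,),
}

print(tokens_by_node(instances, indices, tokens))
# {'s': ['slept'], 'c': ['cat']}
```

Keeping the index base as a parameter means the same helper can be reused for data with 0-based alignments by passing `base=0`.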

Some notes:

Some surface alignments are 0-based and some are 1-based, depending on the dataset or system used to produce them. Above I used 1-based alignments, so I did tokens[i-1] to get the correct token.
Alignments may appear not only on concepts, but also on roles (:ARG0~1), constants (:polarity -~3), and re-entrant variables (:ARG0 b~5), so depending on your needs you may need to adjust the above code.
Metadata alignments (# ::alignments 3-1 1-1.1) are not interpreted like in-situ alignments, as there are multiple styles of such annotation.
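
To illustrate the first two notes, here is a rough sketch that looks up tokens for every aligned triple (concepts, constants, re-entrant variables) rather than only instances, and that rejects indices that do not fit the chosen base. The function name `aligned_tokens`, the `base` parameter, and the example sentence and indices are all made up for illustration. The input is a plain dict so the code runs without penman; with penman it could probably be built from `surface.alignments(g)`, and, as far as I know, role alignments come from the separate `surface.role_alignments(g)`.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def aligned_tokens(
    indices_by_triple: Dict[Triple, Sequence[int]],
    tokens: Sequence[str],
    base: int = 1,
) -> Dict[Triple, List[str]]:
    """Map every aligned triple to its tokens (hypothetical helper)."""
    result = {}
    for triple, indices in indices_by_triple.items():
        words = []
        for i in indices:
            position = i - base
            # A negative position would silently wrap around in Python,
            # which usually means the wrong base was chosen.
            if not 0 <= position < len(tokens):
                raise IndexError(
                    f'index {i} is out of range for base {base}: {triple}'
                )
            words.append(tokens[position])
        result[triple] = words
    return result


# Made-up example with 1-based indices: a concept, a constant, another concept.
tokens = 'The cat did not sleep .'.split()
indices = {
    ('s', ':instance', 'sleep-01'): (5,),
    ('s', ':polarity', '-'): (4,),
    ('c', ':instance', 'cat'): (2,),
}

for triple, words in aligned_tokens(indices, tokens, base=1).items():
    print(triple, '--', words)
```

The range check is a design choice rather than a requirement: without it, a 0 index in supposedly 1-based data would quietly return the last token instead of signalling a mismatch.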

JosephGatto commented 3 years ago

Wow, thank you for the amazing response; this is exactly what I needed. Appreciate your time!!

goodmami commented 3 years ago

Glad it helped!

(Ideally this kind of information would make it into the documentation, but for now these issue comments will have to do.)

goodmami / penman

Linking node back to input sentence. #98