bjascob / amrlib

A python library that makes AMR parsing, generation and visualization simple.
MIT License

Metadata information for AMR strings #34

Closed raspberryice closed 2 years ago

raspberryice commented 2 years ago

Hi, would it be possible to output other metadata fields, such as the node offsets, when using the stog parsing model?

bjascob commented 2 years ago

What model are you using? I'm not sure what you mean by node offsets, but at least for model_parse_t5 nothing like that is calculated when creating the graph. If you're talking about the Gorn addresses used for the ::alignments data, that is something you'd need to calculate in a post-processing step.

raspberryice commented 2 years ago

I'm using model_parse_t5. The output from other AMR parsers contains lines like

# ::node    1   they    0-1
# ::node    2   also    1-2
# ::node    3   inform-01   2-3
# ::node    4   person  3-4

Is there any way to get this information via post-processing?

bjascob commented 2 years ago

The easiest way to get all the node names is to use penman to decode the AMR string into a Graph object; you can then call g.instances() to get a list of the instance triples. The node name will be triple.target or triple[2].

I don't know what the node numbers above are, so I assume you can just assign an arbitrary index to each node.
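
As a rough sketch of the two steps above (decode with penman, then number the nodes), something like the following could work. The helper names decode_instances and format_node_lines are made up for illustration, the "# ::node" layout is only an assumption based on the lines quoted earlier in this thread, and the index is simply the position in the instance list, not anything a parser computed. No token span is emitted because that would need an aligner.

```python
from typing import Iterable, List, Tuple

# (variable, role, concept), e.g. ("i", ":instance", "inform-01")
InstanceTriple = Tuple[str, str, str]


def decode_instances(amr_string: str) -> List[InstanceTriple]:
    """Decode an AMR string with penman and return its instance triples."""
    import penman  # third-party package: pip install penman

    graph = penman.decode(amr_string)
    # Each instance has .source (the variable), .role (":instance")
    # and .target (the concept, i.e. the node name).
    return [(t.source, t.role, t.target) for t in graph.instances()]


def format_node_lines(instances: Iterable[InstanceTriple]) -> List[str]:
    """Build '# ::node' style lines with an arbitrary 1-based index.

    Hypothetical layout: index and concept only, tab-separated.
    """
    lines = []
    for index, (_variable, _role, concept) in enumerate(instances, start=1):
        lines.append(f"# ::node\t{index}\t{concept}")
    return lines
```

Calling format_node_lines(decode_instances(amr_string)) on the parser output would then give one line per node, in whatever order penman returns the instances.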

I don't know what the last numbers are (i.e., 0-1, etc.), so you'll need to figure out what they represent. If they're alignments, you'll need to run an aligner on the AMR string. Older AMR parsers (e.g., JAMR) required alignments before parsing, but most newer ones (e.g., T5) don't. If you need the alignments, it's easy enough to add them. See the FAA aligner.