ArneBinder / dialam-2024-shared-task

see http://dialam.arg.tech/

add `merged_relations` dataset variant #18

Closed ArneBinder closed 2 months ago

ArneBinder commented 2 months ago

This adds another dataset variant called merged_relations to the DialAM-2024 PIE dataset loader. For this one, we process the documents from the default variant a bit further:

  1. the l_nodes layer gets renamed to labeled_spans to follow the default naming scheme of PIE
  2. all n-ary relation annotations from the layers ya_i2l_nodes, ya_s2ta_nodes, and s_nodes
    1. get their label and all roles prefixed by the original layer name and
    2. then get merged into a single layer nary_relations.
  3. the original span layer name (l_nodes) is stored in the document metadata under the key labeled_span_layer, and the list of original relation layer names (["ya_i2l_nodes", "ya_s2ta_nodes", "s_nodes"]) under the key nary_relation_layers. This lets us reconstruct the original node ids later on from the other metadata entries (i.e. ya_i2l_relations, s_relations, ya_s2ta_relations; see the convert_to_document() method).

In other words, this makes it possible to train a single model on all the relation data.
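
The processing steps listed above can be sketched roughly as follows. This is only an illustration on plain dictionaries, not the actual PIE implementation: the function name `merge_relation_layers`, the dictionary layout of documents and relations, the `:` separator used for prefixing, and the role names in the example are all assumptions made for this sketch.

```python
from typing import Any

# Layer names taken from the description above.
SPAN_LAYER = "l_nodes"
RELATION_LAYERS = ["ya_i2l_nodes", "ya_s2ta_nodes", "s_nodes"]


def merge_relation_layers(doc: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: build a merged_relations-style document from a
    default-variant document that is represented as a plain dict."""
    merged = []
    for layer in RELATION_LAYERS:
        for rel in doc.get(layer, []):
            merged.append(
                {
                    # Prefix the label and every role with the original layer
                    # name (the ":" separator is an assumption of this sketch).
                    "label": f"{layer}:{rel['label']}",
                    "roles": [f"{layer}:{role}" for role in rel["roles"]],
                    "arguments": list(rel["arguments"]),
                }
            )

    # Remember the original layer names so they can be restored later.
    metadata = dict(doc.get("metadata", {}))
    metadata["labeled_span_layer"] = SPAN_LAYER
    metadata["nary_relation_layers"] = list(RELATION_LAYERS)

    return {
        "labeled_spans": list(doc.get(SPAN_LAYER, [])),  # renamed span layer
        "nary_relations": merged,  # single merged relation layer
        "metadata": metadata,
    }


# Toy input; labels, roles, and argument indices are made up for illustration.
example_doc = {
    "l_nodes": [{"start": 0, "end": 5}, {"start": 6, "end": 11}],
    "ya_i2l_nodes": [
        {"label": "Asserting", "roles": ["source", "target"], "arguments": [0, 1]}
    ],
    "s_nodes": [
        {"label": "Default Inference", "roles": ["source", "target"], "arguments": [1, 0]}
    ],
    "metadata": {},
}
example_merged = merge_relation_layers(example_doc)
```

Because every label and role carries its layer prefix, splitting on the first separator would be enough to route each merged relation back to its original layer.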

Notes: