This adds another dataset variant called merged_relations to the DialAM-2024 PIE dataset loader. For this one, we process the documents from the default variant a bit further:
- The l_nodes layer is renamed to labeled_spans to follow the default naming scheme of PIE.
- All n-ary relation annotations from the layers ya_i2l_nodes, ya_s2ta_nodes, and s_nodes get their label and all roles prefixed with the original layer name, and are then merged into a single layer, nary_relations.
The original span layer name (l_nodes) is stored in the document metadata under the key labeled_span_layer, and the list of original relation layer names (["ya_i2l_nodes", "ya_s2ta_nodes", "s_nodes"]) under the key nary_relation_layers. This lets us reconstruct the original node ids later on from the other metadata entries (i.e. ya_i2l_relations, s_relations, ya_s2ta_relations; see the convert_to_document() method).
In other words, this makes it possible to train a single model on all the relation data.
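The prefix-and-merge step described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: NaryRelation is a hypothetical stand-in for the PIE annotation type, merge_relation_layers and split_prefixed are made-up helper names, and the ":" separator between layer name and label or role is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class NaryRelation:
    # Hypothetical stand-in for an n-ary relation annotation (not the PIE class).
    arguments: Tuple[str, ...]
    roles: Tuple[str, ...]
    label: str


# Relation layer names as listed in this PR description.
RELATION_LAYERS = ("ya_i2l_nodes", "ya_s2ta_nodes", "s_nodes")


def merge_relation_layers(
    layers: Dict[str, List[NaryRelation]],
    layer_names: Sequence[str] = RELATION_LAYERS,
    sep: str = ":",  # assumed separator, not confirmed by the PR
) -> Tuple[List[NaryRelation], Dict[str, object]]:
    """Prefix label and roles with the source layer name, then merge all layers."""
    merged: List[NaryRelation] = []
    for name in layer_names:
        for rel in layers.get(name, []):
            merged.append(
                NaryRelation(
                    arguments=rel.arguments,
                    roles=tuple(f"{name}{sep}{role}" for role in rel.roles),
                    label=f"{name}{sep}{rel.label}",
                )
            )
    # Keep the original layer names so the mapping can be undone later.
    metadata = {
        "labeled_span_layer": "l_nodes",
        "nary_relation_layers": list(layer_names),
    }
    return merged, metadata


def split_prefixed(value: str, sep: str = ":") -> Tuple[str, str]:
    """Recover (original layer name, original label or role) from a prefixed value."""
    layer, original = value.split(sep, 1)
    return layer, original


# Illustrative input; node ids, roles, and labels are example values only.
example_layers = {
    "ya_i2l_nodes": [NaryRelation(("L1", "I1"), ("source", "target"), "Asserting")],
    "s_nodes": [NaryRelation(("I1", "I2"), ("source", "target"), "Default Inference")],
}
merged, metadata = merge_relation_layers(example_layers)
```

Because every label and role carries its source layer as a prefix, a single model can predict over the merged layer while the original layer of each relation stays recoverable, for example via split_prefixed(merged[0].label).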
Notes:
- This also adds the respective experiment config dialam2024_merged_relations.
- This also renames the config in the HF dataset loading script from dialam2024 to default, to be more consistent.