Nodeset Pre-processing Pipeline
To prepare nodesets, we follow the steps specified in src/utils/prepare_nodeset.py:
1) Clean up the nodeset by removing isolated nodes and invalid transitions: get_valid_src_trg_and_node_ids_from_relations(). Only certain transitions are allowed (see get_relations() for more detail).
2) Remove S- and YA-nodes with edges: remove_s_and_ya_nodes_with_edges()
3) Add dummy S- and YA-nodes with edges by matching L- and I-nodes based on the similarity measure: add_s_and_ya_nodes_with_edges()
   a. Align I and L nodes based on the similarity of their texts: align_i_and_l_nodes()
   b. Create S nodes and align them with TA nodes by mirroring TA relations between L nodes to the aligned I nodes (see 3a): create_s_relations_and_nodes_from_ta_nodes_and_il_alignment()
   c. Create YA nodes and relations from I-L and S-TA alignments: add_s_and_ya_nodes_with_edges()
4) Optionally, add cleaned gold data for training (from the output of step 1):
   a. Normalize the direction of the RA-relation nodes: normalize_ra_relation_direction()
   b. Update the text and type of the result relation nodes with matching gold data: get_node_matching()
   c. Add remaining nodes and edges from the gold data that were not matched: merge_other_into_nodeset()
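Steps 3a and 3b can be illustrated with a small, self-contained sketch. It is not the code in src/utils/prepare_nodeset.py: the node and edge dictionaries (nodeID, type, text, fromID, toID), the helper names align_i_and_l and mirror_ta_to_s, and the use of difflib.SequenceMatcher as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Toy nodeset (assumed layout): two L-nodes linked by one TA-node, plus two I-nodes.
nodes = [
    {"nodeID": "L1", "type": "L", "text": "Alice : we should raise taxes"},
    {"nodeID": "L2", "type": "L", "text": "Bob : that would hurt small businesses"},
    {"nodeID": "TA1", "type": "TA", "text": "Default Transition"},
    {"nodeID": "I1", "type": "I", "text": "we should raise taxes"},
    {"nodeID": "I2", "type": "I", "text": "raising taxes would hurt small businesses"},
]
edges = [
    {"fromID": "L1", "toID": "TA1"},
    {"fromID": "TA1", "toID": "L2"},
]


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the real measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_i_and_l(nodes):
    """Step 3a (sketch): map each I-node ID to the ID of its most similar L-node."""
    l_nodes = [n for n in nodes if n["type"] == "L"]
    if not l_nodes:
        return {}
    alignment = {}
    for i_node in (n for n in nodes if n["type"] == "I"):
        best = max(l_nodes, key=lambda l: similarity(i_node["text"], l["text"]))
        alignment[i_node["nodeID"]] = best["nodeID"]
    return alignment


def mirror_ta_to_s(nodes, edges, i2l):
    """Step 3b (sketch): for each L -> TA -> L path, propose a placeholder
    S relation between the I-nodes aligned to the two L-nodes.
    Returns (source I-node ID, TA-node ID, target I-node ID) triples."""
    type_of = {n["nodeID"]: n["type"] for n in nodes}
    # Invert the alignment; several I-nodes may share one L-node.
    l2i = {}
    for i_id, l_id in i2l.items():
        l2i.setdefault(l_id, []).append(i_id)
    relations = []
    for ta_id in (n["nodeID"] for n in nodes if n["type"] == "TA"):
        src_ls = [e["fromID"] for e in edges
                  if e["toID"] == ta_id and type_of[e["fromID"]] == "L"]
        trg_ls = [e["toID"] for e in edges
                  if e["fromID"] == ta_id and type_of[e["toID"]] == "L"]
        for src_l in src_ls:
            for trg_l in trg_ls:
                for src_i in l2i.get(src_l, []):
                    for trg_i in l2i.get(trg_l, []):
                        relations.append((src_i, ta_id, trg_i))
    return relations


i2l = align_i_and_l(nodes)
s_relations = mirror_ta_to_s(nodes, edges, i2l)
```

The sketch keeps the direction of the TA relation and applies no similarity threshold; whether the actual pipeline does either is not stated here, so treat both as simplifications.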
Graphical representation of the nodeset pre-processing pipeline:
The same drawing with clickable links (to see the corresponding methods) can be found here
After these pre-processing steps we prepare a SimplifiedDialAM2024Document for each nodeset in convert_to_document():
1) Create document text and L-node-spans: link
2) Encode YA relations between I and L nodes (ya_i2l_nodes NaryRelation): link
3) Encode S relations between I nodes (s_nodes NaryRelation): link
4) Encode YA relations between S and TA nodes (ya_s2ta_nodes NaryRelation): link
5) Add all original data as metadata: link
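As a rough illustration of step 1 and of what an n-ary relation over L-node spans might look like, the following sketch uses plain dataclasses. Span, NaryRelationSketch, and build_text_and_l_spans are hypothetical stand-ins, not the classes used by convert_to_document(), and anchoring a relation on L-node spans with "source" and "target" roles is an assumption about the encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Character offsets into the document text (end is exclusive)."""
    start: int
    end: int


@dataclass(frozen=True)
class NaryRelationSketch:
    """Hypothetical stand-in for an n-ary relation: one role per argument."""
    arguments: tuple
    roles: tuple
    label: str


def build_text_and_l_spans(l_nodes, sep=" "):
    """Concatenate L-node texts and record each node's character span."""
    parts, spans, offset = [], {}, 0
    for node in l_nodes:
        if parts:
            offset += len(sep)  # account for the separator before this node
        start = offset
        parts.append(node["text"])
        offset += len(node["text"])
        spans[node["nodeID"]] = Span(start, offset)
    return sep.join(parts), spans


l_nodes = [
    {"nodeID": "L1", "text": "Alice : we should raise taxes"},
    {"nodeID": "L2", "text": "Bob : that would hurt small businesses"},
]
text, l_spans = build_text_and_l_spans(l_nodes)

# Assumed encoding: a relation anchored on two L-node spans; the label is a placeholder.
relation = NaryRelationSketch(
    arguments=(l_spans["L1"], l_spans["L2"]),
    roles=("source", "target"),
    label="PLACEHOLDER",
)
```

Slicing the text with a span, for example text[l_spans["L2"].start:l_spans["L2"].end], recovers the original L-node text, which is the property the span bookkeeping is meant to preserve.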