implement S- and I-node creation from L-, I-, and TA-nodes

Output from calling `python src/utils/create_relation_nodes.py -h`:

usage: create_relation_nodes.py [-h] --input_dir INPUT_DIR --output_dir
                                OUTPUT_DIR [--s_node_type S_NODE_TYPE]
                                [--s_node_text S_NODE_TEXT]
                                [--ya_node_text YA_NODE_TEXT]
                                [--similarity_measure SIMILARITY_MEASURE]
                                [--nodeset_id NODESET_ID]
                                [--dont_remove_existing_s_and_ya_nodes]
                                [--dont_show_progress] [--silent]

Create S and YA relations from L- and I-nodes and TA relation nodes.

The algorithm works as follows:
0. Remove existing S and YA nodes and their edges if they exist.
1. Align I and L nodes based on the similarity of their texts.
2. Create S nodes and align them with TA nodes by mirroring TA relations between L nodes to
    the aligned I nodes (see step 1).
3. Create YA nodes and relations from I-L and S-TA alignments.

Important Disclaimer:
- This creates relation nodes with a generic text (and type) per S-/YA-node.
- The direction of the new S nodes may not be correct and need to be adjusted later on.

optional arguments:
  -h, --help            show this help message and exit
  --input_dir INPUT_DIR
                        The input directory containing the nodesets.
  --output_dir OUTPUT_DIR
                        The output directory for the modified nodesets.
  --s_node_type S_NODE_TYPE
                        The type of the new S nodes. Default is 'S'.
  --s_node_text S_NODE_TEXT
                        The text of the new S nodes. Default is 'DUMMY'.
  --ya_node_text YA_NODE_TEXT
                        The text of the new YA nodes. Default is 'DUMMY'.
  --similarity_measure SIMILARITY_MEASURE
                        The similarity measure to use for creating YA nodes. Default is 'lcsstr' (Longest common substring).
  --nodeset_id NODESET_ID
                        The ID of the nodeset to process. If not provided, all nodesets in the input directory will be processed.
  --dont_remove_existing_s_and_ya_nodes
                        Whether to remove existing S and YA nodes and their edges before adding new S and YA nodes.
  --dont_show_progress  Whether to show a progress bar when processing multiple nodesets.
  --silent              Whether to show warnings for nodesets with remaining S or YA nodes.

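The three steps from the help text can be sketched roughly as follows. This is only an illustration and not the code from this PR: the function names, the tuple layouts, and the use of `difflib` as a stand-in for the `lcsstr` measure are all assumptions, and the direction of the YA links (L to I, TA to S) is assumed as well.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common substring (difflib stand-in for 'lcsstr')."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def align_i_to_l(i_texts: Dict[str, str], l_texts: Dict[str, str]) -> Dict[str, str]:
    """Step 1: map each I-node ID to the L-node ID with the most similar text."""
    return {
        i_id: max(l_texts, key=lambda l_id: lcs_length(i_text, l_texts[l_id]))
        for i_id, i_text in i_texts.items()
    }


def mirror_ta_relations(
    ta_relations: List[Tuple[str, str, str]], i2l: Dict[str, str]
) -> List[Tuple[str, str, str, str]]:
    """Step 2: mirror each (source L, TA, target L) onto the aligned I nodes.

    Returns (new S ID, aligned TA ID, source I, target I) tuples. If several
    I nodes align to one L node, only the last one is kept (a simplification).
    """
    l2i = {l_id: i_id for i_id, l_id in i2l.items()}
    s_relations = []
    for src_l, ta_id, trg_l in ta_relations:
        if src_l in l2i and trg_l in l2i:
            s_relations.append((f"S_{ta_id}", ta_id, l2i[src_l], l2i[trg_l]))
    return s_relations


def create_ya_links(
    i2l: Dict[str, str], s_relations: List[Tuple[str, str, str, str]]
) -> List[Tuple[str, str]]:
    """Step 3: one YA link per I-L alignment and one per S-TA alignment."""
    links = [(l_id, i_id) for i_id, l_id in i2l.items()]
    links += [(ta_id, s_id) for s_id, ta_id, _, _ in s_relations]
    return links


# Toy data, invented for illustration only.
i_texts = {"I1": "the sky is blue", "I2": "grass is green"}
l_texts = {"L1": "Alice : the sky is blue today", "L2": "Bob : well, grass is green"}
ta_relations = [("L1", "TA1", "L2")]

i2l = align_i_to_l(i_texts, l_texts)
s_relations = mirror_ta_relations(ta_relations, i2l)
ya_links = create_ya_links(i2l, s_relations)
```

As the disclaimer in the help text says, the mirrored S relations inherit the TA direction, which may not be the correct direction for the S node.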

Result for Nodeset `25524`

(image: analysis)

Notes

- This fails for one nodeset (23569). I suspect the cause is an incorrect edge from a YA node to a TA node, but I have not yet tested whether removing that edge lets the script run.
- We warn if there are connections from / to I nodes or if there are S or YA nodes, but for now we don't remove them.
- This adds the following types as typed dicts to `src.utils.nodeset_utils`:
  - `Node`, `Edge`, `Locution`, and `Nodeset`
- This adds the following helper methods to `src.utils.nodeset_utils`:
  - `get_node_ids`, `create_edges_from_relations`, `create_relation_nodes_from_alignment`, `get_binary_relations`, `remove_relation_nodes_and_edges`, and `remove_isolated_nodes`
- This also modifies `process_all_nodesets`: `func` now receives the already loaded `Nodeset` object as the parameter `nodeset` instead of the `nodeset_dir` parameter.
- This also modifies `src.utils.align_i2l_nodes.align_i_and_l_nodes()`: we no longer sort the I- and L-nodes before aligning them, because sorting required the timestamp, which is not part of the new `Node` type. The timestamp is broken anyway. @tanikina is this fine?
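For orientation, the typed dicts and a node-removal helper could look roughly like the sketch below. The field names (`nodeID`, `fromID`, `toID`, and so on) are assumed from the usual AIF JSON layout, and `remove_nodes_by_type` is a hypothetical name; neither is taken from the actual code in `nodeset_utils`.

```python
from typing import List, Set, TypedDict


class Node(TypedDict):
    nodeID: str
    text: str
    type: str


class Edge(TypedDict):
    edgeID: str
    fromID: str
    toID: str


class Locution(TypedDict):
    nodeID: str
    personID: str


class Nodeset(TypedDict):
    nodes: List[Node]
    edges: List[Edge]
    locutions: List[Locution]


def remove_nodes_by_type(nodeset: Nodeset, node_types: Set[str]) -> Nodeset:
    """Hypothetical helper: drop all nodes of the given types and every edge touching them."""
    removed = {n["nodeID"] for n in nodeset["nodes"] if n["type"] in node_types}
    return {
        "nodes": [n for n in nodeset["nodes"] if n["nodeID"] not in removed],
        "edges": [
            e
            for e in nodeset["edges"]
            if e["fromID"] not in removed and e["toID"] not in removed
        ],
        "locutions": nodeset["locutions"],
    }


# Toy nodeset, invented for illustration only.
example: Nodeset = {
    "nodes": [
        {"nodeID": "1", "text": "Alice : the sky is blue", "type": "L"},
        {"nodeID": "2", "text": "the sky is blue", "type": "I"},
        {"nodeID": "3", "text": "DUMMY", "type": "YA"},
    ],
    "edges": [
        {"edgeID": "e1", "fromID": "1", "toID": "3"},
        {"edgeID": "e2", "fromID": "3", "toID": "2"},
    ],
    "locutions": [{"nodeID": "1", "personID": "p1"}],
}

cleaned = remove_nodes_by_type(example, {"YA"})
```

Returning a new dict instead of mutating the input keeps the loaded nodeset intact, which is convenient when a function like `func` in `process_all_nodesets` is handed the already loaded object.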

TODO:

[x] create YA nodes
[x] documentation
[x] full script
[x] cleanup code?
[x] add remove_s_and_ya_nodes_with_edges?
[x] test it
[x] call on all nodesets

ArneBinder / dialam-2024-shared-task

implement S- and I-node creation from L-, I-, and TA-nodes #7
