ArneBinder / dialam-2024-shared-task

see http://dialam.arg.tech/

implement S- and YA-node creation from L-, I-, and TA-nodes #7

ArneBinder closed this issue 6 months ago

ArneBinder commented 6 months ago

Output from calling python src/utils/create_relation_nodes.py -h:

usage: create_relation_nodes.py [-h] --input_dir INPUT_DIR --output_dir
                                OUTPUT_DIR [--s_node_type S_NODE_TYPE]
                                [--s_node_text S_NODE_TEXT]
                                [--ya_node_text YA_NODE_TEXT]
                                [--similarity_measure SIMILARITY_MEASURE]
                                [--nodeset_id NODESET_ID]
                                [--dont_remove_existing_s_and_ya_nodes]
                                [--dont_show_progress] [--silent]

Create S and YA relations from L- and I-nodes and TA relation nodes.

The algorithm works as follows:
0. Remove existing S and YA nodes and their edges if they exist.
1. Align I and L nodes based on the similarity of their texts.
2. Create S nodes and align them with TA nodes by mirroring TA relations between L nodes to
    the aligned I nodes (see step 1).
3. Create YA nodes and relations from I-L and S-TA alignments.

Important Disclaimer:
- This creates relation nodes with a generic text (and type) per S-/YA-node.
- The direction of the new S nodes may not be correct and need to be adjusted later on.

optional arguments:
  -h, --help            show this help message and exit
  --input_dir INPUT_DIR
                        The input directory containing the nodesets.
  --output_dir OUTPUT_DIR
                        The output directory for the modified nodesets.
  --s_node_type S_NODE_TYPE
                        The type of the new S nodes. Default is 'S'.
  --s_node_text S_NODE_TEXT
                        The text of the new S nodes. Default is 'DUMMY'.
  --ya_node_text YA_NODE_TEXT
                        The text of the new YA nodes. Default is 'DUMMY'.
  --similarity_measure SIMILARITY_MEASURE
                        The similarity measure to use for creating YA nodes. Default is 'lcsstr' (Longest common substring).
  --nodeset_id NODESET_ID
                        The ID of the nodeset to process. If not provided, all nodesets in the input directory will be processed.
  --dont_remove_existing_s_and_ya_nodes
                        Whether to remove existing S and YA nodes and their edges before adding new S and YA nodes.
  --dont_show_progress  Whether to show a progress bar when processing multiple nodesets.
  --silent              Whether to show warnings for nodesets with remaining S or YA nodes.

Process finished with exit code 0
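
Steps 1 to 3 of the help text above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in create_relation_nodes.py: the simplified dict shapes, the function names (lcs_similarity, align_i_to_l, mirror_ta_relations, create_ya_relations), the S-node ID scheme, and the use of difflib from the standard library as a stand-in for the 'lcsstr' measure are all hypothetical.

```python
from difflib import SequenceMatcher

# Illustrative sketch only: all names and data shapes are assumptions,
# not the actual implementation in create_relation_nodes.py.


def lcs_similarity(a: str, b: str) -> float:
    """Longest common substring length, normalised by the longer text."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


def align_i_to_l(i_texts, l_texts):
    """Step 1: map each I-node ID to the ID of its most similar L-node."""
    return {
        i_id: max(l_texts, key=lambda l_id: lcs_similarity(i_text, l_texts[l_id]))
        for i_id, i_text in i_texts.items()
    }


def mirror_ta_relations(ta_relations, i_to_l, s_text="DUMMY", s_type="S"):
    """Step 2: mirror each L -> TA -> L relation onto the aligned I-nodes."""
    # If several I-nodes align to one L-node, the last one wins in this sketch.
    l_to_i = {l_id: i_id for i_id, l_id in i_to_l.items()}
    s_nodes = []
    for ta_id, (source_l, target_l) in ta_relations.items():
        if source_l in l_to_i and target_l in l_to_i:
            s_nodes.append({
                "id": f"S_{ta_id}",  # hypothetical ID scheme
                "type": s_type,
                "text": s_text,
                "source": l_to_i[source_l],  # direction copied from the TA relation
                "target": l_to_i[target_l],
                "ta": ta_id,
            })
    return s_nodes


def create_ya_relations(i_to_l, s_nodes, ya_text="DUMMY"):
    """Step 3: one YA per I-L alignment and one per S-TA alignment."""
    ya = [{"type": "YA", "text": ya_text, "source": l_id, "target": i_id}
          for i_id, l_id in i_to_l.items()]
    ya += [{"type": "YA", "text": ya_text, "source": s["ta"], "target": s["id"]}
           for s in s_nodes]
    return ya


# Made-up example data: two locutions, their propositions, one transition.
l_texts = {"L1": "Alice: we should raise taxes", "L2": "Bob: raising taxes hurts growth"}
i_texts = {"I1": "we should raise taxes", "I2": "raising taxes hurts growth"}
ta_relations = {"TA1": ("L1", "L2")}

i_to_l = align_i_to_l(i_texts, l_texts)
s_nodes = mirror_ta_relations(ta_relations, i_to_l)
ya_nodes = create_ya_relations(i_to_l, s_nodes)
```

In this sketch the S relation simply inherits the direction of the TA relation it mirrors, which is why, as the disclaimer in the help text says, the direction of the new S nodes may need to be adjusted later on.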

Result for Nodeset 25524

[image: analysis]

Notes

TODO: