Output from calling python src/utils/create_relation_nodes.py -h:
usage: create_relation_nodes.py [-h] --input_dir INPUT_DIR --output_dir
                                OUTPUT_DIR [--s_node_type S_NODE_TYPE]
                                [--s_node_text S_NODE_TEXT]
                                [--ya_node_text YA_NODE_TEXT]
                                [--similarity_measure SIMILARITY_MEASURE]
                                [--nodeset_id NODESET_ID]
                                [--dont_remove_existing_s_and_ya_nodes]
                                [--dont_show_progress] [--silent]

Create S and YA relations from L- and I-nodes and TA relation nodes.

The algorithm works as follows:
0. Remove existing S and YA nodes and their edges if they exist.
1. Align I and L nodes based on the similarity of their texts.
2. Create S nodes and align them with TA nodes by mirroring TA relations between L nodes to
   the aligned I nodes (see step 1).
3. Create YA nodes and relations from I-L and S-TA alignments.

Important Disclaimer:
- This creates relation nodes with a generic text (and type) per S-/YA-node.
- The direction of the new S nodes may not be correct and need to be adjusted later on.

optional arguments:
  -h, --help            show this help message and exit
  --input_dir INPUT_DIR
                        The input directory containing the nodesets.
  --output_dir OUTPUT_DIR
                        The output directory for the modified nodesets.
  --s_node_type S_NODE_TYPE
                        The type of the new S nodes. Default is 'S'.
  --s_node_text S_NODE_TEXT
                        The text of the new S nodes. Default is 'DUMMY'.
  --ya_node_text YA_NODE_TEXT
                        The text of the new YA nodes. Default is 'DUMMY'.
  --similarity_measure SIMILARITY_MEASURE
                        The similarity measure to use for creating YA nodes. Default is 'lcsstr' (Longest common substring).
  --nodeset_id NODESET_ID
                        The ID of the nodeset to process. If not provided, all nodesets in the input directory will be processed.
  --dont_remove_existing_s_and_ya_nodes
                        Whether to remove existing S and YA nodes and their edges before adding new S and YA nodes.
  --dont_show_progress  Whether to show a progress bar when processing multiple nodesets.
  --silent              Whether to show warnings for nodesets with remaining S or YA nodes.
Process finished with exit code 0
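For reviewers, here is a rough sketch of steps 1 and 2 from the help text above. It is not the code in this PR: `lcs_length`, `align_i_to_l`, and `mirror_ta_relations` are hypothetical names, plain dicts stand in for the nodeset structures, and `difflib` stands in for whatever backs the 'lcsstr' measure.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common substring of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def align_i_to_l(i_texts: dict[str, str], l_texts: dict[str, str]) -> dict[str, str]:
    """Step 1 (sketch): map each I-node ID to the L-node ID with the most similar text."""
    return {
        i_id: max(l_texts, key=lambda l_id: lcs_length(i_text, l_texts[l_id]))
        for i_id, i_text in i_texts.items()
    }


def mirror_ta_relations(
    ta_relations: dict[str, tuple[str, str]], i2l: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Step 2 (sketch): mirror each TA relation between two L nodes to the aligned I nodes.

    Assumption: if several I nodes align to the same L node, the last one wins.
    The direction of the resulting S relation simply copies the TA direction,
    which may be wrong (see the disclaimer in the help text).
    """
    l2i = {l_id: i_id for i_id, l_id in i2l.items()}
    s_relations = {}
    for ta_id, (src_l, trg_l) in ta_relations.items():
        if src_l in l2i and trg_l in l2i:
            # Hypothetical ID scheme that keeps the S-TA alignment visible.
            s_relations[f"S-{ta_id}"] = (l2i[src_l], l2i[trg_l])
    return s_relations


i_texts = {"i1": "cats are great pets", "i2": "dogs need long walks"}
l_texts = {
    "l1": "Alice: I think cats are great pets",
    "l2": "Bob: but dogs need long walks every day",
}
i2l = align_i_to_l(i_texts, l_texts)
s_relations = mirror_ta_relations({"ta1": ("l1", "l2")}, i2l)
```

Step 3 would then add one YA node per aligned I-L pair and per aligned S-TA pair.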
Result for Nodeset 25524
Notes
This fails for one nodeset (23569). I guess this is because there is a wrong edge from a YA to a TA node, but I have not yet tried whether removing that edge allows the script to run.
We warn if there are connections from/to I nodes or if there are S or YA nodes, but for now we don't remove them.
This adds the following types as typed dicts to src.utils.nodeset_utils: Node, Edge, Locution, and Nodeset.
This adds the following helper methods to src.utils.nodeset_utils: get_node_ids, create_edges_from_relations, create_relation_nodes_from_alignment, get_binary_relations, remove_relation_nodes_and_edges, and remove_isolated_nodes.
This also modifies process_all_nodesets: func now receives the already loaded Nodeset object as the parameter nodeset instead of the nodeset_dir parameter.
This also modifies src.utils.align_i2l_nodes.align_i_and_l_nodes(): we no longer sort the I- and L-nodes before aligning them because sorting required the timestamp, which is not part of the new Node type. But the timestamp is broken anyway. @tanikina is this fine?
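To make the notes on the new types and the changed callback concrete, here is a minimal sketch of how they might fit together. The field names, the one-JSON-file-per-nodeset layout, and the exact signature of `process_all_nodesets` are assumptions for illustration and may differ from the code in this PR.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterator, List, Tuple, TypedDict


class Node(TypedDict):
    # Assumed fields; note there is no timestamp, as mentioned above.
    nodeID: str
    text: str
    type: str


class Edge(TypedDict):
    edgeID: str
    fromID: str
    toID: str


class Locution(TypedDict):
    nodeID: str
    personID: str


class Nodeset(TypedDict):
    nodes: List[Node]
    edges: List[Edge]
    locutions: List[Locution]


def process_all_nodesets(
    nodeset_dir: str, func: Callable[..., Any], **kwargs: Any
) -> Iterator[Tuple[str, Any]]:
    """Load every nodeset in nodeset_dir and pass the loaded object to func.

    func receives the parsed Nodeset as keyword argument `nodeset`
    rather than the directory it was loaded from.
    """
    for path in sorted(Path(nodeset_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            nodeset: Nodeset = json.load(f)
        yield path.stem, func(nodeset=nodeset, **kwargs)


def count_nodes_of_type(nodeset: Nodeset, node_type: str) -> int:
    """Example callback: count the nodes of one type in a loaded nodeset."""
    return sum(1 for node in nodeset["nodes"] if node["type"] == node_type)
```

A call such as `list(process_all_nodesets(input_dir, count_nodes_of_type, node_type="TA"))` would then yield one `(nodeset_id, count)` pair per file.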
TODO: remove_s_and_ya_nodes_with_edges?