ArneBinder / dialam-2024-shared-task

see http://dialam.arg.tech/

data cleanup script #9

Closed tanikina closed 5 months ago

tanikina commented 6 months ago

Clean up the data as follows:

  1. Remove isolated nodes disconnected from the graph.
  2. Remove invalid relation edges. We only allow the following transitions:
    • I > {MA, RA, CA} > I
    • L > TA > L
    • {L, TA} > YA > {I, L, MA, RA, CA}

     In addition, each relation may have just one source node and one target node, except for S-relations, which may have multiple sources.
  3. Swap the edges of S-nodes that point downwards. "Downward" means that the I-source node of an RA is anchored (through YA) in a TA-node whose source L-node appears earlier in the graph than the L-node that is connected (through YA- and TA-nodes) to the I-target node.
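The transition rules from step 2 can be sketched as a small lookup table. This is only an illustration, not the code in `cleanup_data.py`: the function name `is_valid_relation` and the idea of passing the node types of a relation's sources and targets as plain lists are assumptions.

```python
# Allowed source/target node types per relation type, taken from the rules above.
# The data layout and the function name are hypothetical.
S_TYPES = {"MA", "RA", "CA"}

ALLOWED = {
    "MA": ({"I"}, {"I"}),
    "RA": ({"I"}, {"I"}),
    "CA": ({"I"}, {"I"}),
    "TA": ({"L"}, {"L"}),
    "YA": ({"L", "TA"}, {"I", "L", "MA", "RA", "CA"}),
}


def is_valid_relation(rel_type, source_types, target_types):
    """Return True if the relation respects the allowed transitions."""
    if rel_type not in ALLOWED:
        return False
    allowed_sources, allowed_targets = ALLOWED[rel_type]
    # Exactly one target is allowed for every relation type.
    if len(target_types) != 1:
        return False
    # S-relations may have several sources, all others exactly one.
    if rel_type in S_TYPES:
        if len(source_types) < 1:
            return False
    elif len(source_types) != 1:
        return False
    return (all(t in allowed_sources for t in source_types)
            and all(t in allowed_targets for t in target_types))


print(is_valid_relation("RA", ["I", "I"], ["I"]))  # multiple sources on an S-relation
print(is_valid_relation("TA", ["L", "L"], ["L"]))  # multiple sources on a TA relation
```

Relations for which such a check fails would be the ones dropped in step 2.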

Usage: $ python3 src/utils/cleanup_data.py --input_dir --output_dir [--nodeset_id] [--dont_show_progress] [--normalize_relation_direction]

Based on the following information from the annotation details file

image

In total, we have 72 MA-rev relations and 2 CA-rev relations in the data:

WARNING:__main__:nodeset=23695: Relation node 860484 of type MA-rev was reversed.
WARNING:__main__:nodeset=23627: Relation node 857244 of type MA-rev was reversed.
WARNING:__main__:nodeset=23837: Relation node 867637 of type MA-rev was reversed.
WARNING:__main__:nodeset=21083: Relation node 695726 of type MA-rev was reversed.
WARNING:__main__:nodeset=17938: Relation node 512738 of type MA-rev was reversed.
WARNING:__main__:nodeset=17938: Relation node 512742 of type MA-rev was reversed.
WARNING:__main__:nodeset=23892: Relation node 870541 of type MA-rev was reversed.
WARNING:__main__:nodeset=23799: Relation node 865590 of type MA-rev was reversed.
WARNING:__main__:nodeset=25904: Relation node 1040250 of type MA-rev was reversed.
WARNING:__main__:nodeset=21276: Relation node 704403 of type MA-rev was reversed.
WARNING:__main__:nodeset=18308: Relation node 542296 of type MA-rev was reversed.
WARNING:__main__:nodeset=21711: Relation node 733063 of type MA-rev was reversed.
WARNING:__main__:nodeset=19773: Relation node 633853 of type MA-rev was reversed.
WARNING:__main__:nodeset=18321: Relation node 543301 of type MA-rev was reversed.
WARNING:__main__:nodeset=23391: Relation node 843854 of type MA-rev was reversed.
WARNING:__main__:nodeset=23009: Relation node 810914 of type MA-rev was reversed.
WARNING:__main__:nodeset=18877: Relation node 571942 of type MA-rev was reversed.
WARNING:__main__:nodeset=23821: Relation node 866817 of type MA-rev was reversed.
WARNING:__main__:nodeset=21275: Relation node 704445 of type MA-rev was reversed.
WARNING:__main__:nodeset=23699: Relation node 860732 of type MA-rev was reversed.
WARNING:__main__:nodeset=23699: Relation node 860735 of type MA-rev was reversed.
WARNING:__main__:nodeset=20772: Relation node 677921 of type MA-rev was reversed.
WARNING:__main__:nodeset=19887: Relation node 640315 of type MA-rev was reversed.
WARNING:__main__:nodeset=21655: Relation node 728899 of type MA-rev was reversed.
WARNING:__main__:nodeset=23923: Relation node 872107 of type MA-rev was reversed.
WARNING:__main__:nodeset=19897: Relation node 641331 of type MA-rev was reversed.
WARNING:__main__:nodeset=20745: Relation node 675593 of type MA-rev was reversed.
WARNING:__main__:nodeset=21705: Relation node 732584 of type MA-rev was reversed.
WARNING:__main__:nodeset=21022: Relation node 691942 of type MA-rev was reversed.
WARNING:__main__:nodeset=21613: Relation node 726659 of type MA-rev was reversed.
WARNING:__main__:nodeset=20313: Relation node 656926 of type MA-rev was reversed.
WARNING:__main__:nodeset=21056: Relation node 694357 of type MA-rev was reversed.
WARNING:__main__:nodeset=23918: Relation node 871869 of type MA-rev was reversed.
WARNING:__main__:nodeset=21426: Relation node 715933 of type MA-rev was reversed.
WARNING:__main__:nodeset=25528: Relation node 1021506 of type MA-rev was reversed.
WARNING:__main__:nodeset=17930: Relation node 512103 of type MA-rev was reversed.
WARNING:__main__:nodeset=18485: Relation node 553634 of type MA-rev was reversed.
WARNING:__main__:nodeset=25907: Relation node 1040550 of type MA-rev was reversed.
WARNING:__main__:nodeset=20752: Relation node 676318 of type MA-rev was reversed.
WARNING:__main__:nodeset=23552: Relation node 853284 of type CA-rev was reversed.
WARNING:__main__:nodeset=21336: Relation node 709179 of type MA-rev was reversed.
WARNING:__main__:nodeset=21039: Relation node 693252 of type MA-rev was reversed.
WARNING:__main__:nodeset=19217: Relation node 598644 of type MA-rev was reversed.
WARNING:__main__:nodeset=23867: Relation node 869631 of type MA-rev was reversed.
WARNING:__main__:nodeset=23701: Relation node 860847 of type MA-rev was reversed.
WARNING:__main__:nodeset=20894: Relation node 685302 of type MA-rev was reversed.
WARNING:__main__:nodeset=25938: Relation node 1045355 of type MA-rev was reversed.
WARNING:__main__:nodeset=25691: Relation node 1027485 of type MA-rev was reversed.
WARNING:__main__:nodeset=23551: Relation node 853205 of type MA-rev was reversed.
WARNING:__main__:nodeset=23551: Relation node 853208 of type MA-rev was reversed.
WARNING:__main__:nodeset=19154: Relation node 593437 of type MA-rev was reversed.
WARNING:__main__:nodeset=21306: Relation node 707141 of type CA-rev was reversed.
WARNING:__main__:nodeset=23723: Relation node 861862 of type MA-rev was reversed.
WARNING:__main__:nodeset=20729: Relation node 674598 of type MA-rev was reversed.
WARNING:__main__:nodeset=23398: Relation node 844351 of type MA-rev was reversed.
WARNING:__main__:nodeset=23398: Relation node 844354 of type MA-rev was reversed.
WARNING:__main__:nodeset=23120: Relation node 821030 of type MA-rev was reversed.
WARNING:__main__:nodeset=21480: Relation node 720427 of type MA-rev was reversed.
WARNING:__main__:nodeset=23560: Relation node 853746 of type MA-rev was reversed.
WARNING:__main__:nodeset=21000: Relation node 690466 of type MA-rev was reversed.
WARNING:__main__:nodeset=20479: Relation node 663489 of type MA-rev was reversed.
WARNING:__main__:nodeset=20879: Relation node 684147 of type MA-rev was reversed.
WARNING:__main__:nodeset=23895: Relation node 870706 of type MA-rev was reversed.
WARNING:__main__:nodeset=21016: Relation node 691529 of type MA-rev was reversed.
WARNING:__main__:nodeset=18795: Relation node 566030 of type MA-rev was reversed.
WARNING:__main__:nodeset=18320: Relation node 543107 of type MA-rev was reversed.
WARNING:__main__:nodeset=20518: Relation node 666669 of type MA-rev was reversed.
WARNING:__main__:nodeset=21404: Relation node 714702 of type MA-rev was reversed.
WARNING:__main__:nodeset=21404: Relation node 714705 of type MA-rev was reversed.
WARNING:__main__:nodeset=23599: Relation node 855594 of type MA-rev was reversed.
WARNING:__main__:nodeset=19165: Relation node 594816 of type MA-rev was reversed.
WARNING:__main__:nodeset=19165: Relation node 594825 of type MA-rev was reversed.
WARNING:__main__:nodeset=19165: Relation node 594831 of type MA-rev was reversed.
WARNING:__main__:nodeset=23537: Relation node 852404 of type MA-rev was reversed.

Unfortunately, we sometimes get invalid relations because of missing anchor nodes (in the example below, the S-node with the "Default Conflict" annotation is not anchored via a YA-node in any TA-node). We still keep such cases because they participate in valid I > S > I transitions.

WARNING:__main__:nodeset=19316: Relation ID 604495 does not have an anchor.

[image: not-anchored-s-node-nodeset19316]

EDIT: Now we can also specify the nodeset blacklist to avoid processing nodesets that have a lot of missing edges. Here is the list of "bad" nodesets: 24905, 25468, 24903, 25445, 25462, 24808, 25475, 25452, 25472, 24809, 25442, 25444, 25465, 24807, 25473, 25443, 25461, 25474, 25441, 25045, 19761, 24992, 25463

Usage example:

python3 src/utils/cleanup_data.py --input_dir=data/train --output_dir=data/train_clean --normalize_relation_direction --dont_show_progress --nodeset_blacklist="24905, 25468, 24903, 25445, 25462, 24808, 25475, 25452, 25472, 24809, 25442, 25444, 25465, 24807, 25473, 25443, 25461, 25474, 25441, 25045, 19761, 24992, 25463"
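The blacklist is passed as one comma-separated string. A minimal sketch of how such a value could be turned into a set of nodeset IDs and used for filtering; the helper name `parse_nodeset_blacklist` is hypothetical and the actual script may handle the option differently.

```python
def parse_nodeset_blacklist(value):
    """Split a comma-separated string of nodeset IDs into a set of strings."""
    if not value:
        return set()
    # Tolerate spaces after commas and skip empty entries.
    return {part.strip() for part in value.split(",") if part.strip()}


blacklist = parse_nodeset_blacklist("24905, 25468, 24903")
nodeset_ids = ["24905", "17938", "25468"]
# Keep only the nodesets that are not blacklisted.
to_process = [n for n in nodeset_ids if n not in blacklist]
print(to_process)  # ['17938']
```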

TODO:

ArneBinder commented 5 months ago

When calling the script with the blacklist, i.e.

python src/utils/prepare_data.py --input_dir=data/dataset --output_dir=data/dataset_prepared_with_gold --add_gold_data --nodeset_blacklist="24255, 24807, 24808, 24809, 24903, 24905, 24992, 25045, 25441, 25442, 25443, 25444, 25445, 25452, 25461, 25462, 25463, 25465, 25468, 25472, 25473, 25474, 25475"

we now get just 52 errors (1404 successfully processed and 23 blacklisted nodesets): 25936, 25902, 25904, 21477, 18459, 25938, 19217, 26068, 25516, 19146, 21588, 25937, 23766, 21681, 20844, 21449, 19908, 18888, 21342, 19149, 23710, 23891, 25940, 26067, 25411, 26087, 19757, 25510, 25400, 19911, 19761, 20510, 25901, 20992, 25906, 26066, 22969, 17964, 19059, 21401, 18484, 25552, 23749, 25907, 19091, 19165, 23114, 20888, 19878, 20479, 20507, 20766

```
('25936', ValueError("nodeset=25936: could not determine direction of RA-nodes ['1045105', '1045111', '1045119', '1045131', '1045138', '1045145'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25902', ValueError("nodeset=25902: could not determine direction of RA-nodes ['1040119', '1040136', '1040144', '1040156'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25904', ValueError("nodeset=25904: could not determine direction of RA-nodes ['1040243', '1040282', '1040284', '1040306', '1040325', '1040327', '1040334', '1040386'] because there is no TA relation between any combination of anchoring I-nodes!")),
('21477', ValueError('direction of RA-node 720124 is ambiguous!')),
('18459', ValueError("nodeset=18459: could not determine direction of RA-nodes ['551360'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25938', ValueError("nodeset=25938: could not determine direction of RA-nodes ['1045330', '1045339', '1045352'] because there is no TA relation between any combination of anchoring I-nodes!")),
('19217', ValueError('direction of RA-node 598485 is ambiguous!')),
('26068', ValueError("nodeset=26068: could not determine direction of RA-nodes ['1058687', '1058715', '1058759', '1058762'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25516', ValueError("nodeset=25516: could not determine direction of RA-nodes ['1020784'] because there is no TA relation between any combination of anchoring I-nodes!")),
('19146', ValueError("nodeset=19146: could not determine direction of RA-nodes ['592456'] because there is no TA relation between any combination of anchoring I-nodes!")),
('21588', ValueError('direction of RA-node 724367 is ambiguous!')),
('25937', ValueError("nodeset=25937: could not determine direction of RA-nodes ['1045187', '1045193', '1045199', '1045211', '1045218', '1045225'] because there is no TA relation between any combination of anchoring I-nodes!")),
('23766', ValueError('direction of RA-node 864140 is ambiguous!')),
('21681', ValueError("nodeset=21681: S-node arguments are not unique! missing relations: [{'sources': ['730239'], 'targets': ['730196'], 'relation': '730399'}]")),
('20844', ValueError('direction of RA-node 682026 is ambiguous!')),
('21449', ValueError("nodeset=21449: could not determine direction of RA-nodes ['717746'] because there is no TA relation between any combination of anchoring I-nodes!")),
('19908', ValueError("nodeset=19908: could not determine direction of RA-nodes ['642083'] because there is no TA relation between any combination of anchoring I-nodes!")),
('18888', ValueError("nodeset=18888: S-node arguments are not unique! missing relations: [{'sources': ['572739'], 'targets': ['572712'], 'relation': '572841'}]")),
('21342', ValueError("nodeset=21342: S-node arguments are not unique! missing relations: [{'sources': ['709522'], 'targets': ['709508'], 'relation': '709559'}]")),
('19149', ValueError("nodeset=19149: could not determine direction of RA-nodes ['592728', '592776', '592806'] because there is no TA relation between any combination of anchoring I-nodes!")),
('23710', ValueError("nodeset=23710: S-node arguments are not unique! missing relations: [{'sources': ['811355'], 'targets': ['811397'], 'relation': '861291'}]")),
('23891', ValueError('direction of RA-node 870484 is ambiguous!')),
('25940', ValueError("nodeset=25940: could not determine direction of RA-nodes ['1045512', '1045520'] because there is no TA relation between any combination of anchoring I-nodes!")),
('26067', ValueError("nodeset=26067: could not determine direction of RA-nodes ['1058555', '1058563', '1058605', '1058624', '1058630', '1058642', '1058660', '1058662'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25411', ValueError("nodeset=25411: could not determine direction of RA-nodes ['1003826'] because there is no TA relation between any combination of anchoring I-nodes!")),
('26087', ValueError("nodeset=26087: could not determine direction of RA-nodes ['1060873', '1060896', '1060906', '1060928', '1060930', '1060952'] because there is no TA relation between any combination of anchoring I-nodes!")),
('19757', ValueError("nodeset=19757: could not determine direction of RA-nodes ['632412'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25510', ValueError("nodeset=25510: could not determine direction of RA-nodes ['1020342'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25400', ValueError("nodeset=25400: S-node arguments are not unique! missing relations: [{'sources': ['717266'], 'targets': ['717211'], 'relation': '1003490'}]")),
('19911', ValueError('nodeset=19911: I-node texts are not unique!')),
('19761', ValueError("nodeset=19761: could not determine direction of RA-nodes ['632792'] because there is no TA relation between any combination of anchoring I-nodes!")),
('20510', ValueError('direction of RA-node 666105 is ambiguous!')),
('25901', ValueError("nodeset=25901: could not determine direction of RA-nodes ['1040059', '1040106'] because there is no TA relation between any combination of anchoring I-nodes!")),
('20992', ValueError('direction of RA-node 689965 is ambiguous!')),
('25906', ValueError("nodeset=25906: could not determine direction of RA-nodes ['1040470', '1040528'] because there is no TA relation between any combination of anchoring I-nodes!")),
('26066', ValueError("nodeset=26066: could not determine direction of RA-nodes ['1058494', '1058509'] because there is no TA relation between any combination of anchoring I-nodes!")),
('22969', ValueError("nodeset=22969: S-node arguments are not unique! missing relations: [{'sources': ['641748'], 'targets': ['641718'], 'relation': '806997'}]")),
('17964', ValueError("nodeset=17964: could not determine direction of RA-nodes ['514854'] because there is no TA relation between any combination of anchoring I-nodes!")),
('19059', ValueError('direction of RA-node 587841 is ambiguous!')),
('21401', ValueError('direction of RA-node 714397 is ambiguous!')),
('18484', ValueError("nodeset=18484: could not determine direction of RA-nodes ['553464'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25552', ValueError("nodeset=25552: S-node arguments are not unique! missing relations: [{'sources': ['572305'], 'targets': ['572285'], 'relation': '1022478'}]")),
('23749', ValueError("nodeset=23749: could not determine direction of RA-nodes ['863294'] because there is no TA relation between any combination of anchoring I-nodes!")),
('25907', ValueError("nodeset=25907: could not determine direction of RA-nodes ['1040578'] because there is no TA relation between any combination of anchoring I-nodes!")),
('19091', ValueError("nodeset=19091: could not determine direction of RA-nodes ['590210'] because there is no TA relation between any combination of anchoring I-nodes!")),
('19165', ValueError("nodeset=19165: S-node arguments are not unique! missing relations: [{'sources': ['594812'], 'targets': ['594812'], 'relation': '594919'}]")),
('23114', ValueError('direction of RA-node 820629 is ambiguous!')),
('20888', ValueError('direction of RA-node 684790 is ambiguous!')),
('19878', ValueError('direction of RA-node 639622 is ambiguous!')),
('20479', ValueError('direction of RA-node 663443 is ambiguous!')),
('20507', ValueError('direction of RA-node 665828 is ambiguous!')),
('20766', ValueError('direction of RA-node 677366 is ambiguous!')),
```

sorted to get the counts per error type:

```
25411: ValueError("could not determine direction of RA-nodes ['1003826'] because there is no TA relation between any combination of anchoring I-nodes!")
25510: ValueError("could not determine direction of RA-nodes ['1020342'] because there is no TA relation between any combination of anchoring I-nodes!")
25516: ValueError("could not determine direction of RA-nodes ['1020784'] because there is no TA relation between any combination of anchoring I-nodes!")
25901: ValueError("could not determine direction of RA-nodes ['1040059', '1040106'] because there is no TA relation between any combination of anchoring I-nodes!")
25902: ValueError("could not determine direction of RA-nodes ['1040119', '1040136', '1040144', '1040156'] because there is no TA relation between any combination of anchoring I-nodes!")
25904: ValueError("could not determine direction of RA-nodes ['1040243', '1040282', '1040284', '1040306', '1040325', '1040327', '1040334', '1040386'] because there is no TA relation between any combination of anchoring I-nodes!")
25906: ValueError("could not determine direction of RA-nodes ['1040470', '1040528'] because there is no TA relation between any combination of anchoring I-nodes!")
25907: ValueError("could not determine direction of RA-nodes ['1040578'] because there is no TA relation between any combination of anchoring I-nodes!")
25936: ValueError("could not determine direction of RA-nodes ['1045105', '1045111', '1045119', '1045131', '1045138', '1045145'] because there is no TA relation between any combination of anchoring I-nodes!")
25937: ValueError("could not determine direction of RA-nodes ['1045187', '1045193', '1045199', '1045211', '1045218', '1045225'] because there is no TA relation between any combination of anchoring I-nodes!")
25938: ValueError("could not determine direction of RA-nodes ['1045330', '1045339', '1045352'] because there is no TA relation between any combination of anchoring I-nodes!")
25940: ValueError("could not determine direction of RA-nodes ['1045512', '1045520'] because there is no TA relation between any combination of anchoring I-nodes!")
26066: ValueError("could not determine direction of RA-nodes ['1058494', '1058509'] because there is no TA relation between any combination of anchoring I-nodes!")
26067: ValueError("could not determine direction of RA-nodes ['1058555', '1058563', '1058605', '1058624', '1058630', '1058642', '1058660', '1058662'] because there is no TA relation between any combination of anchoring I-nodes!")
26068: ValueError("could not determine direction of RA-nodes ['1058687', '1058715', '1058759', '1058762'] because there is no TA relation between any combination of anchoring I-nodes!")
26087: ValueError("could not determine direction of RA-nodes ['1060873', '1060896', '1060906', '1060928', '1060930', '1060952'] because there is no TA relation between any combination of anchoring I-nodes!")
17964: ValueError("could not determine direction of RA-nodes ['514854'] because there is no TA relation between any combination of anchoring I-nodes!")
18459: ValueError("could not determine direction of RA-nodes ['551360'] because there is no TA relation between any combination of anchoring I-nodes!")
19091: ValueError("could not determine direction of RA-nodes ['590210'] because there is no TA relation between any combination of anchoring I-nodes!")
19146: ValueError("could not determine direction of RA-nodes ['592456'] because there is no TA relation between any combination of anchoring I-nodes!")
19149: ValueError("could not determine direction of RA-nodes ['592728', '592776', '592806'] because there is no TA relation between any combination of anchoring I-nodes!")
19757: ValueError("could not determine direction of RA-nodes ['632412'] because there is no TA relation between any combination of anchoring I-nodes!")
19761: ValueError("could not determine direction of RA-nodes ['632792'] because there is no TA relation between any combination of anchoring I-nodes!")
19908: ValueError("could not determine direction of RA-nodes ['642083'] because there is no TA relation between any combination of anchoring I-nodes!")
21449: ValueError("could not determine direction of RA-nodes ['717746'] because there is no TA relation between any combination of anchoring I-nodes!")
23749: ValueError("could not determine direction of RA-nodes ['863294'] because there is no TA relation between any combination of anchoring I-nodes!")

25552: ValueError("S-node arguments are not unique! missing relations: [{'sources': ['572305'], 'targets': ['572285'], 'relation': '1022478'}]")
19165: ValueError("S-node arguments are not unique! missing relations: [{'sources': ['594812'], 'targets': ['594812'], 'relation': '594919'}]")
22969: ValueError("S-node arguments are not unique! missing relations: [{'sources': ['641748'], 'targets': ['641718'], 'relation': '806997'}]")
21342: ValueError("S-node arguments are not unique! missing relations: [{'sources': ['709522'], 'targets': ['709508'], 'relation': '709559'}]")
25400: ValueError("S-node arguments are not unique! missing relations: [{'sources': ['717266'], 'targets': ['717211'], 'relation': '1003490'}]")
21681: ValueError("S-node arguments are not unique! missing relations: [{'sources': ['730239'], 'targets': ['730196'], 'relation': '730399'}]")
23710: ValueError("S-node arguments are not unique! missing relations: [{'sources': ['811355'], 'targets': ['811397'], 'relation': '861291'}]")

19059: ValueError('direction of RA-node 587841 is ambiguous!')
19217: ValueError('direction of RA-node 598485 is ambiguous!')
19878: ValueError('direction of RA-node 639622 is ambiguous!')
20479: ValueError('direction of RA-node 663443 is ambiguous!')
20507: ValueError('direction of RA-node 665828 is ambiguous!')
20510: ValueError('direction of RA-node 666105 is ambiguous!')
20766: ValueError('direction of RA-node 677366 is ambiguous!')
20844: ValueError('direction of RA-node 682026 is ambiguous!')
20888: ValueError('direction of RA-node 684790 is ambiguous!')
20992: ValueError('direction of RA-node 689965 is ambiguous!')
21401: ValueError('direction of RA-node 714397 is ambiguous!')
21477: ValueError('direction of RA-node 720124 is ambiguous!')
21588: ValueError('direction of RA-node 724367 is ambiguous!')
23114: ValueError('direction of RA-node 820629 is ambiguous!')
23766: ValueError('direction of RA-node 864140 is ambiguous!')
23891: ValueError('direction of RA-node 870484 is ambiguous!')

19911: ValueError('I-node texts are not unique!')
```

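Such a grouping could also be computed instead of sorted by hand. A rough sketch, assuming the errors are available as `(nodeset_id, exception)` tuples like in the output above; the helper `error_category` and its masking rules are made up for illustration.

```python
import re
from collections import Counter


def error_category(exc):
    """Map an exception to a coarse category by masking IDs in its message."""
    message = str(exc)
    message = re.sub(r"^nodeset=\d+: ", "", message)  # drop the nodeset prefix
    message = re.sub(r"\[.*\]", "[...]", message)      # mask ID lists and dicts
    message = re.sub(r"\d+", "<id>", message)          # mask remaining node IDs
    return message


# Three entries copied from the error list above.
errors = [
    ("21477", ValueError("direction of RA-node 720124 is ambiguous!")),
    ("19217", ValueError("direction of RA-node 598485 is ambiguous!")),
    ("19911", ValueError("nodeset=19911: I-node texts are not unique!")),
]
counts = Counter(error_category(exc) for _, exc in errors)
for category, count in counts.most_common():
    print(count, category)
```
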
tanikina commented 5 months ago

I-node alignment warnings

In short, these are mostly dataset issues. Several cases fail because there are isolated L-nodes that cannot be used for I2L node alignment.

Why do we have such cases while using l_node_ids_with_isolates in create_relation_nodes.py?

Note that completely isolated nodes are removed at the very beginning by `cleanup_nodeset()`, before `remove_isolated_nodes()` is called. At that point we collect only nodes that appear in valid relations; a completely isolated node has no valid relations, so we simply delete it from the nodeset there.
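To illustrate why an isolated L-node disappears, here is a simplified sketch that keeps only nodes occurring in at least one edge. It is not the actual `cleanup_nodeset()`: the function name `drop_isolated_nodes`, the edge keys `fromID`/`toID`, and the toy node IDs are assumptions, and it looks at any edge rather than only at valid relations.

```python
def drop_isolated_nodes(nodeset):
    """Return a copy of the nodeset without nodes that occur in no edge."""
    connected = set()
    for edge in nodeset["edges"]:
        connected.add(edge["fromID"])
        connected.add(edge["toID"])
    kept = [node for node in nodeset["nodes"] if node["nodeID"] in connected]
    return {**nodeset, "nodes": kept}


nodeset = {
    "nodes": [
        {"nodeID": "1", "type": "L"},
        {"nodeID": "2", "type": "YA"},
        {"nodeID": "3", "type": "I"},
        {"nodeID": "4", "type": "L"},  # no incoming or outgoing edges: isolated
    ],
    "edges": [
        {"fromID": "1", "toID": "2"},
        {"fromID": "2", "toID": "3"},
    ],
}
cleaned = drop_isolated_nodes(nodeset)
print([n["nodeID"] for n in cleaned["nodes"]])  # ['1', '2', '3']
```

Once node 4 is gone, no later step can use it as an alignment target for an I-node.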

Nodeset 21083

Here we fail to align I-node 695718 {'nodeID': '695718', 'text': 'Lynne Unknown is disappointed', 'type': 'I', 'timestamp': '2021-05-04 15:42:35'}. Note that we have two I-nodes with almost identical text:

{'nodeID': '695718', 'text': 'Lynne Unknown is disappointed', 'type': 'I', 'timestamp': '2021-05-04 15:42:35'}

{'nodeID': '695724', 'text': 'Lyne Unknown is disappointed', 'type': 'I', 'timestamp': '2021-05-04 15:42:37'}

There are also two matching L-nodes, BUT one of them (ID 695719) is completely disconnected, w/o any incoming or outgoing edges, hence it's removed from the nodeset (as explained above).

Connected L-node: {"nodeID":"695716","text":"Lynne Unknown : I'm disappointed","type":"L","timestamp":"2021-05-04 15:42:34"}

Disconnected L-node: {"nodeID":"695719","text":"Mascha: Lynne Unknown : I'm disappointed","type":"L","timestamp":"2021-05-04 15:42:36"}

I assume that I-695718 should be aligned to L-695719, but because L-695719 is isolated this does not happen.

Nodeset 18888

Here we have non-aligned I-node 572747: {'nodeID': '572747', 'text': 'when James Cleverly was first elected onto the London Assembly, he sat next to an elected member of the British National Party an explicitly racist political party, elected into London government by the voters of London', 'type': 'I', 'timestamp': '2020-09-02 20:56:15'}. It should be aligned to L-node 572743, but that node is already taken (aligned to I-node 572757). I-node 572757, in turn, should be aligned to L-572735, BUT L-572735 has multiple sources & targets, which we don't consider when using `linear_sum_assignment` because we assume that a single I-node can be aligned to only one L-node. Hence, some of the alignments are missing (if there are multiple L-I alignments in the original data). BTW, this nodeset breaks anyway with the message: `ValueError: nodeset=18888: S-node arguments are not unique! missing relations: [{'sources': ['572739'], 'targets': ['572712'], 'relation': '572841'}]`
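The effect of a one-to-one assignment can be shown with a toy example. This sketch does not call `linear_sum_assignment`; to stay dependency-free it brute-forces the best one-to-one matching over permutations, which yields the same kind of result for tiny inputs. The similarity measure (`difflib.SequenceMatcher`), the function names, and the example texts are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a, b):
    """Text similarity in [0, 1]; a stand-in for the real alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def align_one_to_one(i_texts, l_texts):
    """Brute-force the best one-to-one I->L alignment (small inputs only).

    Requires len(i_texts) <= len(l_texts). Entry k of the result is the
    index of the L-node assigned to I-node k; no L-node is used twice.
    """
    best_score, best = -1.0, None
    for perm in permutations(range(len(l_texts)), len(i_texts)):
        score = sum(similarity(i_texts[k], l_texts[j]) for k, j in enumerate(perm))
        if score > best_score:
            best_score, best = score, list(perm)
    return best


# Two I-nodes that both belong to the first L-node.
i_texts = ["the plan is risky", "the plan is risky"]
l_texts = ["Speaker A : the plan is risky", "Speaker B : I disagree"]
print(align_one_to_one(i_texts, l_texts))
```

Because every L-node can be used only once, one of the two I-nodes ends up on the unrelated L-node, which resembles the missing alignment described for this nodeset.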

Nodeset 17938

Here we cannot align {'nodeID': '512730', 'text': 'the peak of the virus follows the lockdown', 'type': 'I', 'timestamp': '2020-05-28 20:27:26'}. According to the OVA visualization, it should be aligned to L-node {"nodeID":"512731","text":"Louise: Fiona Bruce : The peak follows the lockdown","type":"L","timestamp":"2020-05-28 20:27:26"}, BUT this node is completely disconnected (again, we remove it because it has no edges/valid relations, as mentioned above).

Nodeset 23701

Here we have non-aligned I-node 810466: {'nodeID': '810466', 'text': 'taking 20 pounds a week away from the poorest in our society is nasty', 'type': 'I', 'timestamp': '2022-01-06 15:13:20'}. According to OVA, it should be aligned to the L-node {"nodeID":"810464","text":"David Lammy : it's nasty","type":"L","timestamp":"2022-01-06 15:13:20"} but this node is already aligned to I-node 810544 "taking 20 pounds a week away from the poorest in our society is nasty" which seems to have exactly the same text except for an extra space in front of "nasty" :) OVA shows that there should be two L-nodes with the text "it's nasty" but we have **only one** such node in the given nodeset: {"nodeID":"810464","text":"David Lammy : it's nasty","type":"L","timestamp":"2022-01-06 15:13:20"} It seems that the other L-node is missing in the nodeset (but displayed in OVA?).
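I-nodes whose texts differ only in whitespace can be detected with a small check. A sketch under assumptions: the helper names are hypothetical, the node dicts only mimic the `nodeID`/`text`/`type` fields shown above, and the example texts are made up.

```python
from collections import defaultdict


def normalize_text(text):
    """Collapse runs of whitespace and strip leading/trailing whitespace."""
    return " ".join(text.split())


def find_whitespace_duplicates(nodes):
    """Group I-node IDs whose texts are equal after whitespace normalization."""
    groups = defaultdict(list)
    for node in nodes:
        if node["type"] == "I":
            groups[normalize_text(node["text"])].append(node["nodeID"])
    return {text: ids for text, ids in groups.items() if len(ids) > 1}


nodes = [
    {"nodeID": "a", "type": "I", "text": "this is nasty"},
    {"nodeID": "b", "type": "I", "text": "this is  nasty"},  # extra space
    {"nodeID": "c", "type": "L", "text": "Speaker : it's nasty"},
]
print(find_whitespace_duplicates(nodes))  # {'this is nasty': ['a', 'b']}
```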

Nodeset 18484

Here I'm getting a different error message: `ValueError: nodeset=18484: could not determine direction of RA-nodes ['553464'] because there is no TA relation between any combination of anchoring I-nodes!` It does not seem to complain about missing I-node alignments. It just says `"Could not find anchor node for any argument of the RA-node 553464!"` which makes sense because this node has no targets, only sources (I guess we are missing the outgoing edges here): ![nodeset18484-ra-553464-no-targets](https://github.com/ArneBinder/dialam-2024-shared-task/assets/9082878/3c29e3cb-4d42-4c1a-b00a-639b9308960f)

Nodeset 19319

Here we have a problem with I-node {'nodeID': '604830', 'text': 'yes, Matthew is asking from personal experience', 'type': 'I', 'timestamp': '2020-12-10 19:48:25'}. According to OVA, it should be aligned to L-node with the text 'Matthew Unknown : Yes'. Note that we have two I-nodes (I-604833 and I-604830) that should be aligned to two different L-nodes with the same text 'Matthew Unknown : Yes' (as displayed in OVA) but there is **only one** such L-node (L-604828) in the given nodeset, so if we align one I-node to this L-node the other one is left w/o the correct alignment! There is a suspiciously similar looking L-node that could be used for the alignment {"nodeID":"604815","text":"Nicole: Matthew Unknown : Yes","type":"L","timestamp":"2020-12-10 19:48:21"} but it is completely isolated, so we miss it anyway after calling `cleanup_nodeset()`.