ArneBinder opened this issue 4 months ago
Some examples for the node types based on nodeset17940.json
from the training set.
AIFdb visualization for the corresponding argument map: http://www.aifdb.org/argview/17940.
L-node is used for locutions (speaker + what they actually said):
{
"nodeID": "512946",
"text": "Camilla Tominey : that's not something we want",
"type": "L",
"timestamp": "2020-05-28 20:31:10"
},
I-node (information-node) is used with propositions (propositions are "reconstructed locutions, where linguistic features like anaphora, pronouns, and deixis are resolved" see annotations):
{
"nodeID": "512948",
"text": "risking the spread of COVID-19 is not something we want",
"type": "I",
"timestamp": "2020-05-28 20:31:10"
},
YA-nodes connect several types of nodes. Here we have an edge connecting YA-node "512947" with L-node "512946":
{
"nodeID": "512947",
"text": "Asserting",
"type": "YA",
"timestamp": "2020-05-28 20:31:10",
"scheme": "Asserting",
"schemeID": "74"
},
S-nodes connect I-nodes and can have one of the following values for "type": "RA" stands for inference, "CA" for conflict, and "MA" for rephrase. Note that we also have an edge connecting S-node "512950", which carries the type annotation, and the corresponding I-node "512948" (shown above).
{
"nodeID": "512950",
"text": "Default Inference",
"type": "RA",
"timestamp": "2020-05-28 20:31:11",
"scheme": "Default Inference",
"schemeID": "72"
},
Having an edge between the S-node-"512950" (above) and another I-node-"512944" (shown below) means that there is an inference relation between the two propositions: "risking the spread of COVID-19 is not something we want" and "there is a risk to children of perhaps contracting COVID-19 and spreading it to vulnerable adults". One statement supports and provides the reason for another, hence it is annotated as "inference".
{
"nodeID": "512944",
"text": "there is a risk to children of perhaps contracting COVID-19 and spreading it to vulnerable adults",
"type": "I",
"timestamp": "2020-05-28 20:31:10"
},
In general, I and L are content nodes carrying the text of propositions and locutions, respectively, and they are given as a set of nodes at test time. TA nodes are transitions between the L nodes in the dialogue, and they are also always given. The task is to identify the YA and S nodes with their relation annotations, which either connect I and L nodes (YA-nodes) or connect two I nodes (S-nodes).
In the test dataset, the only information provided will be the set of unlinked I-nodes and a set of L-nodes linked by transitions (TA-nodes).
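To make the format above concrete, here is a minimal sketch of parsing a nodeset and grouping its nodes by type. The inline sample mimics the AIF JSON structure shown above (real files such as nodeset17940.json have the same "nodes"/"edges" keys); it is illustrative data, not the full nodeset.

```python
import json
from collections import defaultdict

# A minimal inline sample in the AIF nodeset format described above;
# real nodesets like nodeset17940.json use the same "nodes"/"edges" keys.
nodeset = json.loads("""
{
  "nodes": [
    {"nodeID": "512946", "text": "Camilla Tominey : that's not something we want", "type": "L"},
    {"nodeID": "512948", "text": "risking the spread of COVID-19 is not something we want", "type": "I"},
    {"nodeID": "512947", "text": "Asserting", "type": "YA", "scheme": "Asserting", "schemeID": "74"},
    {"nodeID": "512950", "text": "Default Inference", "type": "RA", "scheme": "Default Inference", "schemeID": "72"}
  ],
  "edges": [
    {"edgeID": "1", "fromID": "512946", "toID": "512947"},
    {"edgeID": "2", "fromID": "512947", "toID": "512948"}
  ]
}
""")

# Group nodes by their "type" field (L, I, YA, RA, ...).
nodes_by_type = defaultdict(list)
for node in nodeset["nodes"]:
    nodes_by_type[node["type"]].append(node)

print({t: len(ns) for t, ns in nodes_by_type.items()})
```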
The task definition as specified in the Shared Task guidelines:
The goal in the DialAM task is to correctly detect illocutionary relations (YA-nodes) and propositional relations (RA-, CA-, and MA-nodes), producing an edited argument map containing these new identified relational nodes together with new edges linking them to the locutions (L-nodes) and the argumentative propositions (I-nodes).
The main goal of the DialAM task is therefore twofold: First, to identify the existing relational nodes (RA-, CA-, MA-nodes) between propositions (I-nodes) and generate the respective edges linking all the information in the argument map. Similarly, the second goal is to identify any existing illocutionary relations (YA-nodes) between locutions (L-nodes) and propositions (I-nodes).
Task Baseline? Transformer-Based Models for Automatic Identification of Argument Relations: A Cross-Domain Evaluation
Open Questions:
* [ ] Do we need to do relation link prediction or just the link classification?
Yes, we need to do both (edge prediction and node type classification). I asked in the Shared Task Slack channel and here is the reply from the organizers:
assumptions:
Yes, this is also my understanding! Additionally, according to the annotation details document, we also need to classify YA relations between TA and S-nodes (TA-Node_i, S-Node_i) as well as between TA and I-nodes. I'm not sure about the TA → I transitions though since I have not seen any examples so far.
EDIT: There are no TA → I transitions in the training data and direct TA → S transitions are very rare. However, TA → YA → S transitions are quite important (see the node2node transition table in the next comment).
Also, regarding the propositional relations, I think we can safely assume that MA and CA only go up (I-Node_j, I-Node_i) and RA can point both up (I-Node_j, I-Node_i) or down (I-Node_i, I-Node_j). At least that's how they specify them in the annotation details document.
I can check the training data and compile some statistics for each of the relation types (e.g., how many times we have each relation and which nodes are involved). Would that be useful?
I'm still not sure whether this is insightful, but here are some statistics based on the node2node transitions from the training set. The table was generated using the count_statistics.py script (format: label-count for each valid transition/edge).
It seems that the most important/common transitions are between the following nodes:
to_node → from_node ↓ | YA | L | TA | I | MA | RA | CA |
---|---|---|---|---|---|---|---|
YA | - | Asserting-420 Analysing-255 PureQuestioning-7 DefaultIllocuting-6 AssertiveQuestioning-5 Arguing-3 Agreeing-2 Restating-1 Challenging-1 | Arguing-1 | Asserting-18780 PureQuestioning-1185 AssertiveQuestioning-239 RhetoricalQuestioning-222 Agreeing-215 NoLabel-160 DefaultIllocuting-136 Challenging-57 Disagreeing-50 Arguing-12 Restating-5 | Restating-4056 NoLabel-1097 DefaultIllocuting-614 Arguing-12 Agreeing-6 Disagreeing-3 Asserting-1 | Arguing-5067 NoLabel-394 DefaultIllocuting-63 Asserting-22 Restating-20 Agreeing-17 PureQuestioning-10 RhetoricalQuestioning-2 AssertiveQuestioning-1 Challenging-1 Disagreeing-1 | Disagreeing-931 NoLabel-234 Challenging-39 Arguing-8 Restating-7 DefaultIllocuting-5 |
L | Asserting-19195 PureQuestioning-1192 Analysing-256 AssertiveQuestioning-244 RhetoricalQuestioning-222 DefaultIllocuting-139 Agreeing-109 Challenging-41 Disagreeing-21 Arguing-15 Restating-6 NoLabel-3 DefaultTransition-2 DefaultRephrase-1 | NoLabel-7 DefaultTransition-2 DefaultInference-2 | DefaultTransition-20173 NoLabel-2857 DefaultRephrase-1 Asserting-1 | Disagreeing-1 DefaultRephrase-1 | DefaultRephrase-6 | DefaultInference-7 | DefaultConflict-3 |
TA | DefaultTransition-11050 NoLabel-1884 | DefaultTransition-20178 NoLabel-2857 | - | - | DefaultTransition-5 | DefaultTransition-1 | DefaultTransition-1 |
I | Asserting-32 PureQuestioning-1 | DefaultTransition-2 DefaultRephrase-1 DefaultIllocuting-1 | - | DefaultConflict-1 | DefaultRephrase-4732 NoLabel-1071 DefaultTransition-1 | DefaultInference-6116 NoLabel-386 Arguing-1 DefaultConflict-1 | DefaultConflict-997 NoLabel-229 Arguing-1 DefaultIllocuting-1 |
MA | - | DefaultRephrase-12 | - | DefaultRephrase-4730 NoLabel-1077 | DefaultRephrase-5 | - | - |
RA | - | DefaultInference-8 | - | DefaultInference-5282 NoLabel-379 | - | - | - |
CA | - | DefaultConflict-4 | - | DefaultConflict-992 NoLabel-230 | - | - | DefaultConflict-1 |
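The counting behind a table like this can be sketched as follows. This is only an illustration of the idea (the actual count_statistics.py may differ): walk the edges, look up the types of both endpoints, and use the scheme of the relation endpoint as the label.

```python
from collections import Counter

# node id -> (type, label), where the label is the scheme/text of the node.
# Tiny illustrative sample; the real script iterates over all nodesets.
nodes = {
    "512946": ("L", "Camilla Tominey : that's not something we want"),
    "512947": ("YA", "Asserting"),
    "512948": ("I", "risking the spread of COVID-19 is not something we want"),
}
edges = [("512946", "512947"), ("512947", "512948")]

RELATION_TYPES = {"YA", "TA", "RA", "CA", "MA"}

# Count transitions as (from_type, to_type, label), where the label is the
# scheme of the target node if it is a relation node, and None otherwise.
counts = Counter()
for from_id, to_id in edges:
    from_type, _ = nodes[from_id]
    to_type, to_label = nodes[to_id]
    label = to_label if to_type in RELATION_TYPES else None
    counts[(from_type, to_type, label)] += 1

print(counts)
```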
Oh, this is very interesting! However, I do not fully understand the column / row sets (`YA`, `L`, `TA`, `I`, `MA`, `RA`, `CA`). I would expect `L`, `I`, `S`, `TA` instead because those are the types of relation arguments, and in the end we would classify these pairs (if I understand it correctly). Or what was the reasoning behind your choice? Maybe I missed something.
Another note: If it is not much effort, can we have the table in markdown? I think pandas dataframes provide a `to_markdown` method. But if that does not work out of the box, I think it is fine to keep it as it is.
However, I do not fully understand the column / row sets (YA, L, TA, I, MA, RA, CA). I would expect L, I, S, TA instead because those are the types of relation arguments, and in the end we would classify these pairs (if I understand it correctly). Or what was the reasoning behind your choice?
Here I just collected all possible/valid transitions and their statistics (including the edges that we don't need to predict). S nodes are basically represented as MA, RA and CA nodes in the data (there are no "S" nodes in the original dataset), and since they have different labels and participate in different transitions, I think it might be useful to keep them in separate rows/columns. We also need YA nodes because we have to predict/annotate them in the following transitions: L → YA → I and TA → YA → S (at least as far as I understand the task).
Another note: If it is not much effort, can we have the table in markdown? I think pandas dataframes provide a to_markdown method.
Sure, no problem! Now we have it in markdown :)
This is a new table with the statistics for the input nodes (computed with this script). We are given L, I and TA nodes as input and need to predict the following transitions (i.e., whether there is a link between the two input nodes and which type/"scheme" should be assigned to it):
YA nodes basically serve as "edge labels" in this task since we don't have any edge labels in the data, only the node labels. S nodes should be predicted based on the I → S → I transitions.
input nodes | L | I | S | TA |
---|---|---|---|---|
L | L → TA → L DefaultTransition: 20206 NoLabel: 2857 L → YA → L Asserting: 420 Analysing: 255 PureQuestioning: 7 DefaultIllocuting: 6 AssertiveQuestioning: 5 Arguing: 3 Agreeing: 2 Restating: 1 Challenging: 1 L → MA → L DefaultRephrase: 2 | L → YA → I Asserting: 18779 PureQuestioning: 1185 AssertiveQuestioning: 239 RhetoricalQuestioning: 222 DefaultIllocuting: 133 Agreeing: 107 Challenging: 40 Disagreeing: 21 Arguing: 12 Restating: 5 NoLabel: 3 L → RA → I DefaultInference: 6 L → MA → I DefaultRephrase: 4 L → CA → I DefaultConflict: 3 | L → TA → S DefaultTransition: 7 | - |
I | I → MA → L DefaultRephrase: 10 I → RA → L DefaultInference: 8 I → CA → L DefaultConflict: 4 | I → RA → I DefaultInference: 6117 NoLabel: 371 I → MA → I DefaultRephrase: 4730 NoLabel: 1053 I → CA → I DefaultConflict: 995 NoLabel: 226 I → YA → I Asserting: 32 PureQuestioning: 1 | - | - |
S | - | - | - | - |
TA | - | TA → YA → I NoLabel: 157 Agreeing: 109 Disagreeing: 29 Challenging: 17 DefaultIllocuting: 3 Asserting: 2 PureQuestioning: 1 | TA → YA → S Arguing: 5090 Restating: 4083 NoLabel: 1725 Disagreeing: 935 DefaultIllocuting: 682 Challenging: 40 Asserting: 23 Agreeing: 23 PureQuestioning: 10 RhetoricalQuestioning: 2 AssertiveQuestioning: 1 TA → MA → S DefaultRephrase: 5 | - |
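Predicting the I → S → I transitions amounts to scoring candidate pairs of input I-nodes. A minimal sketch of the candidate generation (the node IDs are illustrative; the real pipeline would then classify each pair as RA/CA/MA or no relation):

```python
from itertools import permutations

# Every ordered pair of input I-nodes is a potential I -> S -> I transition
# whose relation type (RA/CA/MA, or none) a classifier would have to predict.
i_nodes = ["512944", "512948", "512952"]

candidates = list(permutations(i_nodes, 2))
print(len(candidates))  # 3 * 2 = 6 ordered pairs
```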
I have also created some code to do statistics. I added the code to the same script, but you can just comment out the last lines to bring it back to the previous state. However, it results in the following (edited: new version with relation node types and counts sorted by identifier):
 | I | L | S | TA | YA |
---|---|---|---|---|---|
I | S/DefaultConflict: 1221 S/DefaultInference: 6488 S/DefaultRephrase: 5783 YA/Asserting: 32 YA/PureQuestioning: 1 | S/DefaultConflict: 4 S/DefaultInference: 8 S/DefaultRephrase: 10 | - | - | - |
L | S/DefaultConflict: 3 S/DefaultInference: 6 S/DefaultRephrase: 4 YA/Agreeing: 107 YA/Arguing: 12 YA/Asserting: 18782 YA/AssertiveQuestioning: 239 YA/Challenging: 40 YA/DefaultIllocuting: 133 YA/Disagreeing: 21 YA/PureQuestioning: 1185 YA/Restating: 5 YA/RhetoricalQuestioning: 222 | S/DefaultInference: 1 S/DefaultRephrase: 2 TA/DefaultTransition: 23063 YA/Agreeing: 2 YA/Analysing: 255 YA/Arguing: 3 YA/Asserting: 420 YA/AssertiveQuestioning: 5 YA/Challenging: 1 YA/DefaultIllocuting: 6 YA/PureQuestioning: 7 YA/Restating: 1 | TA/DefaultTransition: 7 | - | TA/DefaultTransition: 12959 |
S | S/DefaultConflict: 1 S/DefaultRephrase: 5 | - | - | - | - |
TA | S/DefaultInference: 1 YA/Agreeing: 209 YA/Asserting: 2 YA/Challenging: 43 YA/DefaultIllocuting: 3 YA/Disagreeing: 60 YA/PureQuestioning: 1 | - | S/DefaultConflict: 1 S/DefaultRephrase: 5 YA/Agreeing: 23 YA/Arguing: 5484 YA/Asserting: 23 YA/AssertiveQuestioning: 1 YA/Challenging: 40 YA/DefaultIllocuting: 1779 YA/Disagreeing: 1169 YA/PureQuestioning: 10 YA/Restating: 4083 YA/RhetoricalQuestioning: 2 | YA/Arguing: 1 | - |
YA | S/DefaultConflict: 1216 S/DefaultInference: 5581 S/DefaultRephrase: 5765 | S/DefaultConflict: 4 S/DefaultInference: 8 S/DefaultRephrase: 11 TA/DefaultTransition: 1 | - | - | TA/DefaultTransition: 1 |
Unfortunately, this seems to be different from the table above, but it shouldn't be... I guess?
You are right, of course! I had the "No Label" labels in the table which doesn't make much sense. I updated the code and now it seems to generate the same numbers. Thanks a lot for checking and implementing another version! I think we should keep your version as a reference :)
input nodes | I | L | S | TA | YA |
---|---|---|---|---|---|
I | I → S → I DefaultInference: 6488 DefaultRephrase: 5783 DefaultConflict: 1221 I → YA → I Asserting: 32 PureQuestioning: 1 | I → S → L DefaultRephrase: 10 DefaultInference: 8 DefaultConflict: 4 | - | - | - |
L | L → YA → I Asserting: 18782 PureQuestioning: 1185 AssertiveQuestioning: 239 RhetoricalQuestioning: 222 DefaultIllocuting: 133 Agreeing: 107 Challenging: 40 Disagreeing: 21 Arguing: 12 Restating: 5 L → S → I DefaultInference: 6 DefaultRephrase: 4 DefaultConflict: 3 | L → TA → L DefaultTransition: 23063 L → YA → L Asserting: 420 Analysing: 255 PureQuestioning: 7 DefaultIllocuting: 6 AssertiveQuestioning: 5 Arguing: 3 Agreeing: 2 Restating: 1 Challenging: 1 L → S → L DefaultRephrase: 2 DefaultInference: 1 | L → TA → S DefaultTransition: 7 | - | L → TA → YA DefaultTransition: 12959 |
S | S → S → I DefaultRephrase: 5 DefaultConflict: 1 | - | - | - | - |
TA | TA → YA → I Agreeing: 209 Disagreeing: 60 Challenging: 43 DefaultIllocuting: 3 Asserting: 2 PureQuestioning: 1 TA → S → I DefaultInference: 1 | - | TA → YA → S Arguing: 5484 Restating: 4083 DefaultIllocuting: 1779 Disagreeing: 1169 Challenging: 40 Asserting: 23 Agreeing: 23 PureQuestioning: 10 RhetoricalQuestioning: 2 AssertiveQuestioning: 1 TA → S → S DefaultRephrase: 5 DefaultConflict: 1 | TA → YA → TA Arguing: 1 | - |
YA | YA → S → I DefaultRephrase: 5765 DefaultInference: 5581 DefaultConflict: 1216 | YA → S → L DefaultRephrase: 11 DefaultInference: 8 DefaultConflict: 4 YA → TA → L DefaultTransition: 1 | - | - | YA → TA → YA DefaultTransition: 1 |
Question 1: What is the meaning of the `timestamp` in the case of each node type (i.e. I, L, TA, YA, S)?
Answer: This question has not yet been answered by the organizers, but my understanding is that L and I nodes are more or less aligned according to their timestamps, which correspond to the original dialogue flow, while the timestamps of the other node types (TA, YA, S) look quite random and may reflect the order in which they were annotated (L and I nodes always have the same date, based on the BBC broadcast, e.g., 2020-11-19, while the other nodes have much later dates, e.g., 2022-06-24).
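One way to inspect this on the training data is to parse the timestamps and collect the dates per node type. A small sketch, assuming the "YYYY-MM-DD HH:MM:SS" format shown in the examples above (the node list is illustrative):

```python
from datetime import datetime

# Illustrative nodes; in practice, iterate over all nodes of a nodeset.
nodes = [
    {"type": "L", "timestamp": "2020-05-28 20:31:10"},
    {"type": "I", "timestamp": "2020-05-28 20:31:10"},
    {"type": "RA", "timestamp": "2022-06-24 09:12:01"},
]

# Group the dates per node type to see which types share the broadcast date
# and which carry later (annotation-time) dates.
dates_by_type = {}
for node in nodes:
    ts = datetime.strptime(node["timestamp"], "%Y-%m-%d %H:%M:%S")
    dates_by_type.setdefault(node["type"], set()).add(ts.date().isoformat())

print(dates_by_type)
```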
Questions 2 & 4: (Q2) It looks like there are disconnected L-nodes, what does this mean? See nodes 599519, 599523, 599527, 599534, 599537 in nodeset25524. (Q4) Why do we have duplicate nodes (861680 and 861681 even have the same timestamp)? Answer: I've just checked and it looks like some of the nodes are duplicated (e.g., 599516 is the same as 599519). The duplicated ones are isolated, and they should not be considered in your analysis, as they will not be considered during evaluation since they are redundant.
Question 3: There are S-nodes that are not linked by any YA node to a TA node, but the IAT guidelines, Sec. 5, Connections with propositional relations, state that "All RAs, CAs and MAs must be anchored through ICs in TAs". See node 1021263 in nodeset25524. Answer: Yes, such cases are possible, e.g., when we have a rephrase between two propositions, and a linked argument between one of these propositions together with a third. See the image below where we have "Default Inference" node between "if one had never..." and "look at the potential side effects..." that is not anchored via TA node.
Question 5: Is it correct to have multiple relation nodes that connect a pair of nodes? Does this mean this is a multi-label relation classification task (instead of just multi-class)? See nodeset25461: Multiple relation nodes (types: {'TA'}) between 720315 (type: L) and 719763 (type: L), or see nodeset25524: Multiple relation nodes (types: {'RA', 'MA'}) between 599605 (type: I) and 599599 (type: I). Answer: The case that you mention is one where there is a rephrase between two propositions, and a linked argument between one of these propositions together with a third (see the image above). Therefore, yes, it is possible to have multiple relation nodes between the same pair of nodes, but it is not as simple as a multi-label classification problem. In the proposed example, the inference relation only happens because there is a third proposition involved; otherwise, there would only have been a rephrase relation between the two propositions.
Note: we can remove all disconnected nodes from the training data! As for multiple relations (Q5), I am not sure how feasible this is from the modelling point of view. Based on our statistics there are not many cases that have multiple relations between the same I nodes. There are plenty of cases with multiple relations between L nodes (multiple TA relations) but I guess we can simply assume that those are given at test time.
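Removing the disconnected nodes is straightforward: a node that never appears as an endpoint of any edge is isolated and can be dropped. A minimal sketch on an illustrative nodeset (not the project's actual cleanup code):

```python
# Illustrative nodeset with one isolated (e.g., duplicated) node.
nodeset = {
    "nodes": [
        {"nodeID": "1", "type": "L"},
        {"nodeID": "2", "type": "I"},
        {"nodeID": "3", "type": "L"},  # isolated duplicate, no edges
    ],
    "edges": [{"fromID": "1", "toID": "2"}],
}

# Keep only nodes that occur as an endpoint of at least one edge.
connected = {e["fromID"] for e in nodeset["edges"]} | {e["toID"] for e in nodeset["edges"]}
nodeset["nodes"] = [n for n in nodeset["nodes"] if n["nodeID"] in connected]

print([n["nodeID"] for n in nodeset["nodes"]])  # ['1', '2']
```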
Unfortunately, we cannot rely on the timestamps of I and L-nodes since they can be arbitrary and just show when the nodes were added to the graph with some external annotation tool (at least that's how I understood the response from the organizers). The only reliable timestamps are those that are specified under `locutions`, and they are available only for some L-nodes (e.g., L-nodes 70682, 706806, 706835 in nodeset 21303 are missing in `locutions`).
I prepared a script to test different "automatic" ways of aligning the I and L-nodes based on embedding similarity, token overlap, etc. It is currently on the branch called `node_alignment_experiments`: align_i2l_nodes.py. Some parts were copied/adapted from the visualize_arg_map.py script :) See here for the results.
PS: We also need to think about the train/dev/test splits since those are not given out-of-the-box. The organizers uploaded some examples for the test data format but they include only three nodesets: http://dialam.arg.tech/res/files/sample_test.zip
TODOs:
* [ ] add reversed relations (`-rev` appended to type) and respective relation edges (swap direction)

Open questions:
* [ ] `NONE` type (see EDIT below). But on what text do we operate? Since we align I with L nodes, we could, in theory, frame the S-node classification as TA-node classification... Pro argument for operating on the propositions: the text may contain better prepared information; pro argument for operating on the locutions: closer to the real-world text that the language model we may use is pretrained on.
* [ ] cleanup_data.py. What is the meaning of such relations? Here we have some statistics for all the transitions that occur in the dataset; in total we have 318 TA → YA → I transitions.

EDIT: Feedback from discussion with Leo:
* the `NONE` class for S nodes is also relevant! There are TA nodes that do not have a respective S node, but we mirror all of the TA nodes as potential S nodes. To create correct training data, we need to create new S nodes with type `NONE` by mirroring the TA nodes that do not already have an S node.
* `RA`-nodes (`default inference`) can have multiple incoming / outgoing edges.

EDIT2: binarizing relations does not seem to be easily possible because it can create multiple relations between the same pair of nodes; see this example that Ramon presented in the Slack channel (look at the two top I-nodes: binarizing would create two relations between them, one with label `default inference` and the other with `default rephrase`):
From the task website:
QT30 corpus: http://corpora.aifdb.org/qt30
Open Questions: