ArneBinder / dialam-2024-shared-task

see http://dialam.arg.tech/

understanding the data & task #1

Open ArneBinder opened 4 months ago

ArneBinder commented 4 months ago

From the task website:

We will use the QT30 corpus [1], the largest available corpus in dialogical argumentation in English. QT30 is a collection of 30 episodes of Question Time aired between June 2020 and November 2021, with a total of more than 29 hours of transcribed broadcast material and comprises of 19,842 locutions by more than 400 participants: one moderator, 125 panel members (7 of them appearing more than once), and 300+ audience members. The QT30 dataset contains 10,818 propositional relations divided into Default Inferences, Default Conflicts, and Default Rephrases, and 32,303 illocutionary relations divided into Asserting, Agreeing, Arguing, Disagreeing, Restating, Questioning, and Default Illocuting.

QT30 corpus: http://corpora.aifdb.org/qt30

Open Questions:

tanikina commented 4 months ago

Node Type Annotations

Some examples for the node types based on nodeset17940.json from the training set.

AIFdb visualization for the corresponding argument map: http://www.aifdb.org/argview/17940.

L-node is used for locutions (speaker + what they actually said):

      {
            "nodeID": "512946",
            "text": "Camilla Tominey : that's not something we want",
            "type": "L",
            "timestamp": "2020-05-28 20:31:10"
      },

I-node (information-node) is used with propositions (propositions are "reconstructed locutions, where linguistic features like anaphora, pronouns, and deixis are resolved" see annotations):

    {
          "nodeID": "512948",
          "text": "risking the spread of COVID-19 is not something we want",
          "type": "I",
          "timestamp": "2020-05-28 20:31:10"
    },

YA-node connects several types of nodes:

Here we have an edge connecting YA-node-"512947" with L-node-"512946".

      {
            "nodeID": "512947",
            "text": "Asserting",
            "type": "YA",
            "timestamp": "2020-05-28 20:31:10",
            "scheme": "Asserting",
            "schemeID": "74"
      },

S-node connects I-nodes and can have the following "type": "RA" stands for inference, "CA" for conflict and "MA" for rephrase. Note that we also have an edge connecting S-node-"512950" that has the type annotation and the corresponding I-node-"512948" (shown above).

      {
            "nodeID": "512950",
            "text": "Default Inference",
            "type": "RA",
            "timestamp": "2020-05-28 20:31:11",
            "scheme": "Default Inference",
            "schemeID": "72"
      },

Having an edge between the S-node-"512950" (above) and another I-node-"512944" (shown below) means that there is an inference relation between the two propositions: "risking the spread of COVID-19 is not something we want" and "there is a risk to children of perhaps contracting COVID-19 and spreading it to vulnerable adults". One statement supports and provides the reason for another, hence it is annotated as "inference".

      {
            "nodeID": "512944",
            "text": "there is a risk to children of perhaps contracting COVID-19 and spreading it to vulnerable adults",
            "type": "I",
            "timestamp": "2020-05-28 20:31:10"
      },
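To make the walk-through above concrete, here is a minimal sketch that follows edges from a node through one relation node to its target. It assumes that each nodeset JSON also contains an `edges` list with `fromID`/`toID` keys (not shown in the snippets above). The edge directions in the toy data are illustrative guesses rather than copies from nodeset17940.json, and `follow` is a hypothetical helper name.

```python
# Toy nodeset built from the nodes shown above. The "edges" list, its
# "fromID"/"toID" keys and the edge directions are assumptions.
nodeset = {
    "nodes": [
        {"nodeID": "512946", "type": "L",
         "text": "Camilla Tominey : that's not something we want"},
        {"nodeID": "512947", "type": "YA", "text": "Asserting",
         "scheme": "Asserting"},
        {"nodeID": "512948", "type": "I",
         "text": "risking the spread of COVID-19 is not something we want"},
        {"nodeID": "512950", "type": "RA", "text": "Default Inference",
         "scheme": "Default Inference"},
        {"nodeID": "512944", "type": "I",
         "text": "there is a risk to children of perhaps contracting "
                 "COVID-19 and spreading it to vulnerable adults"},
    ],
    "edges": [
        {"fromID": "512946", "toID": "512947"},  # L -> YA
        {"fromID": "512947", "toID": "512948"},  # YA -> I
        {"fromID": "512944", "toID": "512950"},  # I -> RA (direction assumed)
        {"fromID": "512950", "toID": "512948"},  # RA -> I (direction assumed)
    ],
}


def follow(nodeset, start_id, via_types):
    """Yield (relation_node, target_node) pairs reachable from start_id
    through exactly one relation node whose type is in via_types."""
    nodes = {n["nodeID"]: n for n in nodeset["nodes"]}
    outgoing = {}
    for edge in nodeset["edges"]:
        outgoing.setdefault(edge["fromID"], []).append(edge["toID"])
    for rel_id in outgoing.get(start_id, []):
        rel = nodes[rel_id]
        if rel["type"] in via_types:
            for target_id in outgoing.get(rel_id, []):
                yield rel, nodes[target_id]


# Which proposition does the locution anchor, and with which illocution?
for rel, target in follow(nodeset, "512946", {"YA"}):
    print(rel["scheme"], "->", target["text"])

# Which proposition is linked to I-node 512944 via an S-node?
for rel, target in follow(nodeset, "512944", {"RA", "CA", "MA"}):
    print(rel["scheme"], "->", target["text"])
```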

In general, I and L are content nodes that hold the text of propositions and locutions, respectively, and they are given as a set of nodes at test time. TA nodes are transitions between the L nodes in the dialogue, and they are always given as well. The task is about identifying the YA and S nodes, with their relation annotations, that either connect L and I nodes (YA-nodes) or connect two I nodes (S-nodes).

In the test dataset all the information provided will be the set of unlinked I-nodes and a set of L-nodes linked by transitions (TA-nodes).

The task definition as specified in the Shared Task guidelines:

The goal in the DialAM task is to correctly detect illocutionary relations (YA-nodes) and propositional relations (RA-, CA-, and MA-nodes), producing an edited argument map containing these new identified relational nodes together with new edges linking them to the locutions (L-nodes) and the argumentative propositions (I-nodes).

The main goal of the DialAM task is therefore twofold: First, to identify the existing relational nodes (RA-, CA-, MA-nodes) between propositions (I-nodes) and generate the respective edges linking all the information in the argument map. Similarly, the second goal is to identify any existing illocutionary relations (YA-nodes) between locutions (L-nodes) and propositions (I-nodes).

image

Useful Resources:

Data Format

Annotation Details

QT30 Paper

IAT Guidelines

Task Baseline? Transformer-Based Models for Automatic Identification of Argument Relations: A Cross-Domain Evaluation

tanikina commented 4 months ago

Open Questions:

* [ ]  Do we need to do relation link prediction or just the link classification?

Yes, we need to do both (edge prediction and node type classification). I asked in the Shared Task Slack channel and here is the reply from the organizers:

Screenshot from 2024-03-04 16-31-07

ArneBinder commented 4 months ago

assumptions:

  1. The number of I-nodes is the same as of the L-nodes.
  2. We can order both by timestamps and the resulting pairings, (L-Node_i, I-Node_i), are positive illocutionary relation instances, called YA-Nodes, but we don't know their class.
  3. There are transition relations between L-nodes, i.e. (L-Node_i, L-Node_j), with i<j, called TA-nodes, and they are given. For each such relation, there may be a propositional relation (I-Node_j, I-Node_i) or (exclusive?) (I-Node_i, I-Node_j) called S-node of class RA, MA, or CA (from the IAT guidelines, Sec. 5, Connections with propositional relations: "All RAs, CAs and MAs must be anchored through ICs in TAs."). Simplified, the "propositional relation" (I-Node_j, I-Node_i) is of class RA, MA, CA, RA-rev, MA-rev, CA-rev, or NONE, where the suffix -rev indicates the respective reversed relations.
  4. Furthermore, for any positive (not NONE class) propositional relation (I-Node_i, I-Node_j) there needs to be a YA relation ((L-Node_i, L-Node_j), (I-Node_i, I-Node_j)) with class xxx, yyy, or zzz.
  5. However, there may be more YA relations! From the IAT guidelines, Sec. 5, Basics: "Each locution will typically anchor a single illocutionary connection, but may anchor more than one"
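
A minimal sketch of assumptions 1 to 3, purely for illustration: it pairs L- and I-nodes by timestamp order and derives one S-node candidate per TA relation. The node IDs, the function names and the `S_LABELS` list are hypothetical, and later comments in this thread question whether the timestamps are reliable enough for this pairing.

```python
from datetime import datetime

# Label set from assumption 3: a candidate (I_j, I_i) is one of these classes.
S_LABELS = ["NONE", "RA", "MA", "CA", "RA-rev", "MA-rev", "CA-rev"]


def parse_ts(node):
    """Parse the 'YYYY-MM-DD HH:MM:SS' timestamps used in the nodesets."""
    return datetime.strptime(node["timestamp"], "%Y-%m-%d %H:%M:%S")


def pair_by_timestamp(l_nodes, i_nodes):
    """Assumptions 1 and 2: equal counts, and the i-th L-node (by time)
    anchors the i-th I-node (by time). Returns {L id: I id}."""
    if len(l_nodes) != len(i_nodes):
        raise ValueError("assumption 1 violated: #L-nodes != #I-nodes")
    l_sorted = sorted(l_nodes, key=parse_ts)
    i_sorted = sorted(i_nodes, key=parse_ts)
    return {l["nodeID"]: i["nodeID"] for l, i in zip(l_sorted, i_sorted)}


def s_candidates(ta_pairs, l2i):
    """Assumption 3: every given TA relation (L_i, L_j) yields one
    candidate pair (I_j, I_i) that is later classified into S_LABELS."""
    return [(l2i[lj], l2i[li]) for li, lj in ta_pairs
            if li in l2i and lj in l2i]


# Hypothetical toy input.
l_nodes = [
    {"nodeID": "L2", "timestamp": "2020-05-28 20:31:12"},
    {"nodeID": "L1", "timestamp": "2020-05-28 20:31:10"},
]
i_nodes = [
    {"nodeID": "I1", "timestamp": "2020-05-28 20:31:10"},
    {"nodeID": "I2", "timestamp": "2020-05-28 20:31:12"},
]
l2i = pair_by_timestamp(l_nodes, i_nodes)
candidates = s_candidates([("L1", "L2")], l2i)
print(l2i, candidates)
```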
tanikina commented 4 months ago

Yes, this is also my understanding! Additionally, according to the annotation details document, we also need to classify YA relations between TA and S-nodes (TA-Node_i, S-Node_i) as well as between TA and I-nodes. I'm not sure about the TA → I transitions though since I have not seen any examples so far.

EDIT: There are no TA → I transitions in the training data and direct TA → S transitions are very rare. However, TA → YA → S transitions are quite important (see the node2node transition table in the next comment).

Also, regarding the propositional relations, I think we can safely assume that MA and CA only go up (I-Node_j, I-Node_i) and RA can point both up (I-Node_j, I-Node_i) or down (I-Node_i, I-Node_j). At least that's how they specify them in the annotation details document.

I can check the training data and compile some statistics for each of the relation types (e.g., how many times we have each relation and which nodes are involved). Would that be useful?

tanikina commented 3 months ago

I'm still not sure whether this is insightful, but here are some statistics based on the node2node transitions from the training set. The table was generated using the count_statistics.py script (format: label-count for each valid transition/edge).

It seems that the most important/common transitions are between the following nodes:

| to_node → / from_node ↓ | YA | L | TA | I | MA | RA | CA |
|---|---|---|---|---|---|---|---|
| YA | - | Asserting-420<br>Analysing-255<br>PureQuestioning-7<br>DefaultIllocuting-6<br>AssertiveQuestioning-5<br>Arguing-3<br>Agreeing-2<br>Restating-1<br>Challenging-1 | Arguing-1 | Asserting-18780<br>PureQuestioning-1185<br>AssertiveQuestioning-239<br>RhetoricalQuestioning-222<br>Agreeing-215<br>NoLabel-160<br>DefaultIllocuting-136<br>Challenging-57<br>Disagreeing-50<br>Arguing-12<br>Restating-5 | Restating-4056<br>NoLabel-1097<br>DefaultIllocuting-614<br>Arguing-12<br>Agreeing-6<br>Disagreeing-3<br>Asserting-1 | Arguing-5067<br>NoLabel-394<br>DefaultIllocuting-63<br>Asserting-22<br>Restating-20<br>Agreeing-17<br>PureQuestioning-10<br>RhetoricalQuestioning-2<br>AssertiveQuestioning-1<br>Challenging-1<br>Disagreeing-1 | Disagreeing-931<br>NoLabel-234<br>Challenging-39<br>Arguing-8<br>Restating-7<br>DefaultIllocuting-5 |
| L | Asserting-19195<br>PureQuestioning-1192<br>Analysing-256<br>AssertiveQuestioning-244<br>RhetoricalQuestioning-222<br>DefaultIllocuting-139<br>Agreeing-109<br>Challenging-41<br>Disagreeing-21<br>Arguing-15<br>Restating-6<br>NoLabel-3<br>DefaultTransition-2<br>DefaultRephrase-1 | NoLabel-7<br>DefaultTransition-2<br>DefaultInference-2 | DefaultTransition-20173<br>NoLabel-2857<br>DefaultRephrase-1<br>Asserting-1<br>Disagreeing-1 | DefaultRephrase-1 | DefaultRephrase-6 | DefaultInference-7 | DefaultConflict-3 |
| TA | DefaultTransition-11050<br>NoLabel-1884 | DefaultTransition-20178<br>NoLabel-2857 | - | - | DefaultTransition-5 | DefaultTransition-1 | DefaultTransition-1 |
| I | Asserting-32<br>PureQuestioning-1 | DefaultTransition-2<br>DefaultRephrase-1<br>DefaultIllocuting-1 | - | DefaultConflict-1 | DefaultRephrase-4732<br>NoLabel-1071<br>DefaultTransition-1 | DefaultInference-6116<br>NoLabel-386<br>Arguing-1<br>DefaultConflict-1 | DefaultConflict-997<br>NoLabel-229<br>Arguing-1<br>DefaultIllocuting-1 |
| MA | - | DefaultRephrase-12 | - | DefaultRephrase-4730<br>NoLabel-1077 | DefaultRephrase-5 | - | - |
| RA | - | DefaultInference-8 | - | DefaultInference-5282<br>NoLabel-379 | - | - | - |
| CA | - | DefaultConflict-4 | - | DefaultConflict-992<br>NoLabel-230 | - | - | DefaultConflict-1 |

ArneBinder commented 3 months ago

Oh, this is very interesting! However, I do not fully understand the column / row sets (YA, L, TA, I, MA, RA, CA). I would expect L, I, S, TA instead because those are the types of relation arguments and, in the end, we would classify these pairs (if I understand it correctly). Or what was the reasoning behind your choice? Maybe I missed something.

Another note: If it is not much effort, can we have the table in markdown? I think pandas dataframes provide a to_markdown method. But if that does not work out of the box, I think it is fine to keep it as it is.

tanikina commented 3 months ago

> However, I do not fully understand the column / row sets (YA, L, TA, I, MA, RA, CA). I would expect L, I, S, TA instead because those are the types of relation arguments and, in the end, we would classify these pairs (if I understand it correctly). Or what was the reasoning behind your choice?

Here I just collected all possible/valid transitions and their statistics (including the edges that we don't need to predict). S nodes are basically represented as MA, RA and CA nodes in the data (there are no "S" nodes in the original dataset) and since they have different labels and participate in different transitions, I think it might be useful to keep them in separate rows/columns. We also need YA nodes because we have to predict/annotate them in the following transitions: L → YA → I and TA → YA → S (at least as far as I understand the task).
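
For readers without access to the script, edge-level counts of this kind could be collected roughly as follows. This is a sketch, not the actual count_statistics.py: the `edges`/`fromID`/`toID` layout and the rule for choosing the label (the scheme of whichever endpoint has one) are assumptions.

```python
from collections import Counter, defaultdict


def node2node_counts(nodeset):
    """Count labels per (from_type, to_type) edge; RA/CA/MA stay separate."""
    nodes = {n["nodeID"]: n for n in nodeset["nodes"]}
    counts = defaultdict(Counter)
    for edge in nodeset["edges"]:
        src = nodes.get(edge["fromID"])
        tgt = nodes.get(edge["toID"])
        if src is None or tgt is None:
            continue  # skip edges that point to missing nodes
        # Assumed rule: take the scheme of the endpoint that has one.
        label = src.get("scheme") or tgt.get("scheme") or "NoLabel"
        counts[(src["type"], tgt["type"])][label.replace(" ", "")] += 1
    return counts


# Toy nodeset; edge directions are illustrative.
toy = {
    "nodes": [
        {"nodeID": "512946", "type": "L"},
        {"nodeID": "512947", "type": "YA", "scheme": "Asserting"},
        {"nodeID": "512948", "type": "I"},
        {"nodeID": "512950", "type": "RA", "scheme": "Default Inference"},
        {"nodeID": "512944", "type": "I"},
    ],
    "edges": [
        {"fromID": "512946", "toID": "512947"},
        {"fromID": "512947", "toID": "512948"},
        {"fromID": "512944", "toID": "512950"},
        {"fromID": "512950", "toID": "512948"},
    ],
}

counts = node2node_counts(toy)
for (src_type, tgt_type), counter in sorted(counts.items()):
    cell = " ".join(f"{label}-{n}" for label, n in counter.most_common())
    print(f"{src_type} -> {tgt_type}: {cell}")
```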

> Another note: If it is not much effort, can we have the table in markdown? I think pandas dataframes provide a to_markdown method.

Sure, no problem! Now we have it in markdown :)

tanikina commented 3 months ago

This is a new table with the statistics for the input nodes (computed with this script). We are given L, I and TA nodes as input and need to predict the following transitions (i.e., whether there is a link between the two input nodes and which type/"scheme" should be assigned to it):

YA nodes basically serve as "edge labels" in this task since we don't have any edge labels in the data, only the node labels. S nodes should be predicted based on the I → S → I transitions.
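
The idea of treating relation nodes as edge labels can be sketched as follows: for every relation node, each (incoming, outgoing) neighbour pair yields one `source → relation → target` transition. This is an illustration under assumptions, not the linked script. Edges are given as plain `(from, to)` tuples, all IDs are hypothetical, and RA/CA/MA neighbours are collapsed to `S`.

```python
from collections import Counter, defaultdict

RELATION_TYPES = {"YA", "TA", "RA", "CA", "MA"}
S_TYPES = {"RA", "CA", "MA"}


def arg_type(node_type):
    """Collapse RA/CA/MA into the generic S type for argument positions."""
    return "S" if node_type in S_TYPES else node_type


def transition_counts(nodes, edges):
    """Map (source_type, relation_type, target_type) to a Counter of schemes."""
    by_id = {n["nodeID"]: n for n in nodes}
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for src, tgt in edges:
        if src in by_id and tgt in by_id:  # ignore dangling edges
            outgoing[src].append(tgt)
            incoming[tgt].append(src)
    counts = defaultdict(Counter)
    for rel_id, rel in by_id.items():
        if rel["type"] not in RELATION_TYPES:
            continue
        label = rel.get("scheme", "NoLabel").replace(" ", "")
        for src in incoming[rel_id]:
            for tgt in outgoing[rel_id]:
                key = (arg_type(by_id[src]["type"]), rel["type"],
                       arg_type(by_id[tgt]["type"]))
                counts[key][label] += 1
    return counts


# Hypothetical toy graph: two locutions, their propositions, one inference.
nodes = [
    {"nodeID": "L1", "type": "L"}, {"nodeID": "L2", "type": "L"},
    {"nodeID": "I1", "type": "I"}, {"nodeID": "I2", "type": "I"},
    {"nodeID": "TA1", "type": "TA", "scheme": "Default Transition"},
    {"nodeID": "YA1", "type": "YA", "scheme": "Asserting"},
    {"nodeID": "YA2", "type": "YA", "scheme": "Asserting"},
    {"nodeID": "YA3", "type": "YA", "scheme": "Arguing"},
    {"nodeID": "RA1", "type": "RA", "scheme": "Default Inference"},
]
edges = [
    ("L1", "TA1"), ("TA1", "L2"),    # L -> TA -> L
    ("L1", "YA1"), ("YA1", "I1"),    # L -> YA -> I
    ("L2", "YA2"), ("YA2", "I2"),    # L -> YA -> I
    ("I2", "RA1"), ("RA1", "I1"),    # I -> RA -> I
    ("TA1", "YA3"), ("YA3", "RA1"),  # TA -> YA -> S
]

counts = transition_counts(nodes, edges)
for (src, rel, tgt), counter in sorted(counts.items()):
    print(f"{src} -> {rel} -> {tgt}:", dict(counter))
```

Note that this also produces transitions such as L → TA → YA and YA → RA → I, because a relation node can itself be the neighbour of another relation node.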

| input nodes | L | I | S | TA |
|---|---|---|---|---|
| L | L → TA → L<br>DefaultTransition: 20206<br>NoLabel: 2857<br><br>L → YA → L<br>Asserting: 420<br>Analysing: 255<br>PureQuestioning: 7<br>DefaultIllocuting: 6<br>AssertiveQuestioning: 5<br>Arguing: 3<br>Agreeing: 2<br>Restating: 1<br>Challenging: 1<br><br>L → MA → L<br>DefaultRephrase: 2 | L → YA → I<br>Asserting: 18779<br>PureQuestioning: 1185<br>AssertiveQuestioning: 239<br>RhetoricalQuestioning: 222<br>DefaultIllocuting: 133<br>Agreeing: 107<br>Challenging: 40<br>Disagreeing: 21<br>Arguing: 12<br>Restating: 5<br>NoLabel: 3<br><br>L → RA → I<br>DefaultInference: 6<br><br>L → MA → I<br>DefaultRephrase: 4<br><br>L → CA → I<br>DefaultConflict: 3 | L → TA → S<br>DefaultTransition: 7 | - |
| I | I → MA → L<br>DefaultRephrase: 10<br><br>I → RA → L<br>DefaultInference: 8<br><br>I → CA → L<br>DefaultConflict: 4 | I → RA → I<br>DefaultInference: 6117<br>NoLabel: 371<br><br>I → MA → I<br>DefaultRephrase: 4730<br>NoLabel: 1053<br><br>I → CA → I<br>DefaultConflict: 995<br>NoLabel: 226<br><br>I → YA → I<br>Asserting: 32<br>PureQuestioning: 1 | - | - |
| S | - | - | - | - |
| TA | - | TA → YA → I<br>NoLabel: 157<br>Agreeing: 109<br>Disagreeing: 29<br>Challenging: 17<br>DefaultIllocuting: 3<br>Asserting: 2<br>PureQuestioning: 1 | TA → YA → S<br>Arguing: 5090<br>Restating: 4083<br>NoLabel: 1725<br>Disagreeing: 935<br>DefaultIllocuting: 682<br>Challenging: 40<br>Asserting: 23<br>Agreeing: 23<br>PureQuestioning: 10<br>RhetoricalQuestioning: 2<br>AssertiveQuestioning: 1<br><br>TA → MA → S<br>DefaultRephrase: 5 | - |

ArneBinder commented 3 months ago

I have also created some code to do statistics. I added the code to the same script, but you can just comment out the last lines to bring it back to the previous state. However, it results in the following (edited: new version with relation node types and counts sorted by identifier):

| | I | L | S | TA | YA |
|---|---|---|---|---|---|
| I | S/DefaultConflict: 1221<br>S/DefaultInference: 6488<br>S/DefaultRephrase: 5783<br>YA/Asserting: 32<br>YA/PureQuestioning: 1 | S/DefaultConflict: 4<br>S/DefaultInference: 8<br>S/DefaultRephrase: 10 | - | - | - |
| L | S/DefaultConflict: 3<br>S/DefaultInference: 6<br>S/DefaultRephrase: 4<br>YA/Agreeing: 107<br>YA/Arguing: 12<br>YA/Asserting: 18782<br>YA/AssertiveQuestioning: 239<br>YA/Challenging: 40<br>YA/DefaultIllocuting: 133<br>YA/Disagreeing: 21<br>YA/PureQuestioning: 1185<br>YA/Restating: 5<br>YA/RhetoricalQuestioning: 222 | S/DefaultInference: 1<br>S/DefaultRephrase: 2<br>TA/DefaultTransition: 23063<br>YA/Agreeing: 2<br>YA/Analysing: 255<br>YA/Arguing: 3<br>YA/Asserting: 420<br>YA/AssertiveQuestioning: 5<br>YA/Challenging: 1<br>YA/DefaultIllocuting: 6<br>YA/PureQuestioning: 7<br>YA/Restating: 1 | TA/DefaultTransition: 7 | - | TA/DefaultTransition: 12959 |
| S | S/DefaultConflict: 1<br>S/DefaultRephrase: 5 | - | - | - | - |
| TA | S/DefaultInference: 1<br>YA/Agreeing: 209<br>YA/Asserting: 2<br>YA/Challenging: 43<br>YA/DefaultIllocuting: 3<br>YA/Disagreeing: 60<br>YA/PureQuestioning: 1 | - | S/DefaultConflict: 1<br>S/DefaultRephrase: 5<br>YA/Agreeing: 23<br>YA/Arguing: 5484<br>YA/Asserting: 23<br>YA/AssertiveQuestioning: 1<br>YA/Challenging: 40<br>YA/DefaultIllocuting: 1779<br>YA/Disagreeing: 1169<br>YA/PureQuestioning: 10<br>YA/Restating: 4083<br>YA/RhetoricalQuestioning: 2 | YA/Arguing: 1 | - |
| YA | S/DefaultConflict: 1216<br>S/DefaultInference: 5581<br>S/DefaultRephrase: 5765 | S/DefaultConflict: 4<br>S/DefaultInference: 8<br>S/DefaultRephrase: 11<br>TA/DefaultTransition: 1 | - | - | TA/DefaultTransition: 1 |

Unfortunately, this seems to be different from the above table, but it shouldn't be... I guess?

tanikina commented 3 months ago

You are right, of course! I had the "No Label" labels in the table which doesn't make much sense. I updated the code and now it seems to generate the same numbers. Thanks a lot for checking and implementing another version! I think we should keep your version as a reference :)

| input nodes | I | L | S | TA | YA |
|---|---|---|---|---|---|
| I | I → S → I<br>DefaultInference: 6488<br>DefaultRephrase: 5783<br>DefaultConflict: 1221<br><br>I → YA → I<br>Asserting: 32<br>PureQuestioning: 1 | I → S → L<br>DefaultRephrase: 10<br>DefaultInference: 8<br>DefaultConflict: 4 | - | - | - |
| L | L → YA → I<br>Asserting: 18782<br>PureQuestioning: 1185<br>AssertiveQuestioning: 239<br>RhetoricalQuestioning: 222<br>DefaultIllocuting: 133<br>Agreeing: 107<br>Challenging: 40<br>Disagreeing: 21<br>Arguing: 12<br>Restating: 5<br><br>L → S → I<br>DefaultInference: 6<br>DefaultRephrase: 4<br>DefaultConflict: 3 | L → TA → L<br>DefaultTransition: 23063<br><br>L → YA → L<br>Asserting: 420<br>Analysing: 255<br>PureQuestioning: 7<br>DefaultIllocuting: 6<br>AssertiveQuestioning: 5<br>Arguing: 3<br>Agreeing: 2<br>Restating: 1<br>Challenging: 1<br><br>L → S → L<br>DefaultRephrase: 2<br>DefaultInference: 1 | L → TA → S<br>DefaultTransition: 7 | - | L → TA → YA<br>DefaultTransition: 12959 |
| S | S → S → I<br>DefaultRephrase: 5<br>DefaultConflict: 1 | - | - | - | - |
| TA | TA → YA → I<br>Agreeing: 209<br>Disagreeing: 60<br>Challenging: 43<br>DefaultIllocuting: 3<br>Asserting: 2<br>PureQuestioning: 1<br><br>TA → S → I<br>DefaultInference: 1 | - | TA → YA → S<br>Arguing: 5484<br>Restating: 4083<br>DefaultIllocuting: 1779<br>Disagreeing: 1169<br>Challenging: 40<br>Asserting: 23<br>Agreeing: 23<br>PureQuestioning: 10<br>RhetoricalQuestioning: 2<br>AssertiveQuestioning: 1<br><br>TA → S → S<br>DefaultRephrase: 5<br>DefaultConflict: 1 | TA → YA → TA<br>Arguing: 1 | - |
| YA | YA → S → I<br>DefaultRephrase: 5765<br>DefaultInference: 5581<br>DefaultConflict: 1216 | YA → S → L<br>DefaultRephrase: 11<br>DefaultInference: 8<br>DefaultConflict: 4<br><br>YA → TA → L<br>DefaultTransition: 1 | - | - | YA → TA → YA<br>DefaultTransition: 1 |

ArneBinder commented 3 months ago

Open questions regarding the data

  1. What is the meaning of the timestamp in the case of each node type (i.e. I, L, TA, YA, S)?
  2. It looks like there are disconnected L-nodes, what does this mean? See nodes 599519, 599523, 599527, 599534, 599537 in nodeset25524.
  3. There are S-nodes that are not linked by any YA node to a TA node, but the IAT guidelines, Sec. 5, Connections with propositional relations, state that "All RAs, CAs and MAs must be anchored through ICs in TAs". See node 1021263 in nodeset25524.
  4. Why do we have duplicate nodes (861680 and 861681 even have the same timestamp)? nodeset23720: {"nodeID":"861679","text":"Enter your text here...","type":"L","timestamp":"2022-01-14 10:49:33"} {"nodeID":"861680","text":"Enter your text here...","type":"L","timestamp":"2022-01-14 10:49:34"} {"nodeID":"861681","text":"Enter your text here...","type":"L","timestamp":"2022-01-14 10:49:34"}
  5. Is it correct to have multiple relation nodes that connect a pair of nodes? Does this mean, this is a multi label relation classification task (instead of just multi-class)? see nodeset25461: Multiple relation nodes (types: {'TA'}) between 720315 (type: L) and 719763 (type: L), or see nodeset25524: Multiple relation nodes (types: {'RA', 'MA'}) between 599605 (type: I) and 599599 (type: I)
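
Cases like the ones in question 5 can be found with a check along these lines. This is a sketch, not the script that produced the messages quoted above. Edges are plain `(from, to)` tuples, pairs are treated as unordered, and the relation node IDs (`ma1`, `ra1`), the third proposition (`i3`) and the edge directions are hypothetical.

```python
from collections import defaultdict

RELATION_TYPES = {"YA", "TA", "RA", "CA", "MA"}


def multi_relation_pairs(nodes, edges):
    """Return {(id_a, id_b): [relation types]} for every unordered node
    pair that is connected by more than one relation node."""
    by_id = {n["nodeID"]: n for n in nodes}
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for src, tgt in edges:
        outgoing[src].append(tgt)
        incoming[tgt].append(src)
    pair_types = defaultdict(list)
    for rel_id, rel in by_id.items():
        if rel["type"] not in RELATION_TYPES:
            continue
        for src in incoming[rel_id]:
            for tgt in outgoing[rel_id]:
                pair_types[tuple(sorted((src, tgt)))].append(rel["type"])
    return {pair: sorted(types)
            for pair, types in pair_types.items() if len(types) > 1}


# Toy version of the nodeset25524 case: a rephrase between two propositions
# plus a linked argument that also involves a third proposition.
nodes = [
    {"nodeID": "599605", "type": "I"},
    {"nodeID": "599599", "type": "I"},
    {"nodeID": "i3", "type": "I"},
    {"nodeID": "ma1", "type": "MA"},
    {"nodeID": "ra1", "type": "RA"},
]
edges = [
    ("599605", "ma1"), ("ma1", "599599"),
    ("599605", "ra1"), ("i3", "ra1"), ("ra1", "599599"),
]

duplicates = multi_relation_pairs(nodes, edges)
print(duplicates)
```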
tanikina commented 3 months ago

Answers from DialAM organizers

Question 1: What is the meaning of the timestamp in the case of each node type (i.e. I, L, TA, YA, S)? Answer: This question has not yet been answered by the organizers, but my understanding is that L and I nodes are more or less aligned by their timestamps, which correspond to the original dialogue flow. The other timestamps (for TA, YA and S nodes) look quite random and may show the order in which the nodes were annotated (L and I nodes always have the same date based on the BBC broadcast, e.g., 2020-11-19, but other nodes have much later dates, e.g., 2022-06-24).

Questions 2 & 4: (Q2) It looks like there are disconnected L-nodes, what does this mean? See nodes 599519, 599523, 599527, 599534, 599537 in nodeset25524. (Q4) Why do we have duplicate nodes (861680 and 861681 even have the same timestamp)? Answer: I’ve just checked and it looks like some of the nodes are duplicated (e.g., 599516 is the same as 599519). The duplicated ones are isolated, and they should be not considered in your analysis as they will not be considered during evaluation since they are redundant.

Question 3: There are S-nodes that are not linked by any YA node to a TA node, but the IAT guidelines, Sec. 5, Connections with propositional relations, state that "All RAs, CAs and MAs must be anchored through ICs in TAs". See node 1021263 in nodeset25524. Answer: Yes, such cases are possible, e.g., when we have a rephrase between two propositions, and a linked argument between one of these propositions together with a third. See the image below where we have "Default Inference" node between "if one had never..." and "look at the potential side effects..." that is not anchored via TA node.

image

Question 5: Is it correct to have multiple relation nodes that connect a pair of nodes? Does this mean, this is a multi label relation classification task (instead of just multi-class)? see nodeset25461: Multiple relation nodes (types: {'TA'}) between 720315 (type: L) and 719763 (type: L), or see nodeset25524: Multiple relation nodes (types: {'RA', 'MA'}) between 599605 (type: I) and 599599 (type: I) Answer: The case that you mention is one where there is a rephrase between two propositions, and a linked argument between one of these propositions together with a third (see the image above). Therefore, yes, it is possible to have multiple relation nodes between the same pair of nodes but is not as simple as a multi-label classification problem. In the proposed example, the inference relation only happens because there is a third proposition involved, if not, there would only have been a rephrase relation between two propositions.

Note: we can remove all disconnected nodes from the training data! As for multiple relations (Q5), I am not sure how feasible this is from the modelling point of view. Based on our statistics there are not many cases that have multiple relations between the same I nodes. There are plenty of cases with multiple relations between L nodes (multiple TA relations) but I guess we can simply assume that those are given at test time.
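
Dropping the disconnected nodes could look like this. It is a minimal sketch that again assumes an `edges` list with `fromID`/`toID` keys; the node texts and the relation node `ya1` are hypothetical, only the IDs 599516 and 599519 come from the organizers' answer.

```python
def drop_isolated_nodes(nodeset):
    """Return a copy of the nodeset without nodes that have no edge at all."""
    connected = set()
    for edge in nodeset["edges"]:
        connected.add(edge["fromID"])
        connected.add(edge["toID"])
    kept = [n for n in nodeset["nodes"] if n["nodeID"] in connected]
    return {**nodeset, "nodes": kept}


toy = {
    "nodes": [
        {"nodeID": "599516", "type": "L", "text": "some locution"},
        {"nodeID": "599519", "type": "L", "text": "some locution"},  # isolated duplicate
        {"nodeID": "ya1", "type": "YA", "text": "Asserting"},
    ],
    "edges": [{"fromID": "599516", "toID": "ya1"}],
}

cleaned = drop_isolated_nodes(toy)
print([n["nodeID"] for n in cleaned["nodes"]])
```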

tanikina commented 3 months ago

Unfortunately, we cannot rely on the timestamps of I and L-nodes since they can be arbitrary and just show when the nodes were added to the graph with some external annotation tool (at least that's how I understood the response from the organizers). The only reliable timestamps are those that are specified under locutions and they are available only for some L-nodes (e.g., L-nodes 70682, 706806, 706835 in nodeset 21303 are missing in locutions).

I prepared a script to test different "automatic" ways of aligning the I and L-nodes based on the embedding similarity, token overlap etc. It is currently on the branch called node_alignment_experiments: align_i2l_nodes.py. Some parts were copied/adapted from the visualize_arg_map.py script :) See here for the results.
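
To illustrate the token-overlap variant only: the following sketch is not taken from align_i2l_nodes.py. It greedily aligns L- and I-nodes by Jaccard overlap of their lowercased tokens. The function names, the `Speaker : utterance` splitting rule and the second locution (`L2`) are assumptions made for the example.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def strip_speaker(locution):
    """L-node text looks like 'Speaker : utterance'; keep the utterance."""
    return locution.split(" : ", 1)[-1]


def align(l_nodes, i_nodes):
    """Greedy one-to-one alignment by descending token overlap.
    Pairs without any shared token are left unaligned."""
    scored = sorted(
        ((jaccard(strip_speaker(l["text"]), i["text"]), l["nodeID"], i["nodeID"])
         for l in l_nodes for i in i_nodes),
        reverse=True,
    )
    used_l, used_i, result = set(), set(), {}
    for score, l_id, i_id in scored:
        if score > 0 and l_id not in used_l and i_id not in used_i:
            result[l_id] = i_id
            used_l.add(l_id)
            used_i.add(i_id)
    return result


l_nodes = [
    {"nodeID": "512946",
     "text": "Camilla Tominey : that's not something we want"},
    {"nodeID": "L2",  # hypothetical locution for the second proposition
     "text": "Camilla Tominey : there is a risk to children"},
]
i_nodes = [
    {"nodeID": "512948",
     "text": "risking the spread of COVID-19 is not something we want"},
    {"nodeID": "512944",
     "text": "there is a risk to children of perhaps contracting COVID-19 "
             "and spreading it to vulnerable adults"},
]

alignment = align(l_nodes, i_nodes)
print(alignment)
```

A greedy match is the simplest choice here. An optimal assignment (e.g. Hungarian matching) or embedding similarity could be swapped in for `jaccard` without changing the rest.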

PS: We also need to think about the train/dev/test splits since those are not given out-of-the-box. The organizers uploaded some examples for the test data format but they include only three nodesets: http://dialam.arg.tech/res/files/sample_test.zip

ArneBinder commented 3 months ago

TODOs:

Open questions:

EDIT: Feedback from discussion with Leo:

EDIT2: binarizing relations seems to be not so easily possible because that can create multiple relations between the same pair of nodes, see this example that Ramon presented in the slack channel (look at the two top I-nodes, binarizing would create two relations between them, one with label default inference and the other with default rephrase): Screenshot 2024-03-14 at 18 20 09