Closed by tanikina 1 month ago
label | value | support |
---|---|---|
macro | 0.43 | 0 |
ya_s2ta_nodes:Pure Questioning | 0 | 0 |
ya_s2ta_nodes:Asserting | 0 | 0 |
ya_i2l_nodes:Restating | 0 | 0 |
ya_i2l_nodes:Arguing | 0 | 0 |
ya_s2ta_nodes:Rhetorical Questioning | 0 | 0 |
micro | 0.72 | 0 |
no_relation | 0 | 0 |
ya_i2l_nodes:Challenging | 0.5 | 1 |
ya_i2l_nodes:Agreeing | 0 | 2 |
ya_s2ta_nodes:Challenging | 0 | 2 |
ya_s2ta_nodes:Agreeing | 0 | 3 |
ya_i2l_nodes:Default Illocuting | 0 | 6 |
ya_i2l_nodes:Rhetorical Questioning | 0.32 | 13 |
ya_i2l_nodes:Assertive Questioning | 0.34 | 14 |
ya_i2l_nodes:NONE | 0.63 | 22 |
ya_s2ta_nodes:Default Illocuting | 0.62 | 53 |
ya_s2ta_nodes:Disagreeing | 0.32 | 90 |
s_nodes:Default Conflict | 0.34 | 92 |
ya_i2l_nodes:Pure Questioning | 0.81 | 120 |
s_nodes:Default Inference-rev | 0.35 | 211 |
s_nodes:Default Inference | 0.43 | 246 |
ya_s2ta_nodes:Restating | 0.52 | 385 |
ya_s2ta_nodes:Arguing | 0.48 | 444 |
s_nodes:Default Rephrase | 0.56 | 447 |
s_nodes:NONE | 0.7 | 862 |
ya_s2ta_nodes:NONE | 0.72 | 1005 |
ya_i2l_nodes:Asserting | 0.99 | 1795 |
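As the table shows, the macro score (0.43) is pulled down by rare classes with zero F1, while the micro score (0.72) is dominated by frequent classes such as `ya_i2l_nodes:Asserting`. A minimal, self-contained sketch of how per-class, macro, and micro F1 relate (not the project's evaluation code):

```python
from collections import Counter

def f1_report(gold, pred):
    """Return (per-class {label: (f1, support)}, macro F1, micro F1)."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    per_class = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[lab] = (f1, sum(1 for g in gold if g == lab))
    # Macro: unweighted mean over classes -- rare classes count as much as frequent ones.
    macro = sum(f for f, _ in per_class.values()) / len(labels)
    # Micro: computed from global counts -- frequent classes dominate.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn) if total_tp else 0.0
    return per_class, macro, micro
```

With a skewed label distribution, a single mispredicted rare class lowers macro F1 far more than micro F1, which matches the gap between the two averages above.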
```
python src/evaluation/eval_official.py --gold_dir=data/train --predictions_dir=data/validation_annotated_deberta_v3 --mode=arguments
general.p: 0.6643397895535604 general.r: 0.5891417270425421 general.f1: 0.6005552985712661 focused.p: 0.49017500285485893 focused.r: 0.31741342856450766 focused.f1: 0.36328749501622953

python3 src/evaluation/eval_official.py --gold_dir=data/train --predictions_dir=data/validation_annotated_deberta_v3 --mode=illocutions
general.p: 0.8622892315729807 general.r: 0.847089799405591 general.f1: 0.8511310912354281 focused.p: 0.717489532623461 focused.r: 0.6981894061035216 focused.f1: 0.7034717513399968
```
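The script prints metrics as a flat `key: value` line. A small helper for turning that line into a dict (a convenience sketch, assuming the exact output format shown above; not part of `eval_official.py`):

```python
def parse_metrics(line: str) -> dict:
    """Turn 'general.p: 0.66 general.r: 0.59 ...' into {'general.p': 0.66, ...}."""
    tokens = line.split()
    # Tokens alternate between a 'key:' token and its float value.
    return {key.rstrip(":"): float(val) for key, val in zip(tokens[::2], tokens[1::2])}

metrics = parse_metrics(
    "general.p: 0.8622892315729807 general.r: 0.847089799405591 general.f1: 0.8511310912354281"
)
# metrics["general.f1"] == 0.8511310912354281
```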
This adds the last batch of experiments with the `dialam2024_merged_relations` config. The current best-performing model is `microsoft/deberta-v3-large`. Above are the evaluation results on the validation set for different settings (including weighted loss, training with only the 20 most frequent classes, etc.).
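One of the settings mentioned is a weighted loss. A common scheme for class-imbalanced data like the table above is inverse-frequency class weights rescaled to mean 1 (an illustrative assumption, not necessarily the weighting this PR uses):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1/count, rescaled so the mean weight is 1."""
    counts = Counter(labels)
    raw = {lab: 1.0 / c for lab, c in counts.items()}
    mean = sum(raw.values()) / len(raw)
    # Rare classes end up above 1, frequent classes below 1.
    return {lab: w / mean for lab, w in raw.items()}

weights = inverse_frequency_weights(["NONE"] * 90 + ["Agreeing"] * 10)
# weights["Agreeing"] == 1.8, weights["NONE"] == 0.2
```

Such weights are typically passed to the cross-entropy loss so that errors on rare classes (e.g. `ya_s2ta_nodes:Agreeing` with support 3) contribute more to the gradient.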