Closed by ArneBinder 2 months ago
We used this code to train our first model for YA-I2L relations with `bert-base-uncased` (default configuration, only the batch size was reduced from 32 to 8).
On the validation set, the model reaches a micro-F1 of 0.94 and a macro-F1 of 0.34 (note that some classes, e.g., Agreeing-1 or Challenging-2, occur only rarely in the validation split).
Results for the classes that appear more than 10 times in the validation set:

| class | support | f1 |
|---|---|---|
| Asserting | 1704 | 0.976 |
| AssertiveQuestioning | 19 | 0.294 |
| NONE | 39 | 0.260 |
| PureQuestioning | 111 | 0.744 |
| RhetoricalQuestioning | 18 | 0.450 |

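
The gap between the two averages follows from the class imbalance. As a rough illustration, the sketch below recomputes an unweighted and a support-weighted mean from the per-class numbers listed above. It covers only the five listed classes, and the support-weighted mean is only a stand-in for the reported micro-F1 (the two are not the same metric), so the outputs will not match the reported 0.94 and 0.34 exactly.

```python
# Per-class validation results copied from the table: class -> (support, F1).
results = {
    "Asserting": (1704, 0.976),
    "AssertiveQuestioning": (19, 0.294),
    "NONE": (39, 0.260),
    "PureQuestioning": (111, 0.744),
    "RhetoricalQuestioning": (18, 0.450),
}

total_support = sum(support for support, _ in results.values())

# Unweighted mean: every class counts equally, regardless of its size.
macro_f1_listed = sum(f1 for _, f1 in results.values()) / len(results)

# Support-weighted mean: dominated by the largest class (Asserting).
weighted_f1_listed = (
    sum(support * f1 for support, f1 in results.values()) / total_support
)

print(f"macro-F1 over listed classes:    {macro_f1_listed:.3f}")
print(f"weighted-F1 over listed classes: {weighted_f1_listed:.3f}")
```

The unweighted mean over the listed classes is about 0.545, while the support-weighted mean is about 0.936. The reported macro-F1 of 0.34 is lower still because it also averages over the rare classes that are not shown in the table.
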
TODO: The logs contain warnings that still need to be checked (probably related to offset computation):
[pie_modules.taskmodules.re_text_classification_with_indices][WARNING] - doc.id=25497: Skipping invalid example, cannot get argument token slices for {LabeledSpan(start=51, end=185, label='L', score=1.0): "Michelle O'Neill : I would remind viewers that the majority of people and elected representatives in the north are opposed to Brexit.\xa0"}
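
The span text in the warning ends with a non-breaking space (`\xa0`). One possible cause, not yet confirmed, is that a character span ending on whitespace does not line up with any token boundary. The sketch below illustrates that failure mode with a simple whitespace tokenizer. All function names are hypothetical, and this is not the pie-modules implementation.

```python
import re
from typing import List, Optional, Tuple


def whitespace_token_offsets(text: str) -> List[Tuple[int, int]]:
    """Character (start, end) offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span_to_token_slice(
    offsets: List[Tuple[int, int]], start: int, end: int
) -> Optional[Tuple[int, int]]:
    """Map a character span to a token slice, or None if it is misaligned."""
    token_starts = {s: i for i, (s, _) in enumerate(offsets)}
    token_ends = {e: i + 1 for i, (_, e) in enumerate(offsets)}
    if start not in token_starts or end not in token_ends:
        return None
    return token_starts[start], token_ends[end]


def strip_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink a span so that it neither starts nor ends on whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


# Toy text: the first turn is followed by a non-breaking space.
text = "Speaker : Most people are opposed to Brexit.\xa0Next turn"
offsets = whitespace_token_offsets(text)

# The annotated span includes the trailing \xa0, as in the warning.
start, end = 0, text.index("Next")

misaligned = char_span_to_token_slice(offsets, start, end)
aligned = char_span_to_token_slice(offsets, *strip_span(text, start, end))
print(misaligned, aligned)
```

If this is indeed the cause, stripping whitespace from the span boundaries before the conversion would avoid skipping such examples.
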
This adds configs to train a simple baseline model on the YA-I2L data from the DialAM-2024 dataset. Please have a look at the relevant configs (in the respective `config` subfolders):

- `dataset`: `dialam2024_base` (pure pie-document dataset) and `dialam2024_prepared` (+ validation split)
- `model`: `sequence_classification_with_pooler` (the trainable PyTorch Lightning model), see pie-modules for further information
- `taskmodule`: `re_text_classification_with_indices` (conversion of documents to model inputs and back), see pie-modules for further information
- `experiment`: `dialam2024_ya_i2l` (sticking everything together)

Approach: Since we pre-calculate the alignment of I- and L-nodes, we can frame the task of YA-I2L relation extraction as unary relation extraction, where each L-node participates in exactly one relation.
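
To make the unary framing concrete, here is a minimal sketch in plain Python. The classes `LSpan` and `UnaryRelation`, the marker strings, the toy text, and the label assignments are all hypothetical illustrations, not pie-modules types or dataset content. The point is only that each L-node yields exactly one classification example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LSpan:
    """Character span of an L-node (hypothetical type)."""
    start: int
    end: int


@dataclass(frozen=True)
class UnaryRelation:
    """A relation with a single argument and a label (hypothetical type)."""
    arg: LSpan
    label: str


def mark_argument(text: str, span: LSpan,
                  open_marker: str = "[L]", close_marker: str = "[/L]") -> str:
    """Surround the argument span with marker strings (assumed format)."""
    return (
        text[: span.start]
        + open_marker + " " + text[span.start : span.end] + " " + close_marker
        + text[span.end :]
    )


# Toy dialogue with two L-nodes, one per turn.
first = "A : Is that so?"
second = "B : Yes, it is."
text = first + " " + second
l_spans: List[LSpan] = [
    LSpan(0, len(first)),
    LSpan(len(first) + 1, len(text)),
]

# One relation per L-node; the labels are illustrative only.
relations = [
    UnaryRelation(l_spans[0], "PureQuestioning"),
    UnaryRelation(l_spans[1], "Asserting"),
]

# Check the "exactly one relation per L-node" property.
relations_per_span = Counter(rel.arg for rel in relations)
assert all(relations_per_span[span] == 1 for span in l_spans)

# Each relation becomes one text-classification input.
inputs = [mark_argument(text, rel.arg) for rel in relations]
for model_input, rel in zip(inputs, relations):
    print(rel.label, "<-", model_input)
```

With this framing, relation extraction reduces to single-label classification over marked spans.
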
Execute a fast dev run (one batch only):
Train on the GPU:
Notes:
`pytorch-ie`, `pie-modules`, and `pie-datasets` and reinstall via `pip install -r requirements.txt`