ArneBinder / dialam-2024-shared-task

see http://dialam.arg.tech/

baseline training on `YA-I2L` data #16

Closed by ArneBinder 2 months ago

ArneBinder commented 2 months ago

This adds configs to train a simple baseline model on the YA-I2L data from the DialAM-2024 dataset. Please have a look at the relevant configs in the respective config subfolders.

Approach: Since we pre-calculate the alignment of I- and L-nodes, we can frame YA-I2L relation extraction as unary relation extraction, where each L-node participates in exactly one relation.
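For illustration, here is a minimal sketch of that framing with hypothetical types. The actual pipeline uses the `re_text_classification_with_indices` task module from `pie_modules` (cf. the warning further below), so none of the names here are the real API:

```python
# Minimal sketch of the unary-relation framing (hypothetical types, not the
# real pie_modules API): each L-node yields exactly one classification example
# whose label is the YA relation connecting it, via the pre-computed alignment,
# to its I-node.
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offsets into the document text
    end: int


@dataclass
class Document:
    text: str
    l_nodes: list[Span]         # locution (L-node) spans
    ya_labels: dict[Span, str]  # gold YA label per L-node, e.g. "Asserting"


def build_unary_examples(doc: Document) -> list[tuple[str, str]]:
    """Return one (marked_text, label) pair per L-node."""
    examples = []
    for l_span in doc.l_nodes:
        # Mark the argument span so the encoder knows which L-node is classified.
        marked = (
            doc.text[: l_span.start]
            + "[L] " + doc.text[l_span.start : l_span.end] + " [/L]"
            + doc.text[l_span.end :]
        )
        examples.append((marked, doc.ya_labels.get(l_span, "NONE")))
    return examples
```

Because each L-node appears in exactly one example, no relation candidate generation is needed and the task reduces to classifying each L-node span.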

Execute a fast dev run (one batch only):

```bash
python src/train.py \
  experiment=dialam2024_ya_i2l \
  +trainer.fast_dev_run=true
```

Train on the GPU:

```bash
python src/train.py \
  experiment=dialam2024_ya_i2l \
  trainer=gpu
```

Notes:

tanikina commented 2 months ago

We used this code to train our first model for YA-I2L relations with `bert-base-uncased` (default configuration; only the batch size was reduced from 32 to 8).

Here are the results on the validation set: micro-F1 0.94 and macro-F1 0.34 (note that some classes occur only rarely in the validation split, e.g., Agreeing-1 or Challenging-2). The full per-class breakdown was attached as an image.

Results for the classes that appear more than 10 times in the validation set:

| class | support | F1 |
| --- | ---: | ---: |
| Asserting | 1704 | 0.976 |
| AssertiveQuestioning | 19 | 0.294 |
| NONE | 39 | 0.260 |
| PureQuestioning | 111 | 0.744 |
| RhetoricalQuestioning | 18 | 0.450 |
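The large gap between micro- and macro-F1 follows from the class imbalance: micro-F1 aggregates over all instances and is therefore dominated by `Asserting`, while macro-F1 averages the per-class scores, so rare classes with F1 near 0 pull it down. A toy illustration with scikit-learn (made-up labels, not the actual predictions):

```python
# Toy example (made-up labels): micro-F1 is dominated by the frequent class,
# macro-F1 averages per-class F1 and is dragged down by rare classes.
from sklearn.metrics import f1_score

y_true = ["Asserting"] * 95 + ["Agreeing", "Challenging", "PureQuestioning", "PureQuestioning", "NONE"]
y_pred = ["Asserting"] * 95 + ["Asserting", "Asserting", "PureQuestioning", "Asserting", "Asserting"]

print(f1_score(y_true, y_pred, average="micro"))                   # ~0.96
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.33, rare classes score 0
```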

W&B run

TODO: in the logs we are getting warnings that need to be checked (probably related to offset computation):

```
[pie_modules.taskmodules.re_text_classification_with_indices][WARNING] - doc.id=25497: Skipping invalid example, cannot get argument token slices for {LabeledSpan(start=51, end=185, label='L', score=1.0): "Michelle O'Neill : I would remind viewers that the majority of people and elected representatives in the north are opposed to Brexit.\xa0"}
```
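A possible way to debug this: check whether the character span from the warning still aligns with token boundaries after tokenization. The sketch below uses the Hugging Face offset mapping of `bert-base-uncased` (the model used above); the helper and the shortened example text are hypothetical, only the trailing `\xa0` is taken from the warning:

```python
# Hypothetical debugging helper: check whether a character span can be mapped
# onto token boundaries via the fast tokenizer's offset mapping. A trailing
# non-breaking space ('\xa0') counts as whitespace for the tokenizer, so a span
# that ends on it is not covered by any token and cannot be converted to a slice.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


def span_to_token_slice(text: str, start: int, end: int):
    """Return the (token_start, token_end) slice covering [start, end),
    or None if the span does not align with token boundaries."""
    encoding = tokenizer(text, return_offsets_mapping=True, add_special_tokens=False)
    offsets = encoding["offset_mapping"]
    token_start = next((i for i, (s, e) in enumerate(offsets) if s <= start < e), None)
    token_end = next((i + 1 for i, (s, e) in enumerate(offsets) if s < end <= e), None)
    if token_start is None or token_end is None:
        return None
    return token_start, token_end


# Shortened example text, ending with the '\xa0' from the warning above:
text = "... in the north are opposed to Brexit.\xa0"
print(span_to_token_slice(text, 0, len(text)))  # expected: None, the '\xa0' is not part of any token
```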