ArneBinder / dialam-2024-shared-task

see http://dialam.arg.tech/

hardcode the validation split #33

Closed ArneBinder closed 2 months ago

ArneBinder commented 2 months ago

Even with a fixed value for dataset.create_validation_split.seed, the validation split appears to vary between runs in a non-transparent way (@tanikina noticed differences when switching the base model, but not when changing other hyperparameters such as the learning rate). This PR defines a fixed validation split by taking ~10% of the shuffled train files (specifically, 140 out of 1399).
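
For illustration, here is a minimal sketch of how such a fixed split could be derived once and then stored as a hardcoded list. It is not the code from this PR: the function name `make_fixed_split`, the `nodeset{i}` ids, and the seed value are hypothetical placeholders. The idea is to sort the ids first and shuffle with a local RNG, so that the result depends neither on file listing order nor on any global seeding done elsewhere (e.g. by model setup).

```python
import random


def make_fixed_split(file_ids, num_validation, seed=42):
    """Return (train_ids, validation_ids) that depend only on the inputs."""
    # Sort first so the result does not depend on directory listing order.
    ordered = sorted(file_ids)
    # Use a local RNG so other code that touches the global random state
    # cannot influence the split.
    rng = random.Random(seed)
    rng.shuffle(ordered)
    validation_ids = sorted(ordered[:num_validation])
    train_ids = sorted(ordered[num_validation:])
    return train_ids, validation_ids


# Hypothetical ids standing in for the 1399 train files.
file_ids = [f"nodeset{i}" for i in range(1399)]
train_ids, validation_ids = make_fixed_split(file_ids, num_validation=140)
print(len(train_ids), len(validation_ids))  # 1259 140
```

The resulting `validation_ids` could then be written to a file or constant and loaded directly, so the split no longer depends on any runtime shuffling.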