hardcode the validation split

Even with a fixed value for dataset.create_validation_split.seed, the validation split appears to change between runs in a non-transparent way (@tanikina noticed differences when switching the base model, but not when switching other hyperparameters such as the learning rate...). With this PR, we define a fixed validation split by taking ~10% of the shuffled train files (specifically, 140 out of 1399).
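For illustration, a minimal sketch of how such a fixed list could be derived once and then hardcoded. This is not necessarily how the PR does it: the helper name `make_fixed_split`, the seed value, and the placeholder file ids are all assumptions. Only the counts (140 of 1399) come from the description above.

```python
import random


def make_fixed_split(file_ids, n_val=140, seed=42):
    """Return (train_ids, val_ids) that depend only on the set of ids and the seed.

    Hypothetical helper: the name, signature, and default seed are assumptions.
    """
    # Sort first so the result does not depend on the input order
    # (e.g. the order in which files were listed on disk).
    ordered = sorted(file_ids)
    # Use a private RNG so the global random state has no influence.
    rng = random.Random(seed)
    rng.shuffle(ordered)
    val_ids = sorted(ordered[:n_val])
    train_ids = sorted(ordered[n_val:])
    return train_ids, val_ids


# Placeholder ids standing in for the 1399 train files (assumed naming).
file_ids = [f"file_{i:04d}" for i in range(1399)]
train_ids, val_ids = make_fixed_split(file_ids)
print(len(train_ids), len(val_ids))
```

Writing the resulting `val_ids` into the repository (for example as a plain list in a config or text file) and selecting the validation examples by id removes the dependency on any seed or shuffling implementation at training time.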

ArneBinder / dialam-2024-shared-task

hardcode the validation split #33