felixbur opened this issue 1 year ago
I think the name "test" is already correct, but please consider the following.
Case 1: the test set has labels. The current behavior works as expected, i.e., the performance score can be calculated directly on the test set.
Case 2: the test set has no labels. Then the model is trained on train and dev concatenated (`split_strategy = train`) and predicts the target for each audio file in the test set (maybe mark the test database with `split_strategy = predict`?). There is no performance score on the test set; instead, the output is a prediction file (CSV) with the header `file` and `label`. The user should define another experiment that uses "dev" as the "test" split to obtain a score.
@felixbur I think it makes sense. See my two cases above. In case 1, the current 'test' split should be renamed to the 'dev' split. In case 2, we need to reintroduce a 'test' split.
So, consider the following setup, which is common, e.g., in the ComParE challenge. The dataset authors provide three splits: `train.csv`, `dev.csv`, and `test.csv`. Train and dev have labels; test does not: its CSV file contains only the `file` column, and the labels (e.g., `emotion`) are usually replaced by a question mark.
So there are these possibilities for building a model:
1. `train` only, evaluation metric on the training data
2. `train` + `dev`, evaluation metric on `dev`
3. `train` + `dev` for training and `test` for prediction, evaluation metric on the training data

The output of the last option is a CSV file containing the file and the predicted label. This file is usually submitted to the organizers to obtain the score on the test set.
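A minimal sketch of the last option, under stated assumptions: toy 2-D feature vectors stand in for real acoustic features, and a nearest-centroid rule stands in for a real classifier. None of these names are Nkululeko API; the point is only the flow: fit on train + dev, predict the unlabeled test rows, and write a CSV with the header `file,label`.

```python
import csv
import io
from statistics import mean

# Hypothetical toy data: (file, feature vector, label). Test labels are '?'.
train = [("t1.wav", [0.1, 0.2], "happy"), ("t2.wav", [0.9, 0.8], "sad")]
dev = [("d1.wav", [0.2, 0.1], "happy"), ("d2.wav", [0.8, 0.9], "sad")]
test = [("x1.wav", [0.15, 0.15], "?"), ("x2.wav", [0.85, 0.85], "?")]


def fit_centroids(rows):
    """Nearest-centroid stand-in for a real classifier: one mean vector per label."""
    by_label = {}
    for _, feats, label in rows:
        by_label.setdefault(label, []).append(feats)
    return {label: [mean(dim) for dim in zip(*vecs)] for label, vecs in by_label.items()}


def predict(centroids, feats):
    # Pick the label whose centroid has the smallest squared Euclidean distance.
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(feats, centroids[lab])),
    )


def write_predictions(train_rows, dev_rows, test_rows):
    """Option 3: train on train + dev, predict the unlabeled test set, return CSV text."""
    model = fit_centroids(train_rows + dev_rows)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "label"])
    for file, feats, _ in test_rows:
        writer.writerow([file, predict(model, feats)])
    return out.getvalue()


print(write_predictions(train, dev, test))
```

With a real dataset, the returned text would be written to disk and submitted to the organizers.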
A simple workaround may be to keep the current test split as it is, but provide an additional option stating whether the test split has labels (the default), to distinguish whether it is a dev set (labeled) or a true test set (unseen):
```ini
[DATA]
test.has_labels = False
```
By default, the test set is assumed to have labels (`test.has_labels = True`); if it does not, the output is the model's prediction (a CSV file containing `file` and the target).
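If such an option were added, reading it could look roughly like this sketch with Python's standard `configparser`. `test.has_labels` is the proposed (not yet existing) key, and `read_has_labels` / `run_mode` are hypothetical helper names; `fallback=True` reproduces the proposed default.

```python
import configparser


def read_has_labels(ini_text: str) -> bool:
    """Read the proposed [DATA] test.has_labels flag, defaulting to True."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    # fallback also covers a missing [DATA] section or a missing key.
    return config.getboolean("DATA", "test.has_labels", fallback=True)


def run_mode(ini_text: str) -> str:
    # Labeled test set: compute a score. Unlabeled: only write predictions.
    return "evaluate" if read_has_labels(ini_text) else "predict"


print(run_mode("[DATA]\ntest.has_labels = False\n"))  # predict
print(run_mode("[DATA]\n"))                           # evaluate
```

Defaulting to `True` keeps existing experiments unchanged; only configs that set the flag explicitly switch to prediction mode.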
@felixbur
After realizing that Nkululeko already has the `.test` and `.demo` modules, I think this proposal (renaming test to dev) absolutely makes sense.
One suggestion: after finding the best model, the user should be allowed to train on both `train` and `dev` data, so that the final model is trained on more data.
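The suggested workflow (select on dev, then retrain on train + dev) could be sketched like this. It is a toy example under assumptions: 1-D features, a k-nearest-neighbours classifier whose only hyperparameter is `k`, and hypothetical function names; it does not reflect how Nkululeko trains models.

```python
from collections import Counter


def knn_predict(train_rows, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_rows, eval_rows, k):
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)


def select_and_refit(train_rows, dev_rows, candidate_ks):
    # Step 1: pick the k that scores best on dev, fitting on train only.
    best_k = max(candidate_ks, key=lambda k: accuracy(train_rows, dev_rows, k))
    # Step 2: build the final model on train + dev with that k.
    final_data = train_rows + dev_rows
    return best_k, (lambda x: knn_predict(final_data, x, best_k))


# Toy data: (feature, label); 0.2 is a noisy "b" among the "a" points.
train = [(0.0, "a"), (0.1, "a"), (0.2, "b"), (1.0, "b"), (1.1, "b")]
dev = [(0.19, "a"), (1.05, "b")]
best_k, final_model = select_and_refit(train, dev, [1, 3])
print(best_k, final_model(0.15), final_model(1.2))
```

The dev set is used only to choose `k`; once that choice is fixed, its samples are added to the training data for the final model.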
Nkululeko only knows two splits: train and test. But it would be more correct to name the "test" split "dev" (short for development), as we kind of always use it to optimize a model. Any thoughts?