felixbur / nkululeko

Machine learning speaker characteristics

MIT License


renaming "test" split to "dev" #59

Open felixbur opened 1 year ago

felixbur commented 1 year ago

Nkululeko only knows two splits: train and test. It would be more correct to name the "test" split "dev" (short for development), since we nearly always use it to optimize a model. Any thoughts?

bagustris commented 1 year ago

I think the name "test" is already correct, but consider the following two cases.

case 1: test has labels

The current behavior works as expected, i.e., the performance score can be calculated directly from the test set.

case 2: test has no labels (unseen target)

If the test set has no labels, the model is trained on the concatenation of train and dev (split_strategy = train) and predicts the target for the audio files in the test set (maybe mark the test database with split_strategy = predict?). There is no performance score on the test set; instead, the output is a prediction file (CSV) with the header file and label. To obtain a score, the user should define another experiment that uses "dev" as the "test" split.

bagustris commented 1 year ago

@felixbur I think it makes sense. See my two cases above. In case 1, the current 'test' split should be renamed to 'dev'. In case 2, we need to reintroduce the 'test' split.

So, consider the following setup, which is common, e.g., in the ComParE challenge. The dataset authors provide three splits: train.csv, dev.csv, and test.csv. Train and dev have labels; test does not (its CSV file only lists the files, and the labels, e.g. emotion, are usually replaced by a question mark).

So there are three possibilities for building a model:

using train only, evaluation metric for training
using train + dev, evaluation metric for dev
using train + dev for training and test for prediction, evaluation metric for training

The output of the last option is a CSV file containing the file names and the predicted labels. This file is usually submitted to the organizer to obtain the score on the test set.
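As a rough illustration of the train/dev/test workflow described above, here is a minimal sketch in plain Python. It is not Nkululeko code: the column names file and emotion, the helper names, and the majority-class "model" are all assumptions chosen to keep the example self-contained.

```python
import csv
import io
from collections import Counter


def read_split(text):
    """Parse CSV text with the (assumed) columns 'file' and 'emotion'."""
    return list(csv.DictReader(io.StringIO(text)))


def has_labels(rows, target="emotion"):
    """Treat a split as labelled if no target value is the '?' placeholder."""
    return all(row[target] != "?" for row in rows)


def fit_majority(rows, target="emotion"):
    """Stand-in for a real model: remember the most frequent label."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]


def write_predictions(rows, label):
    """Return CSV text with the header 'file,label', one row per test file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "label"])
    for row in rows:
        writer.writerow([row["file"], label])
    return out.getvalue()


# Toy data standing in for train.csv, dev.csv, and test.csv.
train = read_split("file,emotion\na.wav,happy\nb.wav,sad\nc.wav,happy\n")
dev = read_split("file,emotion\nd.wav,happy\ne.wav,sad\n")
test = read_split("file,emotion\nf.wav,?\ng.wav,?\n")

# Train on train, score on dev (dev has labels, so a metric is possible).
model = fit_majority(train)
dev_accuracy = sum(row["emotion"] == model for row in dev) / len(dev)

# Train on train + dev, then predict the unlabelled test split.
final_model = fit_majority(train + dev)
submission = write_predictions(test, final_model)
```

Here submission holds the text that would be written to the prediction file for the organizer; no score is computed for test because its labels are unknown.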

bagustris commented 1 year ago

A simple workaround may be to keep the current test split as it is, but to add an option stating whether the test split has labels (the default), to distinguish whether it is a dev split (has labels) or a true test split (unseen).

[DATA]
test.has_labels = False

By default, test is assumed to have labels (test.has_labels = True). If it does not, the output is the model's prediction (a CSV file containing file and target).
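To make the proposed default concrete, here is a minimal sketch of how such an option could be read, assuming the INI file is parsed with Python's standard configparser. The option test.has_labels is only the proposal from this comment, not an existing Nkululeko setting, and the function names and the "evaluate"/"predict" mode names are hypothetical.

```python
import configparser


def read_test_has_labels(config_text):
    """Return the proposed [DATA] test.has_labels flag, defaulting to True."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback=True covers both a missing option and a missing [DATA] section.
    return parser.getboolean("DATA", "test.has_labels", fallback=True)


def choose_mode(config_text):
    """Hypothetical mode names: score a labelled split, else only predict."""
    return "evaluate" if read_test_has_labels(config_text) else "predict"


unlabelled = "[DATA]\ntest.has_labels = False\n"
default = "[DATA]\ndatabases = ['emodb']\n"

mode_unlabelled = choose_mode(unlabelled)
mode_default = choose_mode(default)
```

With the default left untouched, existing configurations would keep their current behavior, which is the point of making True the fallback.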

bagustris commented 7 months ago

@felixbur After realizing that Nkululeko already has .test and .demo modules, this proposal absolutely makes sense (renaming test to dev).

One suggestion: after finding the best model, the user should be allowed to train on both train and dev data, so that the final model sees more training data.
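The suggestion above can be sketched as follows: pick a setting using dev, then refit on train plus dev. This is a toy illustration under stated assumptions, not Nkululeko's implementation; the 1-D features, the k-nearest-neighbour classifier, and all names are made up for the example.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return hits / len(evaluation)


# Toy (feature, label) pairs standing in for the train and dev splits.
train = [(0.1, "low"), (0.2, "low"), (0.3, "low"), (0.8, "high"), (0.9, "high")]
dev = [(0.15, "low"), (0.7, "high"), (0.85, "high")]

# Step 1: model selection, choosing k by its accuracy on dev.
best_k = max((1, 3, 5), key=lambda k: accuracy(train, dev, k))

# Step 2: final model, reusing the chosen k but fitting on train + dev.
final_train = train + dev
prediction = knn_predict(final_train, 0.75, best_k)
```

Once dev has been merged into the training data, it can no longer provide an unbiased score, which is why a separate test split is still needed for the final evaluation.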