For a fixed fold id we split all tasks into `train`, `val`, and `test`. To do model and hyperparameter selection we trained on `train`, tested on `val`, and aggregated the results over 3 splits to select the best parameters.
The final results are on 10 different folds, i.e., the 3 existing ones + 7 more. We could have trained the best config only on the 7 remaining folds, but as the dataset is relatively small, we instead trained 10 models from scratch on `train+val` using the best config and evaluated them on `test`. None of the models saw its test tasks until the final evaluation.
Note that one cannot take an ensemble of the best models from the train/val splits and apply it to test, because the test set of one model is the train set of another.
The function that builds the splits is called `get_fold`: https://phyre.ai/docs/evaluator.html
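As a rough sketch of that protocol (only `phyre.get_fold` is the actual API; `candidate_configs`, `train_agent`, and `evaluate` below are hypothetical placeholders standing in for the training/evaluation code in `agents/neural_agent.py`):

```python
import numpy as np
import phyre

eval_setup = 'ball_cross_template'                 # one of the PHYRE eval setups
candidate_configs = [{'lr': 1e-4}, {'lr': 3e-4}]   # hypothetical search space

# train_agent(task_ids, config) and evaluate(agent, task_ids) are hypothetical
# placeholders, not part of the PHYRE codebase.

# 1) Hyperparameter selection: train on `train`, score on `val`,
#    aggregate over 3 folds, keep the best config.
best_config, best_score = None, float('-inf')
for config in candidate_configs:
    fold_scores = []
    for fold_id in range(3):
        train, val, _ = phyre.get_fold(eval_setup, fold_id)
        agent = train_agent(train, config)
        fold_scores.append(evaluate(agent, val))
    if np.mean(fold_scores) > best_score:
        best_config, best_score = config, float(np.mean(fold_scores))

# 2) Final results: retrain from scratch on train+val for each of the
#    10 folds with the best config, then evaluate on the held-out test tasks.
final_scores = []
for fold_id in range(10):
    train, val, test = phyre.get_fold(eval_setup, fold_id)
    agent = train_agent(train + val, best_config)  # test tasks are never seen here
    final_scores.append(evaluate(agent, test))
print('mean score over 10 folds:', float(np.mean(final_scores)))
```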
`create_balanced_eval_set` has nothing to do with splitting tasks into train and test. It takes a preselected set of task ids (e.g., the train and validation sets) and a set of actions, and builds a subset of their Cartesian product.
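For intuition, here is a simplified sketch of that idea, not the library implementation: the function name and signature are illustrative, and it assumes the standard `phyre` simulator API (`initialize_simulator`, `simulate_action`):

```python
import numpy as np
import phyre

def balanced_eval_subset(task_ids, actions, tier='ball', pairs_per_class=100, seed=0):
    """Illustrative only: sample (task, action) pairs from the Cartesian product
    task_ids x actions and keep an equal number of solving and non-solving pairs."""
    simulator = phyre.initialize_simulator(task_ids, tier)
    rng = np.random.RandomState(seed)
    solved, unsolved = [], []
    for _ in range(100_000):  # cap the number of simulation attempts
        if min(len(solved), len(unsolved)) >= pairs_per_class:
            break
        task_index = rng.randint(len(task_ids))
        action = actions[rng.randint(len(actions))]
        status = simulator.simulate_action(task_index, action, need_images=False).status
        if status.is_invalid():
            continue  # e.g., the action overlaps existing scene objects
        bucket = solved if status.is_solved() else unsolved
        if len(bucket) < pairs_per_class:
            bucket.append((task_index, action))
    return solved + unsolved
```

The candidate `actions` could come from, for example, the simulator's discrete action space (`simulator.build_discrete_action_space(max_actions=...)`).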
Hi, I'm a bit confused about the usage of the validation and train sets. First, the paper says: "we use these tuned hyperparameters and train agents on the union of the training and validation sets", so are you training on the validation set? Second, in the code (https://github.com/facebookresearch/phyre/blob/master/agents/neural_agent.py#L36) there is a function `create_balanced_eval_set`, but it seems to be preparing data for the training procedure. I'm trying to understand the boundaries, if any, of the train/val sets. Thanks.