Cross-template folds never contain template 6 in test or dev template sets (10 folds)

facebookresearch / phyre

PHYRE is a benchmark for physical reasoning.

Apache License 2.0

Hi @augustinharter ,

Do we understand correctly that in each cross-template fold there are 16 training templates, and the remaining 9 templates (dev, test) (link) are used for measuring the auccess?

Table 1 reports the final results. For the final results, we trained on train+dev with hyperparameters obtained from preliminary experiments and tested on test. For the preliminary experiments, we trained all models on train and evaluated on dev.

We found that template 6 was never part of the test templates in the 10 cross-template folds provided in the baseline code (link to 'fold method'); thus, template 6 does not contribute to the final cross-template auccess value. Is this intended, or did it just happen by chance?
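For anyone who wants to reproduce this kind of check, here is a minimal sketch. It assumes task ids have the form `template:task` (e.g. `00006:001`) and that each fold is available as a dict of task-id lists; the helper names and the toy folds below are hypothetical and not taken from the baseline code. With the library installed, the folds could presumably be built from `phyre.get_fold(eval_setup, seed)`, but that step is left out here so the snippet runs on its own.

```python
def template_of(task_id: str) -> str:
    """Return the template part of a task id (assumed format 'template:task')."""
    return task_id.split(":")[0]


def templates_missing_from_split(folds, split="test"):
    """List templates that occur somewhere in the folds but never in `split`.

    `folds` is an iterable of dicts mapping 'train', 'dev' and 'test'
    to lists of task ids (a hypothetical layout chosen for this sketch).
    """
    all_templates = set()
    seen_in_split = set()
    for fold in folds:
        for name, task_ids in fold.items():
            templates = {template_of(t) for t in task_ids}
            all_templates |= templates
            if name == split:
                seen_in_split |= templates
    return sorted(all_templates - seen_in_split)


# Toy data only; these are NOT the real PHYRE folds.
toy_folds = [
    {"train": ["00000:000", "00006:001"], "dev": ["00001:000"], "test": ["00002:000"]},
    {"train": ["00002:001", "00006:002"], "dev": ["00000:001"], "test": ["00001:001"]},
]

missing = templates_missing_from_split(toy_folds)
print(missing)
```

In the toy data, templates `00000` and `00006` never land in a test set, so they are the ones reported.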

That's a very interesting observation! It was not planned that way; we use random allocation, and apparently some tasks got luckier than others.
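To get a rough feel for how likely such an outcome is under random allocation, here is a back-of-the-envelope sketch. It assumes that each fold independently draws a fixed number of test templates uniformly at random; the function name is hypothetical, and the test-set size of 5 used below is only an illustrative guess, since the thread states just that 9 of the 25 templates go to dev and test combined.

```python
def prob_never_in_test(num_templates: int, test_size: int, num_folds: int) -> float:
    """Probability that one fixed template is in no test set across all folds.

    Assumes independent folds, each choosing `test_size` of `num_templates`
    templates uniformly at random (an assumption, not confirmed by the thread).
    """
    if not 0 <= test_size <= num_templates:
        raise ValueError("test_size must be between 0 and num_templates")
    per_fold_miss = 1 - test_size / num_templates
    return per_fold_miss ** num_folds


# 25 templates (16 train + 9 dev/test), 10 folds; test_size=5 is hypothetical.
p = prob_never_in_test(num_templates=25, test_size=5, num_folds=10)
print(f"{p:.3f}")
```

Under these assumptions a given template misses every test set with probability 0.8^10, roughly 0.107, so seeing at least one such template among 25 would not be surprising.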

Cross-template folds never contain template 6 in test or dev template sets (10 folds) #40