Closed augustinharter closed 3 years ago
Hi @augustinharter ,
Do we understand correctly that in each cross-template fold there are 16 training templates and the remaining 9 templates (dev, test) (link) are used for measuring the auccess?
Table 1 represents the final results. For the final results we trained on train+dev with hyperparameters obtained from preliminary experiments and tested on test. For the preliminary experiments we trained all models on train and evaluated on dev.
We found that template 6 was never part of the test templates in the 10 cross-template folds provided in the baseline code (link to 'fold method'); thus, template 6 does not contribute to the final cross-template auccess value. Is this intended, or did it just happen by chance?
That's a very interesting observation! It was not planned that way. We use random allocation, and apparently some tasks got luckier than others.
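To get a feeling for how likely such a gap is under random allocation, here is a rough sketch. It assumes that every fold draws its test templates independently and uniformly at random, which may not match the actual fold code. The 25 templates (16 + 9) and the 10 folds come from the question; the test-split size of 5 is purely a hypothetical value for illustration, and the function names are made up for this example.

```python
from fractions import Fraction


def prob_never_in_test(num_templates: int, test_size: int, num_folds: int) -> Fraction:
    """Probability that one fixed template misses the test split in every fold.

    Assumption: each fold independently picks `test_size` distinct templates
    uniformly at random out of `num_templates`.
    """
    if not 0 <= test_size <= num_templates:
        raise ValueError("test_size must be between 0 and num_templates")
    # Chance that the template is not among the test templates of a single fold.
    miss_one_fold = Fraction(num_templates - test_size, num_templates)
    # Folds are assumed independent, so the per-fold chances multiply.
    return miss_one_fold ** num_folds


def expected_never_tested(num_templates: int, test_size: int, num_folds: int) -> Fraction:
    """Expected number of templates that never appear in any test split."""
    return num_templates * prob_never_in_test(num_templates, test_size, num_folds)


if __name__ == "__main__":
    # 25 templates and 10 folds are from the thread; test_size=5 is hypothetical.
    p = prob_never_in_test(num_templates=25, test_size=5, num_folds=10)
    e = expected_never_tested(num_templates=25, test_size=5, num_folds=10)
    print(f"P(fixed template never tested) = {float(p):.4f}")
    print(f"E[# templates never tested]    = {float(e):.2f}")
```

Under these assumptions, a template being skipped in all 10 folds would not be a rare event, which fits the "some got luckier than others" explanation.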
Hi,
we have a question about the baseline results for cross template cases (PHYRE paper, Table 1, PHYRE-B, DQN, Cross).
Do we understand correctly that in each cross-template fold there are 16 training templates and the remaining 9 templates (dev, test) (link) are used for measuring the auccess?
We found that template 6 was never part of the test templates in the 10 cross-template folds provided in the baseline code (link to 'fold method'); thus, template 6 does not contribute to the final cross-template auccess value. Is this intended, or did it just happen by chance?
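For anyone who wants to repeat this check, here is a minimal sketch of how the coverage could be verified. It assumes each fold is a (train, dev, test) tuple of task ID strings and that a task ID has the form "template:instance" (for example "00006:123"); both are assumptions about the data layout, and the helper names are hypothetical. The toy folds below are made-up data, not the real PHYRE folds.

```python
from typing import Iterable, Sequence, Set, Tuple

# One fold is assumed to be a (train, dev, test) triple of task ID strings.
Fold = Tuple[Sequence[str], Sequence[str], Sequence[str]]


def template_of(task_id: str) -> str:
    """Return the template part of a task ID like '00006:123' (assumed format)."""
    return task_id.split(":")[0]


def templates_never_tested(folds: Iterable[Fold]) -> Set[str]:
    """Templates that occur in some split of some fold but in no test split."""
    seen: Set[str] = set()
    tested: Set[str] = set()
    for train, dev, test in folds:
        for split in (train, dev, test):
            seen.update(template_of(task_id) for task_id in split)
        tested.update(template_of(task_id) for task_id in test)
    return seen - tested


if __name__ == "__main__":
    # Made-up toy folds with four templates, for illustration only.
    toy_folds = [
        (["00000:000", "00001:000"], ["00002:000"], ["00003:000"]),
        (["00002:000", "00003:000"], ["00001:000"], ["00000:000"]),
    ]
    print(sorted(templates_never_tested(toy_folds)))
```

With PHYRE installed, the real folds could presumably be collected by calling the library's fold function once per seed and passing the resulting tuples to `templates_never_tested`; that call is not shown here because its exact return format is an assumption on our side.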