dvtailor / meta-l2d

Code for 'Learning to Defer to a Population: A Meta-Learning Approach' (AISTATS 2024)
MIT License

question #4

Open · Xiaozhi-sudo opened 1 month ago

Xiaozhi-sudo commented 1 month ago

Dear author, I have two questions. First, while reading the code I found that test data seems to be used to select hyperparameters in both the validation and fine-tuning procedures. Is it valid to use test data to select hyperparameters? Second, some figures in the paper appear inconsistent. For example, on the CIFAR-10 dataset the system accuracy of the NP method in Figure 3 is about 93% at p=0.8, while in Figure 7 of the appendix it is about 90% at p=0.8. Are these figures based on different experiments?

dvtailor commented 1 month ago

Hi, thanks for your interest in our paper.

  1. All figures report performance on the test set. A (separate) validation set is used to tune the step size and number of steps in the test-time finetuning approach ("L2D-Pop (finetune)"); see the sketch after this list. If this does not address your query, could you point to the place in the code where you see an issue?

  2. The experimental setup in Figures 3 and 7 is the same, except that the networks in Figure 7 do not use batch normalization -- see App. C for more details. This is why the reported performance is lower.
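For point 1, here is a minimal, self-contained sketch of that tuning protocol (a toy linear model and synthetic tasks; only the idea of tuning the step size and number of finetuning steps on validation data comes from the reply above, and all names and details are illustrative, not the repo's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(d=5, n=200):
    """Toy task: linear-regression data split into context/query halves."""
    w = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w + 0.1 * rng.normal(size=n)
    return (X[:n // 2], y[:n // 2]), (X[n // 2:], y[n // 2:])

def finetune(theta0, X, y, lr, n_steps):
    """Gradient steps on squared error, starting from a shared initialisation."""
    theta = theta0.copy()
    for _ in range(n_steps):
        theta -= lr * (2 / len(y)) * X.T @ (X @ theta - y)
    return theta

def mse(theta, X, y):
    return float(np.mean((X @ theta - y) ** 2))

theta0 = rng.normal(size=5)        # stand-in for the meta-trained initialisation
val_ctx, val_qry = make_task()     # validation task: used only to pick hyperparameters
test_ctx, test_qry = make_task()   # test task: used only for the reported numbers

# Choose the step size and number of steps on the validation task...
configs = [(lr, n) for lr in (0.01, 0.05, 0.1) for n in (1, 5, 10)]
best = min(configs, key=lambda c: mse(finetune(theta0, *val_ctx, *c), *val_qry))

# ...then finetune and evaluate on the test task with the chosen configuration.
theta = finetune(theta0, *test_ctx, *best)
print(f"chosen (lr, n_steps) = {best}; test MSE = {mse(theta, *test_qry):.3f}")
```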

Xiaozhi-sudo commented 1 month ago

Thank you for your answer. Regarding the first question: during each epoch of the training phase, the evaluate() function is passed expert_test during validation. Shouldn't expert_train be passed instead, since we don't have access to test experts during training?
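To make the concern concrete, a minimal, self-contained sketch of the pattern in question (toy experts and a stub evaluate(); only the names evaluate(), expert_train, and expert_test come from this thread, everything else is illustrative):

```python
import random

random.seed(0)

def make_expert(p_correct):
    """Toy expert: returns the true label with probability p_correct."""
    def expert(y_true, n_classes=10):
        if random.random() < p_correct:
            return y_true
        return random.randrange(n_classes)
    return expert

# Disjoint expert populations: training experts may be used during
# training and per-epoch validation; test experts are held out entirely.
expert_train = [make_expert(p) for p in (0.6, 0.7, 0.8)]
expert_test = [make_expert(p) for p in (0.65, 0.75, 0.95)]

def evaluate(experts, n_samples=1000, n_classes=10):
    """Stub standing in for the repo's evaluate(): average expert accuracy."""
    correct = 0
    for _ in range(n_samples):
        y = random.randrange(n_classes)
        expert = random.choice(experts)
        correct += int(expert(y) == y)
    return correct / n_samples

# The question, in miniature: which population should per-epoch validation use?
for epoch in range(3):
    val_acc = evaluate(expert_train)    # expected: validate with training experts
    # val_acc = evaluate(expert_test)   # observed: test experts used in validation
    print(f"epoch {epoch}: validation expert accuracy = {val_acc:.3f}")
```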