GeorgeCazenavette / mtt-distillation

Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
https://georgecazenavette.github.io/mtt-distillation/

How did you get x̄ ± s in table 1 #23

Closed · NiaLiu closed this issue 1 year ago

NiaLiu commented 1 year ago

Hi George,

Thanks for your great work, and sorry to bother you again.

I have another question regarding the accuracy values shown in Table 1. I assume there are two possible ways to get those numbers: (1) train the synthetic data for a certain number of steps (e.g. 9000), then test the accuracy on the test set once; or (2) evaluate on the test set every 100 steps of training the synthetic data and take the maximum accuracy.

The second way is not valid, since the test set should be used only once, at the very end. So did you use the first method to get the accuracy? If so, how many steps did you take?
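
To make sure I am describing the two protocols precisely, here is a rough sketch of what I mean (plain Python with placeholder callables, not code from your repository; `distill_step` would update the synthetic set once, and `eval_synthetic` would train a fresh network on the current synthetic set and return its test accuracy):

```python
from typing import Callable

def protocol_1(distill_step: Callable[[], None],
               eval_synthetic: Callable[[], float],
               num_steps: int = 9000) -> float:
    """(1) Distill for a fixed number of steps, then evaluate on the test set once."""
    for _ in range(num_steps):
        distill_step()
    return eval_synthetic()

def protocol_2(distill_step: Callable[[], None],
               eval_synthetic: Callable[[], float],
               num_steps: int = 9000,
               eval_every: int = 100) -> float:
    """(2) Evaluate on the test set every `eval_every` steps and report the maximum."""
    best_acc = float("-inf")
    for step in range(1, num_steps + 1):
        distill_step()
        if step % eval_every == 0:
            best_acc = max(best_acc, eval_synthetic())
    return best_acc
```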

Thank you, and hope you have a great day! Dai

GeorgeCazenavette commented 1 year ago

Hi Dai,

If I remember correctly (and it's been a while now), we evaluated on the test set every 1000 iterations and reported the max.

At the time, we thought of this as early stopping.

You are correct that a more principled way to do this would have been to evaluate on a held-out validation set and then run only the best-performing checkpoint on the test set.
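
In code, that more principled protocol would look roughly like the sketch below (placeholder callables, not the evaluation script we actually used; `checkpoints` would be snapshots of the synthetic set saved every N iterations, and the two accuracy functions would train a fresh network on a snapshot and score it on the corresponding split):

```python
from typing import Callable, Sequence

def select_by_validation(checkpoints: Sequence[str],
                         val_accuracy: Callable[[str], float],
                         test_accuracy: Callable[[str], float]) -> float:
    """Pick the checkpoint with the highest validation accuracy,
    then evaluate only that checkpoint on the test set (single use of the test split)."""
    best_ckpt = max(checkpoints, key=val_accuracy)
    return test_accuracy(best_ckpt)
```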

For what it's worth though, there wasn't much variation as time went on once the peak performance was reached:

[W&B chart (12/6/2022): evaluation accuracy over distillation iterations for several runs]

In fact, we may have stopped some of the runs too early (like the middle curve).

Hope this helps!

NiaLiu commented 1 year ago

Thanks for your plots and detailed explanations.

Am I correct that you ran 10,000 steps and evaluated every 1000 steps?

One additional question: did you also evaluate the cross-architecture performance with ZCA whitening?
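
For clarity, by ZCA I mean the standard whitening preprocessing, roughly as in this sketch (plain PyTorch; the epsilon and other details are arbitrary and not assumed to match your implementation):

```python
import torch

def zca_whiten(images: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """ZCA-whiten a batch of images with shape (N, C, H, W)."""
    n = images.shape[0]
    flat = images.reshape(n, -1)                      # (N, D)
    mean = flat.mean(dim=0, keepdim=True)
    centered = flat - mean
    cov = centered.t() @ centered / (n - 1)           # (D, D) sample covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)         # symmetric eigendecomposition
    inv_sqrt = torch.diag(1.0 / torch.sqrt(eigvals.clamp(min=0) + eps))
    zca_matrix = eigvecs @ inv_sqrt @ eigvecs.t()     # W = U diag(1/sqrt(lambda+eps)) U^T
    return (centered @ zca_matrix).reshape(images.shape)
```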

Really appreciate your time and efforts! Thank you, Dai