Closed: NiaLiu closed this issue 1 year ago
Hi Dai,
If I remember correctly (and it's been a while now), we evaluated on the test set every 1000 iterations and reported the max.
At the time, we thought of this as early stopping.
You are correct in that a more proper way to do this would have been to evaluate on a held-out validation set and then evaluate the best performing checkpoint on the test set.
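To make the distinction concrete, here is a minimal, hypothetical sketch contrasting the two protocols. The checkpoint interval and the accuracy numbers are made up for illustration; `test_acc` and `val_acc` stand in for actually training on the synthetic data up to each checkpoint and measuring accuracy on the corresponding split.

```python
# Checkpoints saved every 1000 iterations (illustrative values only).
checkpoints = [1000, 2000, 3000, 4000, 5000]
test_acc = {1000: 0.40, 2000: 0.55, 3000: 0.61, 4000: 0.60, 5000: 0.59}
val_acc  = {1000: 0.41, 2000: 0.54, 3000: 0.62, 4000: 0.60, 5000: 0.58}

# Protocol described above: evaluate on the test set at every
# checkpoint and report the maximum (treated as early stopping).
reported = max(test_acc[c] for c in checkpoints)

# Stricter protocol: select the checkpoint by held-out validation
# accuracy, then evaluate on the test set exactly once.
best_ckpt = max(checkpoints, key=lambda c: val_acc[c])
proper = test_acc[best_ckpt]

print(reported)  # 0.61
print(best_ckpt, proper)  # 3000 0.61
```

In this toy example the two protocols happen to agree; the point of the stricter one is that the test set is touched only once, so the reported number is not an optimistic max over repeated test evaluations.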
For what it's worth, though, there wasn't much variation as time went on once peak performance was reached:
[attached plots of test accuracy over training iterations]
In fact, we may have stopped some of the runs too early (like the middle curve).
Hope this helps!
Thanks for your plots and detailed explanations.
Am I correct that you trained for 10,000 steps and evaluated every 1,000 steps?
I also have one additional question: did you evaluate the cross-architecture performance with ZCA as well?
Really appreciate your time and efforts! Thank you, Dai
Hi George,
Thanks for your great work, and sorry to bother you again.
I have another question regarding the accuracy values shown in Table 1. I can see two possible ways to obtain those numbers: (1) train on the synthetic data for a fixed number of steps (e.g., 9,000) and then measure accuracy on the test set once; or (2) evaluate on the test set every 100 steps while training on the synthetic data and report the maximum accuracy.
The second way is not valid, since the test set should only be used once, at the end. So did you use the first method? If so, how many steps did you train for?
Thank you, and hope you have a great day! Dai