Closed (Hzj199 closed this issue 3 years ago)
hmm, good question
in all our experiments, we kept the training params the same for all baselines (including the number of epochs)
we didn't tune the optimization params to evaluate the impact, but this might be worth exploring
thanks!
Thanks for sharing the code. Does MixStyle need more training epochs, like mixup does?