BorealisAI / DT-Fixup

Optimizing Deeper Transformers on Small Datasets https://arxiv.org/abs/2012.15355

About the experimental results #4

Open huybery opened 3 years ago

huybery commented 3 years ago

I ran the code from the repository without any modification. The results are as follows:

08/28/2021 06:50:45 [Epoch 100] dev acc: 0.70696 (took 220s)
08/28/2021 06:50:45 checkpoint: tmp/dtfixup
08/28/2021 06:50:45 best dev accuracy: 0.72340
08/28/2021 06:50:45 checkpoint: tmp/dtfixup

The best dev accuracy is only 72.3%. Did I miss something? Regarding the experiment configuration, I noticed that the batch size in the code is 32, while the batch size in the paper is 16. Could this be the reason for the lower result?

billy-inn commented 3 years ago

In practice, the experimental results vary from run to run. The best result reported/submitted in the paper is the best of 5 runs.
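To compare several runs, one option is to collect the final "best dev accuracy" line from each training log and summarize the spread. The following is only a minimal sketch: the helper names `best_dev_accuracy` and `summarize_runs` are hypothetical and not part of the DT-Fixup code base, and the sketch assumes each log contains lines in the same format as the output quoted above.

```python
import re
import statistics

# Matches lines such as "... best dev accuracy: 0.72340" (format taken from the log above).
BEST_RE = re.compile(r"best dev accuracy:\s*([0-9]*\.[0-9]+)")


def best_dev_accuracy(log_text):
    """Return the last 'best dev accuracy' value found in a log, or None if absent."""
    matches = BEST_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def summarize_runs(logs):
    """Summarize best dev accuracy over several runs (hypothetical helper).

    `logs` is an iterable of log contents, one string per run.
    Logs without a 'best dev accuracy' line are skipped.
    """
    accs = [a for a in (best_dev_accuracy(text) for text in logs) if a is not None]
    if not accs:
        raise ValueError("no 'best dev accuracy' lines found")
    return {
        "runs": len(accs),
        "best": max(accs),                  # what "best of N runs" would report
        "mean": statistics.mean(accs),
        "std": statistics.stdev(accs) if len(accs) > 1 else 0.0,
    }
```

Reading each run's log file into a string and passing the list to `summarize_runs` gives the best-of-N figure alongside the mean and standard deviation, which makes it easier to tell whether a single run such as the 0.72340 above falls within normal variation.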

huybery commented 3 years ago

Thanks for your reply. I will run it a few more times.