Closed: pmpalang closed this issue 1 year ago.
If you zoom in on the y-axis, you can see that Sophia-G has a validation loss 0.04 lower than Adam's. If you extend the Adam run to 200k steps, you will find that Adam at 200k is better than Adam at 100k by 0.04.
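For example, a minimal matplotlib sketch of that zoomed comparison (the curves below are synthetic stand-ins, not logged values from this repo; substitute your own eval logs):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in curves; replace with your own logged validation losses.
steps = np.arange(0, 100_000, 2_000)
val_loss_adam = 2.95 + 1.5 * np.exp(-steps / 20_000)
val_loss_sophiag = 2.91 + 1.5 * np.exp(-steps / 20_000)

plt.plot(steps, val_loss_adam, label="Adam 100k")
plt.plot(steps, val_loss_sophiag, label="Sophia-G 100k")
plt.ylim(2.85, 3.05)  # zoom the y-axis so a ~0.04 gap becomes visible
plt.xlabel("iterations")
plt.ylabel("validation loss")
plt.legend()
plt.tight_layout()
plt.show()
```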
Hello forum,
I'm trying to reproduce the GPT-2 small results using the commands/configurations in the repository. However, the trends are quite different from those in the paper/README. For example, the validation curve is expected to show roughly 2x faster convergence, but I see that the two curves are almost on top of each other at 10k iterations. The only change from the original commands is that I use --nproc_per_node=4 and batch_size=2 to fit my GPUs. I'm copying the validation loss and train loss vs. iterations plots here for reference. Note that the run is still in progress, but even the trends are unlike what the paper/README leads us to expect. I would appreciate any help in reproducing the original plots.
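For context on how much these changes shrink the effective batch, here is the arithmetic as I understand it (a sketch assuming the nanoGPT-style config this repo builds on, where tokens per optimizer step = nproc_per_node * batch_size * gradient_accumulation_steps * block_size; the "original recipe" numbers below are illustrative guesses, not values taken from the README):

```python
# Effective tokens per optimizer step in a nanoGPT-style launcher.
# Assumption: tokens/step = nproc_per_node * batch_size * grad_accum * block_size.

def tokens_per_step(nproc_per_node: int, batch_size: int, grad_accum: int,
                    block_size: int = 1024) -> int:
    return nproc_per_node * batch_size * grad_accum * block_size

# Hypothetical original recipe (illustrative numbers only):
original = tokens_per_step(nproc_per_node=8, batch_size=12, grad_accum=5)

# My run: fewer GPUs and a smaller micro-batch, same grad_accum:
mine = tokens_per_step(nproc_per_node=4, batch_size=2, grad_accum=5)

print(original, mine, original / mine)  # -> 491520 40960 12.0
```

If that is right, my runs see far fewer tokens per optimizer step unless gradient_accumulation_steps is scaled up to compensate, which might by itself explain why my curves look different from the README's.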
Thanks, Poovaiah
Tagging another similar issue #30 here.