ConnorJL / GPT2

An implementation of training for GPT2, supports TPUs
MIT License

A meaningful performance comparison with OpenAI's models #3

Closed. lostmsu closed this issue 5 years ago

lostmsu commented 5 years ago

Hi Connor,

I'd like to see a meaningful comparison with OpenAI's released and, if possible, unreleased pretrained GPT-2 models.

My concern is that if you used different training techniques, the result may be very far off from what they got, including the possibility that the 1.5B model could be worse than the 345M model they have released.

P.S. Also pinged you on Twitter about this.

lukemelas commented 5 years ago

Could a simple way of doing this be to report the model's perplexity on WikiText-103?

It would be easy to compare these numbers to those in the original paper (26.4/17.5 PPL for the 345M/1542M models).

rusiaaman commented 5 years ago

I tried to approximate the perplexity on WikiText-103 and got about 59 for the PrettyBig model. Colab notebook
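
For anyone who wants to reproduce a rough number like this, here is a minimal sketch of the usual token-level recipe. It is not the notebook's exact code; it assumes HuggingFace transformers/datasets and uses the public gpt2-medium checkpoint as a stand-in for the models in this repo:

```python
# Minimal sketch of a token-level perplexity estimate on WikiText-103.
# Assumptions: HuggingFace transformers/datasets, the public gpt2-medium
# checkpoint as a stand-in, and non-overlapping 1024-token windows.
import math
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").to(device).eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")

# Tokenize the whole test split as one contiguous stream.
test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to(device)

window = 1024  # GPT-2 context length
nll_sum, n_tokens = 0.0, 0
with torch.no_grad():
    for start in range(0, ids.size(1), window):
        chunk = ids[:, start : start + window]
        if chunk.size(1) < 2:
            break
        # Labels are shifted inside the model, so out.loss is the mean
        # negative log-likelihood over chunk.size(1) - 1 predictions.
        out = model(chunk, labels=chunk)
        n = chunk.size(1) - 1
        nll_sum += out.loss.item() * n
        n_tokens += n

print("approx. BPE-token perplexity:", math.exp(nll_sum / n_tokens))
```

Note that this yields perplexity per BPE token, while the paper's WikiText-103 numbers are computed with invertible detokenizers and renormalized to the dataset's original tokenization, so a number produced this way can only be an approximate comparison.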

ConnorJL commented 5 years ago

I am working with OpenAI to produce accurate comparisons. More information will come soon.

lukemelas commented 5 years ago

Thanks, great to hear!

ConnorJL commented 5 years ago

There are now proper evaluations (done by OpenAI) here