JonasGeiping / cramming

Cramming the training of a (BERT-type) language model into limited compute.
MIT License
1.3k stars, 100 forks

Fix bug in batch size schedule #20

Closed. JeanKaddour closed this 1 year ago.

JonasGeiping commented 1 year ago

Oh no, I never upstreamed this fix. Thanks!