JonasGeiping / cramming

Cramming the training of a (BERT-type) language model into limited compute.
MIT License
1.29k stars 100 forks source link

Determinist and update train.scheduler #47

Closed BoyuanFeng closed 3 months ago