JonasGeiping / cramming

Cramming the training of a (BERT-type) language model into limited compute.
MIT License

Errors with both the verify installation command as well as the final recipe #27

Closed: tatami-galaxy closed this issue 1 year ago

tatami-galaxy commented 1 year ago

After cloning and installing, this command:

python pretrain.py name=test arch=bert-base train=bert-base data=sanity-check-2 dryrun=True impl.microbatch_size=2

produces "In 'cfg_pretrain': Could not find 'arch/bert-base'". If I replace the arch argument with train/hf-bert-tiny, I get:

"FileNotFoundError: Directory /root/cramming/outputs/data/sanity-check-2_BPEx32768_aa4b98dc480e637aa82f59461e1b1729 not found"

If I try the final recipe:

python pretrain.py name=amp_b8192_cb_o4_final arch=crammed-bert train=bert-o4 data=pile-readymade

I get "RuntimeError: Unexpected optimization option max_autotune_gemm"

JonasGeiping commented 1 year ago

Hi! Thanks for trying the new version. Sorry, the installation command still referred to the older version; this is fixed now.

For the dataset, you should see a warning that impl.forbid_dataset_preprocessing=True is set, so no new dataset is generated. Just in case, I've flipped that flag's default to False now.
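On an older checkout, the flag can also be overridden directly on the command line using Hydra's key=value syntax. This is a sketch reusing the arguments from the original dry-run command (it assumes the fixed config names from the repo):

```shell
# Allow the sanity-check dataset to be (re)generated instead of erroring
# when the preprocessed directory is missing. The flag name is from the
# maintainer's reply; the other arguments are from the original command.
python pretrain.py name=test arch=bert-base train=bert-base data=sanity-check-2 \
    dryrun=True impl.microbatch_size=2 impl.forbid_dataset_preprocessing=False
```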

Finally, for "Unexpected optimization option max_autotune_gemm": what PyTorch version do you have? This inductor option should have been merged by now, but maybe it is still only in the nightlies.

In any case, you can disable this setting via impl._inductor_vars=null.
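Since the option's availability depends on the installed PyTorch, one way to avoid the error is to gate it on the version rather than setting impl._inductor_vars=null unconditionally. A minimal sketch (the helper name and the "2.1.0" cutoff are my assumptions, not part of cramming):

```python
def supports_inductor_option(torch_version, minimum="2.1.0"):
    """Return True if `torch_version` is at least `minimum`.

    Hypothetical helper: the 2.1.0 cutoff is an assumed version in which
    max_autotune_gemm landed in torch._inductor; on older versions one
    would fall back to passing impl._inductor_vars=null instead.
    """
    # Strip local build tags like "+cu118" and compare numeric tuples.
    parse = lambda v: tuple(int(p) for p in v.split("+")[0].split(".")[:3])
    return parse(torch_version) >= parse(minimum)
```

In practice you would feed it torch.__version__ and only include max_autotune_gemm in the inductor config when it returns True.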

tatami-galaxy commented 1 year ago

Thanks, it works now 👍🏽