CodedotAl / gpt-code-clippy

Full description can be found here: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57
Apache License 2.0

**Training script** #23

Open · arampacha opened this issue 3 years ago

arampacha commented 3 years ago

Casting the weights to bf16 is not recommended, so it has been removed for now.
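
For context, here is a minimal sketch (assuming the HF Flax API, not this project's actual training code) of the distinction: the computation dtype can be set to bfloat16 while the parameters themselves stay in float32, which is different from casting the weights.

```python
import jax.numpy as jnp
from transformers import FlaxGPTNeoForCausalLM

# Computations run in bfloat16, but the stored parameters remain float32.
model = FlaxGPTNeoForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125M", dtype=jnp.bfloat16
)

# Casting the weights themselves (the step being avoided here) would look like:
# model.params = jax.tree_util.tree_map(
#     lambda p: p.astype(jnp.bfloat16), model.params
# )
```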

shpotes commented 3 years ago

Here's the gradient accumulation helper from the vision_transformer codebase: https://github.com/google-research/vision_transformer/blob/ba9a85bdc430daf4da7b9da67b486a4e0f5bb278/vit_jax/hyper.py#L77

And here's a small usage example: https://github.com/google-research/vision_transformer/blob/ba9a85bdc430daf4da7b9da67b486a4e0f5bb278/vit_jax/train.py#L63-L66
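
For reference, a minimal gradient-accumulation sketch in the spirit of that helper; the function and variable names are illustrative and not taken from either codebase. The idea is to split the batch into microbatches, sum the losses and gradients across them, and average so the update matches a single large-batch step.

```python
import jax
import jax.numpy as jnp


def accumulate_gradient(loss_and_grad_fn, params, batch, accum_steps):
    """Compute loss and grads over `batch` split into `accum_steps` microbatches."""
    if accum_steps <= 1:
        return loss_and_grad_fn(params, batch)

    # Split the leading (batch) dimension into `accum_steps` chunks.
    microbatches = jax.tree_util.tree_map(
        lambda x: x.reshape((accum_steps, -1) + x.shape[1:]), batch
    )

    def get_microbatch(i):
        return jax.tree_util.tree_map(lambda x: x[i], microbatches)

    # Initialize with the first microbatch, then accumulate the rest.
    loss, grads = loss_and_grad_fn(params, get_microbatch(0))

    def body(i, carry):
        acc_loss, acc_grads = carry
        l, g = loss_and_grad_fn(params, get_microbatch(i))
        return acc_loss + l, jax.tree_util.tree_map(jnp.add, acc_grads, g)

    loss, grads = jax.lax.fori_loop(1, accum_steps, body, (loss, grads))

    # Average so the result is equivalent to one step on the full batch.
    return loss / accum_steps, jax.tree_util.tree_map(
        lambda g: g / accum_steps, grads
    )
```

In practice `loss_and_grad_fn` would be something like `jax.value_and_grad(loss_fn)` and the whole thing would sit inside the pmapped train step.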

mrinal18 commented 3 years ago

For gradient accumulation, I have opened a PR: https://github.com/ncoop57/gpt-code-clippy/pull/29. Let me know if we can sync up on this.

celsofranssa commented 3 years ago

Hello, what are the minimum hardware requirements to run the training script?

arampacha commented 3 years ago

Hi @celsofranssa, the hyperparameters in the HF model cards (for example here) are tuned for a TPU v3-8. You can run the script on a GPU by adjusting the batch size accordingly and possibly switching the dtype from bfloat16 to float16, depending on your hardware. I'm not sure what the exact minimum requirement would be. You can also consider decreasing block_size if you run out of memory.
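
For illustration, a hypothetical single-GPU invocation along those lines; the flag names follow the standard HF Flax causal-LM script and may differ in this repo's training script, so treat them as placeholders to adapt:

```bash
# Smaller per-device batch, float16 compute, and a reduced block size
# to fit a single GPU's memory (values are examples, not recommendations).
python run_clm_flax.py \
    --model_name_or_path EleutherAI/gpt-neo-125M \
    --per_device_train_batch_size 2 \
    --dtype float16 \
    --block_size 1024 \
    --output_dir ./out
```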