arampacha opened 3 years ago
Casting weights to bf16 is not recommended, so it has been removed for now.
Here's the gradient accumulation helper from the vision_transformer codebase:
https://github.com/google-research/vision_transformer/blob/ba9a85bdc430daf4da7b9da67b486a4e0f5bb278/vit_jax/hyper.py#L77

And here's a small example of it in use:
https://github.com/google-research/vision_transformer/blob/ba9a85bdc430daf4da7b9da67b486a4e0f5bb278/vit_jax/train.py#L63-L66
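For anyone skimming, here is a minimal sketch of the technique in JAX. It is written in the same spirit as the linked `accumulate_gradient` helper but is not a copy of it; the function and argument names are illustrative, and it assumes `loss_and_grad_fn` is something like `jax.value_and_grad(loss_fn)` and that the batch dimension divides evenly by `accum_steps`:

```python
import jax
import jax.numpy as jnp

def accumulate_gradient(loss_and_grad_fn, params, batch, accum_steps):
    # Slice the large batch into `accum_steps` micro-batches along the
    # leading axis (assumes it is divisible by accum_steps).
    def micro_batch(i):
        return jax.tree_util.tree_map(
            lambda x: x.reshape((accum_steps, -1) + x.shape[1:])[i], batch)

    # Evaluate the first micro-batch to initialize the accumulators.
    loss, grads = loss_and_grad_fn(params, micro_batch(0))

    def body(i, carry):
        loss_acc, grad_acc = carry
        loss_i, grad_i = loss_and_grad_fn(params, micro_batch(i))
        return (loss_acc + loss_i,
                jax.tree_util.tree_map(jnp.add, grad_acc, grad_i))

    loss, grads = jax.lax.fori_loop(1, accum_steps, body, (loss, grads))
    # Average so the result matches a single pass over the full batch.
    return loss / accum_steps, jax.tree_util.tree_map(
        lambda g: g / accum_steps, grads)
```

The averaged gradients can then be fed to the optimizer update exactly as if they came from one large batch, which is what lets a small-memory GPU emulate a TPU-sized batch.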
For gradient accumulation, I have opened a PR: https://github.com/ncoop57/gpt-code-clippy/pull/29. Let me know if we can sync up on this.
Hello, what are the minimum hardware requirements to run the training script?
Hi @celsofranssa, the hyperparameters in the HF model cards (for example here) are tuned for a TPU-v3-8. But you can run the script on a GPU by adjusting the batch size accordingly and maybe switching `dtype` from `bfloat16` to `float16` for your hardware. I'm not sure what the minimum requirement would be exactly. You can also consider decreasing `block_size` if you run out of memory.
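As a rough sketch of those adjustments, assuming the standard `dtype` argument of the Flax `from_pretrained` API; the checkpoint name and the batch/block sizes below are placeholders, not values from this thread:

```python
import jax.numpy as jnp
from transformers import FlaxAutoModelForCausalLM

# Placeholder checkpoint; substitute the model you are training.
model = FlaxAutoModelForCausalLM.from_pretrained(
    "gpt2",
    dtype=jnp.float16,  # run computations in fp16 instead of bf16 on GPU
)

# Memory-saving knobs, tuned down until training fits on your GPU.
per_device_batch_size = 4   # TPU-v3-8 values will likely be too large
block_size = 512            # shorter sequences use less memory
```

Reducing either knob trades throughput for memory; gradient accumulation (see the sketch above) can recover the effective batch size if needed.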