Closed hanhanpp closed 3 weeks ago
Using the following command:
CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=true python -m sft.finetune --model GreenBitAI/Llama-3-8B-layer-mix-bpw-2.2 --tune-qweight-only --galore --galore-rank 64 --optimizer adamw8bit --batch-size 1 --seqlen 96
I can fine-tune on a single RTX 3090 with 24GB of GPU memory. You can adjust your configuration starting from this: use bpw-3 instead of bpw-2.2; use the DiodeMix optimizer (16-bit) instead of adamw8bit; remove the --galore* options, since GaLore slows training down when it recomputes the low-rank projection matrices; or use a larger seqlen or batch size, etc.
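For example, dropping the GaLore flags and switching the optimizer as suggested would look roughly like this (a sketch only: the exact bpw-3 model ID and the `diodemix` optimizer value are assumptions based on the comment above, not verified flag values):

```shell
# Sketch of an adjusted run based on the command above.
# Assumptions: the bpw-3 checkpoint name and the "diodemix" optimizer
# value are inferred from this thread; check the repo's sft.finetune
# --help output for the exact spellings.
CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=true python -m sft.finetune \
  --model GreenBitAI/Llama-3-8B-layer-mix-bpw-3.0 \
  --tune-qweight-only \
  --optimizer diodemix \
  --batch-size 1 --seqlen 96
```

From there you can raise `--seqlen` or `--batch-size` until you approach your card's memory limit.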
Thanks for your reply!
Hi, I tried to fine-tune Llama-3 8B bpw_3 on an A100/40GB, but it ran out of memory. How much memory does it need? Or which model can be fine-tuned on a single GPU card?