Hm, it could be related to the slightly larger size.
> llama2-7b by the lit-llama
I think Llama 2 is not supported by lit-llama. Did you perhaps mean Llama 7B in lit-llama, or Llama 2 7B in LitGPT?
If you meant lit-llama, I am curious: does the 7B Llama 2 model work for you in LitGPT?
In any case, you could perhaps try QLoRA or a smaller sequence length to make it work.
With `--quantize bnb.nf4`, I am able to fine-tune Llama 3 8B without any problem on a single A10 GPU.
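If it helps, here is a minimal sketch of such a run, assuming the same `finetune/lora.py` entry point used later in this thread and reusing only flags that appear in it (paths are placeholders):

```bash
# QLoRA-style run: 4-bit NF4 quantization plus a reduced max
# sequence length, both of which lower peak GPU memory.
# Adjust checkpoint_dir / out_dir for your setup.
python finetune/lora.py \
  --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B \
  --precision 'bf16-true' \
  --quantize bnb.nf4 \
  --train.max_seq_length 512 \
  --out_dir out/llama3-qlora-test
```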
Thanks for all the help. I found that the OOM error vanishes when I choose a smaller `max_seq_length`. I believe it's because my dataset samples are too long, leading to OOM. When I tried LoRA with the Alpaca-2k dataset, it consumed 20.5 GB of memory. When I used my own dataset without limiting `max_seq_length`, it would OOM regardless of whether I used `--quantize bnb.nf4` or not. The issue was resolved when I limited `--max-seq-length 512`.
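For reference, the adjusted command looks roughly like this (the same flags as in my original post below, just with the sequence-length cap added):

```bash
# Same run as before, but capped at 512 tokens per sample so the
# longest dataset examples no longer exhaust GPU memory.
python finetune/lora.py \
  --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B \
  --precision 'bf16-true' \
  --train.global_batch_size 8 \
  --train.max_seq_length 512 \
  --data JSON \
  --data.prompt_style 'llama3' \
  --data.json_path /root/szhao/ES-Lora/litllama/ExTES/ExTES.json \
  --data.val_split_fraction 0.1 \
  --data.mask_prompt True \
  --out_dir out/llama3-esconv-test
```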
When I fine-tuned Llama 3 8B with `finetune/lora.py`, an OOM error occurred. My training and dataset parameters:
```bash
--checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B \
--precision 'bf16-true' \
--train.global_batch_size 8 \
--train.max_seq_length 2048 \
--data JSON \
--data.prompt_style 'llama3' \
--data.json_path /root/szhao/ES-Lora/litllama/ExTES/ExTES.json \
--data.val_split_fraction 0.1 \
--data.mask_prompt True \
--out_dir out/llama3-esconv-test
```

In this command, the `prompt_style` 'llama3' is one I defined myself. But I encountered the following error:
The error shown in the terminal:

```
{'checkpoint_dir': PosixPath('checkpoints/meta-llama/Meta-Llama-3-8B'), 'data': JSON(json_path=PosixPath('/root/szhao/ES-Lora/litllama/ExTES/ExTES.json'), mask_prompt=True, val_split_fraction=0.1, prompt_style=
```

I find it so weird because I previously fine-tuned llama2-7b with the lit-llama repository on the same dataset with almost the same training config, and at that time everything went smoothly. Can you help me? Thanks.