Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

falcon-40b out of memory #165

Closed: lynngao closed this issue 10 months ago

lynngao commented 1 year ago

Hi! I am trying to finetune falcon-40b on a single A100 GPU with 80GB of memory. I tried decreasing the micro batch size to 1, but it still goes OOM for both adapter_v2 and lora with bfloat16-mixed / fp16. Any suggestion on how to solve this issue without using multiple GPUs? Thanks a lot!
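For reference, this is roughly what I am changing; the variable names are an assumption based on the finetune/lora.py and finetune/adapter_v2.py scripts and may not match your checkout exactly:

```python
# Sketch of the memory-related knobs being lowered in the finetune script
# (variable names are assumed, not an exact diff of finetune/lora.py).
batch_size = 128              # effective batch size
micro_batch_size = 1          # smallest per-step batch
gradient_accumulation_iters = batch_size // micro_batch_size  # accumulate gradients over this many micro-batches
# precision is selected at launch, e.g. "bf16-mixed" or "16-mixed"
```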

lynngao commented 1 year ago

I tried again with 2 A100 GPUs but still hit OOM. I set devices = 2 and tried both lora and adapter_v2. Any help would be appreciated!

carmocca commented 12 months ago

Falcon 40B won't fit in a single 80GB card.

I will report back when I find out what the minimum memory requirement to fine-tune it is, but I don't have access to an A100 80GB right now.
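For anyone wondering why, here is a rough back-of-envelope estimate (parameter count rounded to 40B; activation memory varies with sequence length and batch size):

```python
# Why the bf16 weights alone nearly fill an "80GB" (~74.5 GiB) A100.
n_params = 40e9          # Falcon-40B, rounded
bytes_per_param = 2      # bf16 / fp16
weights_gib = n_params * bytes_per_param / 2**30
print(f"bf16 weights: {weights_gib:.1f} GiB")  # ~74.5 GiB
# Even with LoRA or adapters (base weights frozen, tiny optimizer state),
# activations, adapter gradients, and CUDA overhead have almost no headroom left.
```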

gpravi commented 12 months ago

Any luck with finetuning? I'm running into OOM while trying to fine-tune Falcon 40B on an 8-GPU A100 80GB machine. I tried reducing num_devices and micro_batch_size and lowering the LoRA rank.
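For completeness, the kind of changes I tried looked roughly like this (variable names assumed from the LoRA finetune script, not the exact configuration):

```python
# Illustrative only: the knobs mentioned above, with assumed names.
devices = 8            # also tried fewer devices
micro_batch_size = 1
lora_r = 4             # lower rank -> fewer trainable parameters
lora_alpha = 16
```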

Update: It looks like the recent main doesn't support multi-GPU training. Any plans/threads to support that feature?

carmocca commented 12 months ago

@gpravi Distributed support for LoRA is tracked in #161

weilong-web commented 11 months ago

So currently it's not possible to finetune Falcon 40B using Lit-parrot, right?

gpravi commented 11 months ago

@weilong-web Yeah, I don't think it works out of the box... Looks like someone managed to finetune Falcon 40b - https://github.com/Lightning-AI/lit-gpt/issues/198

lynngao commented 11 months ago

> @weilong-web Yeah, I don't think it works out of the box... Looks like someone managed to finetune Falcon 40b - #198

I am able to use this tool to finetune 40b: https://github.com/rmihaylov/falcontune

gpravi commented 11 months ago

@lynngao

I was able to finetune on the Falcon 40B instruct 4 bit version.

Were you able to finetune Falcon 40B model? I ran into this issue while saving the checkpoint

lynngao commented 11 months ago

> @lynngao
>
> I was able to finetune on the Falcon 40B instruct 4 bit version.
>
> Were you able to finetune Falcon 40B model? I ran into this issue while saving the checkpoint

No, I only tried the 4-bit version.

alexeiga commented 11 months ago

> @weilong-web Yeah, I don't think it works out of the box... Looks like someone managed to finetune Falcon 40b - #198
>
> I am able to use this tool to finetune 40b: https://github.com/rmihaylov/falcontune

Were you able to run it with DDP or only on a single GPU?

alexeiga commented 11 months ago

> @lynngao
>
> I was able to finetune on the Falcon 40B instruct 4 bit version.
>
> Were you able to finetune Falcon 40B model? I ran into this issue while saving the checkpoint

Downgrading bitsandbytes to 0.37.2 worked for me (it took me a few days to find this thread..): https://github.com/TimDettmers/bitsandbytes/issues/324
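For anyone hitting the same thing, a quick way to confirm the pinned version is actually the one your training environment imports (standard library only, not a lit-gpt helper):

```python
# Check the installed bitsandbytes version after: pip install bitsandbytes==0.37.2
from importlib.metadata import version
print(version("bitsandbytes"))  # expect "0.37.2"
```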

gpravi commented 11 months ago

@alexeiga Nice. Can you please let us know the configuration you used?

Also, the current main branch doesn't support multi-GPU training. How did you manage to do it?

alexeiga commented 11 months ago

> @alexeiga Nice. Can you please let us know the configuration you used?
>
> Also, the current main branch doesn't support multi-GPU training. How did you manage to do it?

I tried to, but without success... I was only able to run on a single GPU, and training is VERY slow.

carmocca commented 11 months ago

QLoRA finetuning support is tracked in #176. Until that is supported, you can try the suggestions described in https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md
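If it helps while applying those suggestions, peak GPU memory can be checked with plain PyTorch after a step completes (generic PyTorch utilities, not a lit-gpt API):

```python
# Report how close a run got to the card's limit.
import torch

if torch.cuda.is_available():
    print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
    print(f"peak reserved:  {torch.cuda.max_memory_reserved() / 2**30:.1f} GiB")
```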