THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B
Apache License 2.0

Finetuning Questions #60

Closed · babla9 closed this issue 4 months ago

babla9 commented 4 months ago

Thanks so much for this work!

  1. What is the minimum GPU requirement to fine-tune the model? Will 2 x V100 (32 GB) be enough?
  2. Is it possible to run inference via the OpenAI API demo on the PEFT-finetuned model?
zRzRzRzRzRzRzR commented 4 months ago

  1. It's not enough; fine-tuning needs about 70 GB of GPU memory.
  2. Sure, just change the model-loading function to merge the PEFT checkpoint into the base model (see the sketch below).
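
A minimal sketch of what "merge the PEFT checkpoint" could look like (the model ID, paths, and flags are illustrative assumptions, not code taken from the repo's demo):

```python
# Sketch: load the base model, apply the LoRA/PEFT checkpoint, merge the
# adapter weights into the base model, and save the result so it can be
# served like an ordinary checkpoint (e.g. behind the OpenAI API demo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "THUDM/cogvlm2-llama3-chat-19B"    # base model (placeholder)
ADAPTER = "./output/checkpoint-1000"      # hypothetical PEFT checkpoint dir

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model = model.merge_and_unload()           # fold the LoRA weights into the base
model.save_pretrained("./cogvlm2-merged")  # point the inference demo at this dir
```
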
hvico commented 4 months ago

Just for reference, I tried the fine-tuning code on a setup with 4 x 24 GB GPUs and it didn't work either, even when loading the newly released 4-bit model as the base. It loads the model (occupying roughly 50% of each GPU), but as soon as it tries to process the first batch of data it OOMs.

hvico commented 4 months ago

Hello. I tried some ZeRO-3 configs yesterday, with no luck either.

The fine-tuning process doesn't OOM with a single A100; the problem is the VRAM available per GPU. With 4 x 3090 it doesn't work under ZeRO-2 or ZeRO-3, even when starting from the new 4-bit model.
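
For reference, the kind of generic ZeRO-2 setting tried here looks like the sketch below (illustrative values, not the repo's shipped ds_config); even with optimizer-state offloading it still OOMs on 24 GB cards.

```python
# Generic DeepSpeed ZeRO-2 config, expressed as a Python dict, with
# optimizer-state offloading to CPU to reduce per-GPU VRAM pressure.
# Values are illustrative only.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,  # ZeRO-2: shard optimizer state and gradients (not parameters)
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```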

zRzRzRzRzRzRzR commented 4 months ago

> Hello. I tried some ZeRO-3 configs yesterday, with no luck either.
>
> The fine-tuning process doesn't OOM with a single A100; the problem is the VRAM available per GPU. With 4 x 3090 it doesn't work under ZeRO-2 or ZeRO-3, even when starting from the new 4-bit model.

Sad to say, the model does not support ZeRO-3, as mentioned in the README.

tiandazhao commented 4 months ago

> Hello. I tried some ZeRO-3 configs yesterday, with no luck either. The fine-tuning process doesn't OOM with a single A100; the problem is the VRAM available per GPU. With 4 x 3090 it doesn't work under ZeRO-2 or ZeRO-3, even when starting from the new 4-bit model.

> Sad to say, the model does not support ZeRO-3, as mentioned in the README.

It's a pity that fine-tuning CogVLM2 does not support ZeRO-3. I would like to know what distinguishes a model that supports ZeRO-3 from one that does not, and how I would need to modify the model to add that support. I would be very grateful if someone could answer this.
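
For context, supporting ZeRO-3 generally means the model's forward pass must tolerate parameters being sharded across ranks and gathered on demand; custom code that reads or indexes a weight tensor outside an ordinary module call may see an empty placeholder tensor instead. The sketch below is a generic illustration of DeepSpeed's API for that situation, not a patch for CogVLM2:

```python
# Illustrative only: under ZeRO-3, parameters live sharded across ranks.
# Code that touches a weight tensor directly must gather it first,
# otherwise it sees a size-0 placeholder.
import deepspeed
import torch

def custom_op_that_reads_weights(linear_layer: torch.nn.Linear, x: torch.Tensor):
    # Gather the sharded parameter just for this block, then release it.
    with deepspeed.zero.GatheredParameters(linear_layer.weight, modifier_rank=None):
        w = linear_layer.weight  # full tensor is available inside this context
        return x @ w.t()
```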

ailun885757124 commented 4 months ago

> 1. It's not enough; fine-tuning needs about 70 GB of GPU memory.
> 2. Sure, just change the model-loading function to merge the PEFT checkpoint into the base model.

Excuse me, but do you mean 70 GB in total or per GPU? I just tried to run finetune_demo/peft_lora on an 8 x V100 machine (32 GB each, 256 GB in total) with no luck; it still OOMs.

Then I switched to the int4 version, but it threw the error below:

```
[rank7]: Traceback (most recent call last):
[rank7]:   File "/raid/cogvlm2/finetune_demo/peft_lora.py", line 346, in <module>
[rank7]:     main()
[rank7]:   File "/raid/cogvlm2/finetune_demo/peft_lora.py", line 279, in main
[rank7]:     outputs = model(
[rank7]:               ^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1855, in forward
[rank7]:     loss = self.module(*inputs, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/peft/peft_model.py", line 1430, in forward
[rank7]:     return self.base_model(
[rank7]:            ^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
[rank7]:     return self.model.forward(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]:     output = module._old_forward(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/modeling_cogvlm.py", line 620, in forward
[rank7]:     outputs = self.model(
[rank7]:               ^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]:     output = module._old_forward(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/modeling_cogvlm.py", line 389, in forward
[rank7]:     images_features = self.encode_images(images)
[rank7]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/modeling_cogvlm.py", line 361, in encode_images
[rank7]:     images_features = self.vision(images)
[rank7]:                       ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]:     output = module._old_forward(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 130, in forward
[rank7]:     x = self.transformer(x)
[rank7]:         ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]:     output = module._old_forward(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 94, in forward
[rank7]:     hidden_states = layer_module(hidden_states)
[rank7]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]:     output = module._old_forward(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 79, in forward
[rank7]:     attention_output = self.input_layernorm(self.attention(attention_input))
[rank7]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]:     output = module._old_forward(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 40, in forward
[rank7]:     out = xops.memory_efficient_attention(
[rank7]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 268, in memory_efficient_attention
[rank7]:     return _memory_efficient_attention(
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 387, in _memory_efficient_attention
[rank7]:     return _memory_efficient_attention_forward(
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 403, in _memory_efficient_attention_forward
[rank7]:     op = _dispatch_fw(inp, False)
[rank7]:          ^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py", line 125, in _dispatch_fw
[rank7]:     return _run_priority_list(
[rank7]:            ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py", line 65, in _run_priority_list
[rank7]:     raise NotImplementedError(msg)
[rank7]: NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
[rank7]:      query       : shape=(2, 9217, 16, 112) (torch.bfloat16)
[rank7]:      key         : shape=(2, 9217, 16, 112) (torch.bfloat16)
[rank7]:      value       : shape=(2, 9217, 16, 112) (torch.bfloat16)
[rank7]:      attn_bias   : <class 'NoneType'>
[rank7]:      p           : 0.0
[rank7]: `decoderF` is not supported because:
[rank7]:     attn_bias type is <class 'NoneType'>
[rank7]:     bf16 is only supported on A100+ GPUs
[rank7]: `flshattF@v2.5.6` is not supported because:
[rank7]:     requires device with capability > (8, 0) but your GPU has capability (7, 0) (too old)
[rank7]:     bf16 is only supported on A100+ GPUs
[rank7]: `cutlassF` is not supported because:
[rank7]:     bf16 is only supported on A100+ GPUs
[rank7]: `smallkF` is not supported because:
[rank7]:     max(query.shape[-1] != value.shape[-1]) > 32
[rank7]:     dtype=torch.bfloat16 (supported: {torch.float32})
[rank7]:     has custom scale
[rank7]:     bf16 is only supported on A100+ GPUs
[rank7]:     unsupported embed per head: 112
```

Does this mean the V100 simply can't run the int4 model for fine-tuning? d0_0b
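
For what it's worth, the failure above is xformers rejecting every bf16 attention kernel because the V100 reports compute capability (7, 0). A quick check with standard PyTorch calls confirms whether a given card supports bf16 at all:

```python
import torch

# On a V100 this prints (7, 0) and False, matching the xformers message
# that bf16 kernels require A100-class (compute capability 8.0+) GPUs.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))
print(torch.cuda.is_bf16_supported())
```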

zRzRzRzRzRzRzR commented 4 months ago

About 70 GB per GPU is needed for BF16 fine-tuning, i.e. 8 x A100 or 8 x H100. Fine-tuning from the int4 model does not work.