It's not enough; you need 70 GB of memory to tune.
Just for reference, I tried the finetuning code on a setup with 4 x 24 GB GPUs and it didn't work either, even when loading the newly released 4-bit model as the base. It loads the model (occupying about 50% of each GPU), but as soon as it tries to load the first batch of data it OOMs.
Hello. I tried some ZeRO-3 configs yesterday, no luck either.
The finetuning process doesn't OOM with just 1 x A100; the problem is the VRAM per GPU. With 4 x 3090 it doesn't work with ZeRO-2, ZeRO-3, or even when starting from the new 4-bit model.
Sadly, the model does not support ZeRO-3, as I mentioned in the README.
It's a pity that fine-tuning the CogVLM2 model does not support ZeRO-3. I would like to know what distinguishes a model that supports ZeRO-3 from one that does not, and how I would need to modify the model to support it. I would be very grateful if someone could answer this.
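For context, by a "ZeRO-3 config" I mean a generic one like the sketch below (not taken from this repo). As far as I understand, the only config difference from ZeRO-2 is the stage, but under stage 3 the parameters themselves are partitioned across GPUs, so custom forward code that touches raw weights directly (for example the vision/expert layers here) can see empty placeholder tensors and fail; please correct me if that understanding is wrong.

# Generic ZeRO-3 DeepSpeed config sketch, for illustration only (not this repo's config).
ds_config_zero3 = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                               # ZeRO-2 would be "stage": 2
        "offload_param": {"device": "cpu"},       # optional: move sharded params to CPU
        "offload_optimizer": {"device": "cpu"},   # optional: move optimizer state to CPU
        "overlap_comm": True,
    },
}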
For sure, just change the load-model function to merge the PEFT checkpoint.
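Roughly something like the sketch below with PEFT; the paths and model name are placeholders, so adapt them to the repo's own load function.

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder paths; replace with the actual base model and your LoRA output directory.
base = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm2-llama3-chat-19B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "output/checkpoint-xxx")  # load the LoRA adapter
model = model.merge_and_unload()        # fold the LoRA weights into the base layers
model.save_pretrained("output/merged")  # save as a plain (adapter-free) checkpoint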
Excuse me, but by 70 GB do you mean in total or per GPU? I just tried to run finetune_demo/peft_lora on an 8 x V100 machine (32 GB each, 256 GB in total) with no luck, still OOM.
Then I switched to the int4 version, but it threw the error below:
[rank7]: Traceback (most recent call last):
[rank7]: File "/raid/cogvlm2/finetune_demo/peft_lora.py", line 346, in <module>
[rank7]: main()
[rank7]: File "/raid/cogvlm2/finetune_demo/peft_lora.py", line 279, in main
[rank7]: outputs = model(
[rank7]: ^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]: ret_val = func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1855, in forward
[rank7]: loss = self.module(*inputs, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/peft/peft_model.py", line 1430, in forward
[rank7]: return self.base_model(
[rank7]: ^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
[rank7]: return self.model.forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]: output = module._old_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/modeling_cogvlm.py", line 620, in forward
[rank7]: outputs = self.model(
[rank7]: ^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]: output = module._old_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/modeling_cogvlm.py", line 389, in forward
[rank7]: images_features = self.encode_images(images)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/modeling_cogvlm.py", line 361, in encode_images
[rank7]: images_features = self.vision(images)
[rank7]: ^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]: output = module._old_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 130, in forward
[rank7]: x = self.transformer(x)
[rank7]: ^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]: output = module._old_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 94, in forward
[rank7]: hidden_states = layer_module(hidden_states)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]: output = module._old_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 79, in forward
[rank7]: attention_output = self.input_layernorm(self.attention(attention_input))
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
[rank7]: output = module._old_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B-int4/visual.py", line 40, in forward
[rank7]: out = xops.memory_efficient_attention(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 268, in memory_efficient_attention
[rank7]: return _memory_efficient_attention(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 387, in _memory_efficient_attention
[rank7]: return _memory_efficient_attention_forward(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 403, in _memory_efficient_attention_forward
[rank7]: op = _dispatch_fw(inp, False)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py", line 125, in _dispatch_fw
[rank7]: return _run_priority_list(
[rank7]: ^^^^^^^^^^^^^^^^^^^
[rank7]: File "/root/anaconda3/envs/cogvlm2_finetune_3.11/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py", line 65, in _run_priority_list
[rank7]: raise NotImplementedError(msg)
[rank7]: NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
[rank7]: query : shape=(2, 9217, 16, 112) (torch.bfloat16)
[rank7]: key : shape=(2, 9217, 16, 112) (torch.bfloat16)
[rank7]: value : shape=(2, 9217, 16, 112) (torch.bfloat16)
[rank7]: attn_bias : <class 'NoneType'>
[rank7]: p : 0.0
[rank7]: `decoderF` is not supported because:
[rank7]: attn_bias type is <class 'NoneType'>
[rank7]: bf16 is only supported on A100+ GPUs
[rank7]: `flshattF@v2.5.6` is not supported because:
[rank7]: requires device with capability > (8, 0) but your GPU has capability (7, 0) (too old)
[rank7]: bf16 is only supported on A100+ GPUs
[rank7]: `cutlassF` is not supported because:
[rank7]: bf16 is only supported on A100+ GPUs
[rank7]: `smallkF` is not supported because:
[rank7]: max(query.shape[-1] != value.shape[-1]) > 32
[rank7]: dtype=torch.bfloat16 (supported: {torch.float32})
[rank7]: has custom scale
[rank7]: bf16 is only supported on A100+ GPUs
[rank7]: unsupported embed per head: 112
Does this mean that V100 is sentenced to death for int4? d0_0b
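For what it's worth, the bf16 refusal matches the GPU's compute capability; here is a quick sanity-check sketch I used:

import torch

# V100s report compute capability (7, 0); the xformers bf16 kernels want (8, 0)+ (Ampere).
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: ({major}, {minor})")
print("bf16 supported:", torch.cuda.is_bf16_supported())  # False on V100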
70 GB per GPU for BF16 finetuning. You need to use 8 x A100 or 8 x H100; the int4 method does not work for finetuning.
Thanks so much for this work!