Closed RussellEven closed 2 months ago
use basic_demo/cli_demo_multi_gpus.py
I am already using basic_demo/cli_demo_multi_gpus.py, and I still get the same error.
How do I run multi-GPU inference with PEFT weights, as cog?
@zRzRzRzRzRzRzR I am using basic_demo/cli_demo_multi_gpus.py and get the same error:
Traceback (most recent call last):
  File "/opt/bitmatrix/src/share-serv/serv_misc/src/cg2.py", line 100, in <module>
    outputs = model.generate(**inputs, **gen_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/transformers/generation/utils.py", line 2397, in _sample
    outputs = self(
              ^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/modeling_cogvlm.py", line 620, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/modeling_cogvlm.py", line 389, in forward
    images_features = self.encode_images(images)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/modeling_cogvlm.py", line 361, in encode_images
    images_features = self.vision(images)
                      ^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/visual.py", line 130, in forward
    x = self.transformer(x)
        ^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/visual.py", line 94, in forward
    hidden_states = layer_module(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/visual.py", line 83, in forward
    output = mlp_input + mlp_output
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:2!
Could you try setting it to 2 GPUs?
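Hello, with two GPUs it fails with an out-of-memory error instead (each card has 22 GB of VRAM and is idle; whatever value I set for max_memory_per_gpu, it still reports out of memory).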
Same problem. With 8 P100s, nothing I try works.
same problem
Allocate at least 16 GB per GPU, and use at most three GPUs.
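The advice above can be written down as a `max_memory` mapping of the kind the reproduction script in this issue passes to `from_pretrained`. This is only a sketch: `build_max_memory` is a hypothetical helper, and the 16 GiB / three-GPU values are taken from the comment above, not verified against any particular card.

```python
def build_max_memory(num_gpus: int, per_gpu_gib: int = 16) -> dict:
    """Build a max_memory mapping: GPU index -> memory budget string.

    Hypothetical helper for illustration; the defaults follow the
    "16 GB or more per card, at most three cards" suggestion.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {gpu: f"{per_gpu_gib}GiB" for gpu in range(num_gpus)}


# Three GPUs with a 16 GiB budget each.
max_memory = build_max_memory(3)
```

The resulting dict has the same shape as `max_memory_mapping` in the reproduction script, so it could be passed as `max_memory=` when loading the model.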
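For the P100, this is most likely a driver or operator (kernel) problem. You would need to find a matching xformers version (if one supports this card at all).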
It worked for me with three 4090s.
With three 2080 Ti 22G cards, I still run out of memory o(╥﹏╥)o
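Three 4090s work for me too, but inference throughput does not scale when handling many concurrent requests. Does the repo maintainer have a way to auto-map the model across 4 or 8 GPUs?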
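You can modify the device_map. The weights of a single layer have been assigned to different GPUs, for example like this: here, under vision.transformer.layers.8, the tensors involved in one computation end up on different devices. In the example I gave, you can move all of layers.8 onto the same device.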
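A minimal sketch of that edit, assuming the device map is a plain dict from module name to device index. `pin_prefix_to_one_device` is a hypothetical helper, and the module names below (including the `attention` and `mlp` children) are illustrative; check the names in your own device map before relying on them.

```python
def pin_prefix_to_one_device(device_map, prefix, device=None):
    """Return a copy of device_map with every entry at or under `prefix`
    placed on a single device.

    If `device` is None, the device of the first matching entry is reused.
    """
    fixed = dict(device_map)
    # Match the module itself or its children, but not siblings such as
    # "layers.80" when the prefix ends in "layers.8".
    keys = [k for k in fixed if k == prefix or k.startswith(prefix + ".")]
    if not keys:
        return fixed
    target = fixed[keys[0]] if device is None else device
    for key in keys:
        fixed[key] = target
    return fixed


# Illustrative device map in which layer 8 is split across GPU 1 and GPU 2.
example_map = {
    "model.vision.transformer.layers.7": 1,
    "model.vision.transformer.layers.8.attention": 1,
    "model.vision.transformer.layers.8.mlp": 2,
    "model.vision.transformer.layers.9": 2,
}
fixed_map = pin_prefix_to_one_device(example_map, "model.vision.transformer.layers.8")
```

The corrected dict can then be passed as `device_map=` when loading the model. Accelerate's `infer_auto_device_map` also accepts a `no_split_module_classes` argument that keeps whole module classes on one device, which may avoid the manual edit, though which class name applies to the vision layers here is not confirmed in this thread.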
Did it work?
Same problem here. It looks like the weights are being split across different GPUs?
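One way to check that suspicion is to scan a device map for layers whose sub-modules sit on more than one device. The sketch below assumes a dict from module name to device, such as the `hf_device_map` attribute that transformers sets on a model loaded with a `device_map`; `find_split_layers` is a hypothetical helper and the names in the example are illustrative.

```python
import re
from collections import defaultdict

# Captures everything up to and including "layers.<index>".
LAYER_RE = re.compile(r"^(.*\.layers\.\d+)(?:\.|$)")


def find_split_layers(device_map):
    """Return {layer_name: [devices]} for layers spread over several devices."""
    devices_by_layer = defaultdict(set)
    for name, device in device_map.items():
        match = LAYER_RE.match(name)
        if match:
            devices_by_layer[match.group(1)].add(device)
    return {
        layer: sorted(devices, key=str)
        for layer, devices in devices_by_layer.items()
        if len(devices) > 1
    }


# Illustrative map: only vision layer 8 is split.
sample_map = {
    "model.layers.0": 0,
    "model.vision.transformer.layers.7": 1,
    "model.vision.transformer.layers.8.attention": 1,
    "model.vision.transformer.layers.8.mlp": 2,
}
split = find_split_layers(sample_map)
```

Any layer reported here is a candidate for the "Expected all tensors to be on the same device" error, because a residual addition inside that layer would combine tensors from different GPUs.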
I hit the same problem. Is there a solution?
I ran into the same error too. Has it been solved?
Same here. For reference, see:
System Info
system version: Ubuntu 20.04 LTS
cuda version: 11.8
python version: 3.10.12
torch version: 2.3.0+cu118
xformers version: 0.0.26.post1+cu118
Who can help?
No response
Information
Reproduction
.../huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B/visual.py", line 83, in forward
    output = mlp_input + mlp_output
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:5 and cuda:6!
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.nn.parallel import DistributedDataParallel as DDP
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0, 1, 2, 3, 4, 5, 6, 7"
max_memory_mapping = {0: "20GB", 1: "20GB", 2: "20GB", 3: "20GB", 4: "20GB", 5: "20GB", 6: "20GB", 7: "20GB"}

MODEL_PATH = "THUDM/cogvlm2-llama3-chat-19B"
MODEL_PATH = "./cogvlm2-llama3-chat-19B"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map='auto',
    max_memory=max_memory_mapping,
    load_in_8bit=False,
    torch_dtype=TORCH_TYPE,
    trust_remote_code=True,
).to(DEVICE).eval()

text_only_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"

while True:
    image_path = input("image path >>>>> ")
    if image_path == '':
        print('You did not enter image path, the following will be a plain text conversation.')
        image = None
        text_only_first_query = True
    else:
        image = Image.open(image_path).convert('RGB')
Expected behavior
A working multi-GPU demo in a future version of the repo!