CSEEduanyu opened 4 months ago
Hi, could you please provide more information about your case (e.g. the `device_map` used for loading and the number of GPUs you are using)? Also, can you try to load the 8B/13B model to see whether the same problem happens?
The transformers in my env is 4.39; why must it be transformers==4.37.0 in the dependencies? All dependencies are pinned with `==`. I wonder if `>` is OK?
Our training and evaluation are mainly conducted with the specified versions and haven't been extensively tested with higher versions to ensure correctness. But I have tested running the 34B model with transformers==4.39.0 and it works fine. Could you provide the information about the `device_map` used for loading and the number of GPUs you are using? Also, what is the version of your `accelerate`?
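When reporting version questions like this, it helps to paste the exact installed versions. A minimal stdlib-only sketch (the function name is made up for illustration) that collects them:

```python
import importlib.metadata


def installed_versions(packages=("transformers", "accelerate")):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            versions[pkg] = None  # package missing from this environment
    return versions


if __name__ == "__main__":
    for pkg, ver in installed_versions().items():
        print(pkg, ver or "not installed")
```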
8× A100.

```
Loading checkpoint shards:  94%|█████████████████████████████████▍ | 30/32 [00:19<00:01, 1.57it/s]
Traceback (most recent call last):
    model = CambrianLlamaForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3852, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4286, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 807, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 285, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 1152]) in "weight" (which has shape torch.Size([1024, 1024])), this look incorrect.
```
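For context, a minimal sketch (a hypothetical function, not the real accelerate internals) of the shape guard that raises this error: accelerate refuses to copy a checkpoint tensor into a parameter whose shape differs, which is what happens when the instantiated model's config no longer matches the checkpoint (here the checkpoint expects 1152 input features, likely SigLIP's hidden size, but the built model only expects 1024):

```python
def check_tensor_shape(param_shape, checkpoint_shape, name="weight"):
    """Mimic the shape guard in accelerate's set_module_tensor_to_device."""
    if tuple(param_shape) != tuple(checkpoint_shape):
        raise ValueError(
            f"Trying to set a tensor of shape {tuple(checkpoint_shape)} in "
            f"{name!r} (which has shape {tuple(param_shape)}), this looks incorrect."
        )
```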
When I add some logging, it is loading `model.mm_projector_aux_0.0.weight`. @penghao-wu
Is it because I only kept the second one in `mm_vision_tower_aux_list`?
> Is it because I only kept the second one in `mm_vision_tower_aux_list`?

What do you mean by this? You don't need to modify the config if you want to load our trained model.
Because I can only load models from a local path. Can you list the Hugging Face download addresses for these four vision models?
```json
"mm_vision_tower_aux_list": [
    "siglip/CLIP-ViT-SO400M-14-384",
    "openai/clip-vit-large-patch14-336",
    "facebook/dinov2-giant-res378",
    "clip-convnext-XXL-multi-stage"
],
```
For example, CLIP-ViT-SO400M-14-384 seems to have many versions, and I can't find clip-convnext-XXL-multi-stage on Hugging Face.
`CLIP-ViT-SO400M-14-384` should be `hf-hub:timm/ViT-SO400M-14-SigLIP-384`, and `clip-convnext-XXL-multi-stage` should be `hf-hub:laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup`. If you use a local path, you might need to look into the loading code for each of the vision encoders in the `cambrian/model/multimodal_encoder` folder to ensure correctness.
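Collecting the two corrections above in one place, a hypothetical lookup table (the dict and function names are made up; the other two config entries already look like plain hub ids) that a local-path setup could use before patching the encoder loading code:

```python
# Hypothetical mapping from names in "mm_vision_tower_aux_list" to the
# Hugging Face hub identifiers given in the reply above.
CONFIG_NAME_TO_HUB_ID = {
    "siglip/CLIP-ViT-SO400M-14-384": "hf-hub:timm/ViT-SO400M-14-SigLIP-384",
    "clip-convnext-XXL-multi-stage": "hf-hub:laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup",
}


def resolve_vision_tower(name):
    """Return the hub id for a config name, falling back to the name itself."""
    return CONFIG_NAME_TO_HUB_ID.get(name, name)
```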
Hi, how can I set this up on 2× 48G GPUs?
```
2024-06-30 15:21:12 PID=57 __init__.py:49 setup_logging() INFO → 'standard' logger initialized.
2024-06-30 15:21:13 PID=57 model_worker.py:274
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
```
This error seems not related to multiple GPUs. Make sure that all model files are downloaded correctly (e.g. `tokenizer.model`).
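A small sketch of that sanity check (the required-file list below is an assumption for illustration, not the repo's actual manifest; an empty or partially downloaded `tokenizer.model` also triggers SentencePiece errors like the one above):

```python
from pathlib import Path

# Assumed minimal set of files a SentencePiece-based checkpoint needs;
# adjust to whatever the model repo actually ships.
REQUIRED_FILES = ("tokenizer.model", "config.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required files that are absent or empty in model_dir."""
    root = Path(model_dir)
    return [name for name in required
            if not (root / name).is_file() or (root / name).stat().st_size == 0]
```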
@dionren Some of the vision encoders are not from `transformers` and do not support `device_map`, so there are some problems with setting `device_map=auto` when using multiple GPUs. We are still working on converting the vision encoders to support this.

But I have a workaround for your case with 2 48G GPUs. This includes the following modifications:

1. Modify the beginning of `cambrian/model/builder.py`:

```python
from accelerate import infer_auto_device_map, dispatch_model

def load_pretrained_model(model_path, model_base, model_name, load_8bit=False,
                          load_4bit=False, device_map="auto", device="cuda", **kwargs):
    device_map = 'sequential'
    kwargs = {"device_map": device_map, "max_memory": {0: "30GIB", 1: "49GIB"}, **kwargs}
```

2. Change https://github.com/cambrian-mllm/cambrian/blob/9d382223ba3e0ab9f99bad4f45c0fd4a21749dc6/cambrian/model/language_model/cambrian_llama.py#L252 to

```python
cur_latent_query_with_newline = torch.cat([cur_latent_query, cur_newline_embd.to(cur_latent_query.device)], 2).flatten(1, 2)
```
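For intuition about what the `sequential` map with `max_memory` caps does, here is a toy sketch (a hypothetical function; the real sizing and placement is done by accelerate): modules are packed onto GPU 0 until its budget is spent, then spill to GPU 1. Capping GPU 0 at "30GIB" presumably leaves headroom there for the parts, like the vision encoders, that cannot be sharded.

```python
def sequential_device_map(module_sizes, max_memory):
    """Toy 'sequential' placement: fill device 0, then device 1, and so on.

    module_sizes: ordered {module_name: size}; max_memory: {device: budget}.
    """
    devices = sorted(max_memory)
    placement, dev_idx, used = {}, 0, 0
    for name, size in module_sizes.items():
        # Advance to the next device once this module would exceed the budget.
        while dev_idx < len(devices) and used + size > max_memory[devices[dev_idx]]:
            dev_idx += 1
            used = 0
        if dev_idx == len(devices):
            raise MemoryError(f"not enough memory for {name}")
        placement[name] = devices[dev_idx]
        used += size
    return placement
```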
I'm gonna try it out. Thanks a ton for your help and the awesome work you've done. It's truly impressive.
Is it possible to load cambrian-34B on 8× RTX 4090s?
```
  in load_pretrained_model
    model = CambrianLlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 1152]) in "weight" (which has shape torch.Size([1024, 1024])), this look incorrect.
```