cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
https://cambrian-mllm.github.io/
Apache License 2.0

[bug] cannot load cambrian-34b #12

Open CSEEduanyu opened 4 days ago

CSEEduanyu commented 4 days ago

```
in load_pretrained_model
    model = CambrianLlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 1152]) in "weight" (which has shape torch.Size([1024, 1024])), this look incorrect.
```

penghao-wu commented 3 days ago

Hi, could you please provide more information about your setup (e.g. the device_map used for loading and the number of GPUs)? Also, could you try loading the 8B/13B model to see whether the same problem happens?

CSEEduanyu commented 2 days ago

transformers in my env is 4.39; why must it be transformers==4.37.0 in the dependencies?

CSEEduanyu commented 2 days ago

All the dependencies are pinned with "=="; I wonder if ">" would be OK?

penghao-wu commented 2 days ago

Our training and evaluation were mainly conducted with the specified versions, and higher versions haven't been extensively tested for correctness. That said, I have tested running the 34B model with transformers==4.39.0 and it works fine. Could you provide the device_map used for loading and the number of GPUs you are using? Also, what version of accelerate do you have?

CSEEduanyu commented 2 days ago

> Our training and evaluation were mainly conducted with the specified versions, and higher versions haven't been extensively tested for correctness. That said, I have tested running the 34B model with transformers==4.39.0 and it works fine. Could you provide the device_map used for loading and the number of GPUs you are using? Also, what version of accelerate do you have?

A100*8

CSEEduanyu commented 2 days ago

```
Loading checkpoint shards:  94%|█████████▍| 30/32 [00:19<00:01, 1.57it/s]
Traceback (most recent call last):
    model = CambrianLlamaForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3852, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4286, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 807, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 285, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 1152]) in "weight" (which has shape torch.Size([1024, 1024])), this look incorrect.
```

CSEEduanyu commented 2 days ago

When I added some logging, it failed while loading "model.mm_projector_aux_0.0.weight" @penghao-wu
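
For context, a plausible reading of the shapes: SigLIP SO400M features are 1152-dim, so the checkpoint's [1024, 1152] weight for model.mm_projector_aux_0.0.weight projects SigLIP features to 1024; a model instantiated with SigLIP removed from mm_vision_tower_aux_list would build that layer as 1024×1024 and fail with exactly this ValueError. A minimal debugging sketch for comparing the checkpoint's stored shape against the error, assuming a local safetensors checkpoint with an index file (the path is hypothetical; adapt if the checkpoint uses pytorch_model.bin shards):

```python
import json
import os

from safetensors import safe_open  # pip install safetensors

ckpt_dir = "/path/to/cambrian-34b"           # hypothetical local checkpoint dir
param = "model.mm_projector_aux_0.0.weight"  # the parameter from the log above

# The index file maps each parameter name to the shard file that stores it.
with open(os.path.join(ckpt_dir, "model.safetensors.index.json")) as f:
    shard = json.load(f)["weight_map"][param]

# Read only the tensor's shape, without loading the whole shard into memory.
with safe_open(os.path.join(ckpt_dir, shard), framework="pt") as f:
    print(param, "shape in checkpoint:", f.get_slice(param).get_shape())
```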

CSEEduanyu commented 2 days ago

Is it because I only kept the second one in mm_vision_tower_aux_list?

penghao-wu commented 2 days ago

> Is it because I only kept the second one in mm_vision_tower_aux_list?

What do you mean by this? You don't need to modify the config if you want to load our trained model.

CSEEduanyu commented 2 days ago

> > Is it because I only kept the second one in mm_vision_tower_aux_list?
>
> What do you mean by this? You don't need to modify the config if you want to load our trained model.

Because I can only load models from a local path, could you list the Hugging Face download addresses for these four vision models?

"mm_vision_tower_aux_list": [ "siglip/CLIP-ViT-SO400M-14-384", "openai/clip-vit-large-patch14-336", "facebook/dinov2-giant-res378", "clip-convnext-XXL-multi-stage" ],

CSEEduanyu commented 2 days ago

For example, CLIP-ViT-SO400M-14-384 seems to have many versions, and I can't find clip-convnext-XXL-multi-stage on Hugging Face.

penghao-wu commented 1 day ago

CLIP-ViT-SO400M-14-384 should be hf-hub:timm/ViT-SO400M-14-SigLIP-384, and clip-convnext-XXL-multi-stage should be hf-hub:laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup. If you use local paths, you might need to look into the loading code for each of the vision encoders in the cambrian/model/multimodal_encoder folder to ensure correctness.
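
A minimal sketch for pre-downloading the encoders to local paths with huggingface_hub, using the repo ids given above. The DINOv2 repo id is an assumption: the config's "-res378" suffix appears to be a Cambrian-side input-resolution setting rather than part of a Hugging Face repo name.

```python
from huggingface_hub import snapshot_download

# Maps each config entry to an assumed Hugging Face repo id.
repos = {
    "siglip/CLIP-ViT-SO400M-14-384": "timm/ViT-SO400M-14-SigLIP-384",
    "openai/clip-vit-large-patch14-336": "openai/clip-vit-large-patch14-336",
    "facebook/dinov2-giant-res378": "facebook/dinov2-giant",  # assumption, see note above
    "clip-convnext-XXL-multi-stage": "laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup",
}

for config_name, repo_id in repos.items():
    # Downloads into the local HF cache and returns the local snapshot path.
    path = snapshot_download(repo_id=repo_id)
    print(config_name, "->", path)
```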

dionren commented 1 day ago

Hi, how can I set this up on 2× 48G GPUs?

```
2024-06-30 15:21:12 PID=57 __init__.py:49 setup_logging() INFO → 'standard' logger initialized.
2024-06-30 15:21:13 PID=57 model_worker.py:274 <module>() INFO → args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='/mnt/cpn-pod/models/nyu-visionx/cambrian-34b', model_base=None, model_name=None, device='cuda', multi_modal=False, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False)
2024-06-30 15:21:13 PID=57 model_worker.py:66 __init__() INFO → Loading the model cambrian-34b on worker b48646 ...
2024-06-30 15:21:13 PID=57 builder.py:119 load_pretrained_model() INFO → Loading Cambrian from /mnt/cpn-pod/models/nyu-visionx/cambrian-34b
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/cambrian/cambrian/serve/model_worker.py", line 279, in <module>
    worker = ModelWorker(args.controller_address,
  File "/root/cambrian/cambrian/serve/model_worker.py", line 67, in __init__
    self.tokenizer, self.model, self.image_processor, self.context_len = load_pretrained_model(
  File "/root/cambrian/cambrian/model/builder.py", line 120, in load_pretrained_model
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 814, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2029, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2261, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 178, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 203, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
```

penghao-wu commented 1 day ago

> return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
> TypeError: not a string

This error does not seem related to multiple GPUs. Make sure that all the model files (e.g. tokenizer.model) were downloaded correctly.
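
A quick sanity check along these lines, using the local model directory from the log above:

```python
from pathlib import Path

model_path = Path("/mnt/cpn-pod/models/nyu-visionx/cambrian-34b")

# tokenizer.model is the SentencePiece vocab file the traceback failed to load.
for name in ("config.json", "tokenizer_config.json", "tokenizer.model"):
    f = model_path / name
    status = f"{f.stat().st_size} bytes" if f.is_file() else "MISSING"
    print(f"{name}: {status}")
```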

penghao-wu commented 1 day ago

@dionren Some of the vision encoders are not from transformers and do not support device_map, so there are some problems with setting device_map=auto across multiple GPUs. We are still working on converting the vision encoders to support this.

But I have a workaround for your case with 2× 48G GPUs. It involves the following modifications:

1. Modify the beginning of `cambrian/model/builder.py`:

```python
from accelerate import infer_auto_device_map, dispatch_model

def load_pretrained_model(model_path, model_base, model_name, load_8bit=False, load_4bit=False, device_map="auto", device="cuda", **kwargs):
    device_map = 'sequential'
    kwargs = {"device_map": device_map, "max_memory": {0: "30GIB", 1: "49GIB"}, **kwargs}
```

2. Change https://github.com/cambrian-mllm/cambrian/blob/9d382223ba3e0ab9f99bad4f45c0fd4a21749dc6/cambrian/model/language_model/cambrian_llama.py#L252 to:

```python
cur_latent_query_with_newline = torch.cat([cur_latent_query, cur_newline_embd.to(cur_latent_query.device)], 2).flatten(1, 2)
```
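
As I read the workaround: device_map='sequential' with max_memory caps GPU 0 at 30GIB so the vision encoders, which ignore device_map and land on the default CUDA device, still have headroom there, and the .to(cur_latent_query.device) change handles a cross-device concatenation once the language model is split over two GPUs. A hedged usage sketch with the patched builder (the local checkpoint path is hypothetical):

```python
from cambrian.model.builder import load_pretrained_model

# The patched builder above now forces device_map='sequential' with
# max_memory={0: "30GIB", 1: "49GIB"}, so no extra arguments are needed here.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    "/path/to/cambrian-34b",  # hypothetical local checkpoint path
    model_base=None,
    model_name="cambrian-34b",
)
```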
dionren commented 1 day ago

> @dionren Some of the vision encoders are not from transformers and do not support device_map, so there are some problems with setting device_map=auto across multiple GPUs. We are still working on converting the vision encoders to support this.
>
> But I have a workaround for your case with 2× 48G GPUs. It involves the following modifications:
>
> 1. Modify the beginning of `cambrian/model/builder.py`:
>
> ```python
> from accelerate import infer_auto_device_map, dispatch_model
>
> def load_pretrained_model(model_path, model_base, model_name, load_8bit=False, load_4bit=False, device_map="auto", device="cuda", **kwargs):
>     device_map = 'sequential'
>     kwargs = {"device_map": device_map, "max_memory": {0: "30GIB", 1: "49GIB"}, **kwargs}
> ```
>
> 2. Change https://github.com/cambrian-mllm/cambrian/blob/9d382223ba3e0ab9f99bad4f45c0fd4a21749dc6/cambrian/model/language_model/cambrian_llama.py#L252 to `cur_latent_query_with_newline = torch.cat([cur_latent_query, cur_newline_embd.to(cur_latent_query.device)], 2).flatten(1, 2)`

I'm gonna try it out. Thanks a ton for your help and the awesome work you've done. It's truly impressive.