Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

Load pretrained weight error #256

Open baibizhe opened 12 months ago

baibizhe commented 12 months ago

running script:

export PYTHONPATH=.
accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
./pipeline/train/instruction_following.py \
--pretrained_model_name_or_path=luodian/OTTER-9B-INIT \
--mimicit_path="/home/ubuntu/works/code/working_proj/otter/data/xxxjson" \
--images_path="/home/ubuntu/works/code/working_proj/otter/data/xxx.json" \
--train_config_path="/home/ubuntu/works/code/working_proj/otter/data/xxx.json" \
--batch_size=4 \
--num_epochs=9 \
--report_to_wandb \
--wandb_entity=ntu-slab \
--run_name=otter9B_dense_caption \
--wandb_project=otter9B \
--workers=1 \
--lr_scheduler=cosine \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01

Error information:

The current model version is configured for Otter-Image with max_num_frames set to None.
Total Trainable param: 1.441012 B
Loading checkpoint shards: 0%| | 0/4 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/works/code/working_proj/otter/./pipeline/train/instruction_following.py", line 831, in <module>
    main()
  File "/home/ubuntu/works/code/working_proj/otter/./pipeline/train/instruction_following.py", line 552, in main
    model = OtterForConditionalGeneration.from_pretrained(
  File "/home/ubuntu/anaconda3/envs/otter/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3091, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/ubuntu/anaconda3/envs/otter/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3471, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/ubuntu/anaconda3/envs/otter/lib/python3.10/site-packages/transformers/modeling_utils.py", line 736, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/ubuntu/anaconda3/envs/otter/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 281, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32003, 4096]) in "weight" (which has shape torch.Size([32004, 4096])), this look incorrect.

Would you mind helping me with this?

5RJ commented 11 months ago

Hi, I'm hitting the same error. Have you figured it out?

Luodian commented 11 months ago

OK, I see. Sorry for the misunderstanding.

You should set the 'model_name' argument to force the model to load with FlamingoForConditionalGeneration.

Models with the 'init' suffix are converted directly from OpenFlamingo and do not have an <answer> token.
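
The 32003 vs 32004 shapes in the traceback are consistent with a one-token difference between the checkpoint and the model class. A minimal sketch of that arithmetic follows. The base vocabulary size of 32000 and the names of the three tokens already present in the Init checkpoint are assumptions chosen to reproduce the reported shapes; only the missing <answer> token comes from the comment above.

```python
# Illustration only: the base size and the Init token list are assumptions,
# chosen to reproduce the 32003 vs 32004 shapes in the traceback.
BASE_VOCAB_SIZE = 32000  # assumed LLaMA tokenizer size

INIT_EXTRA_TOKENS = ["<|endofchunk|>", "<image>", "<PAD>"]  # assumed names
OTTER_EXTRA_TOKENS = INIT_EXTRA_TOKENS + ["<answer>"]  # <answer> is from the thread


def embedding_rows(base_size, extra_tokens):
    """Rows the input embedding needs: one per base or added token."""
    return base_size + len(extra_tokens)


checkpoint_rows = embedding_rows(BASE_VOCAB_SIZE, INIT_EXTRA_TOKENS)
model_rows = embedding_rows(BASE_VOCAB_SIZE, OTTER_EXTRA_TOKENS)
missing = sorted(set(OTTER_EXTRA_TOKENS) - set(INIT_EXTRA_TOKENS))

print(checkpoint_rows, model_rows, missing)
```

Under these assumptions the checkpoint supplies one embedding row fewer than the Otter class allocates, which is the mismatch the loader reports.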

5RJ commented 11 months ago

I tried that and then got this error. Could you please take a look? https://github.com/Luodian/Otter/issues/251

Luodian commented 11 months ago

Can you try the following code and see what happens?

from flamingo import FlamingoForConditionalGeneration
model = FlamingoForConditionalGeneration.from_pretrained("luodian/OTTER-MPT1B-RPJama-Init")

5RJ commented 11 months ago

I followed the comment at https://github.com/Luodian/Otter/blob/16d73b399fac6352ebff7504b1acb1f228fbf3f4/flamingo/modeling_flamingo.py#L710 and changed text_config.architectures from null to ["LlamaForCausalLM"] in the config.json of luodian/OTTER-LLaMA7B-Init.

It seems to work now.
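
The config edit described above can also be scripted. This is a minimal sketch using only the standard library; the helper name `set_text_architectures` and the example path are hypothetical, and it assumes config.json is a plain JSON file with a `text_config` object.

```python
import json
from pathlib import Path


def set_text_architectures(config_path, architectures):
    """Fill in text_config.architectures when it is null or missing.

    Returns True if the file was changed, False if a value was already set.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    text_config = config.setdefault("text_config", {})
    if text_config.get("architectures"):
        return False  # already set, leave the file untouched
    text_config["architectures"] = list(architectures)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


# Usage (hypothetical path; point it at the downloaded checkpoint folder):
# set_text_architectures("OTTER-LLaMA7B-Init/config.json", ["LlamaForCausalLM"])
```

The helper only writes when the field is empty, so running it twice does not overwrite a value that is already present.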

linxid commented 11 months ago

Hi, I also hit this error. Should I use FlamingoForConditionalGeneration instead of OtterForConditionalGeneration to load the OTTER-LLaMA7B-Init model? And which is the best model under 10B parameters?

Luodian commented 11 months ago

Yes, use Flamingo with the Init version of a model, and use Otter with our released trained models.

I think the OTTER-MPT-Image version is the best model for now.
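
The rule of thumb in this reply (Flamingo for Init checkpoints, Otter for released trained models) can be written as a small helper. This is a sketch under a stated assumption: it identifies Init checkpoints by an "-init" name suffix, which is a naming heuristic and not something the repository enforces. The function name `pick_model_name` is hypothetical.

```python
def pick_model_name(checkpoint):
    """Return "flamingo" for converted Init checkpoints, "otter" otherwise.

    The "-init" suffix test is a naming heuristic, not an official rule.
    """
    repo = checkpoint.rstrip("/").split("/")[-1].lower()
    return "flamingo" if repo.endswith("-init") else "otter"


for name in ("luodian/OTTER-9B-INIT", "luodian/OTTER-LLaMA7B-Init"):
    print(name, "->", pick_model_name(name))
```

The returned string is meant as a value for the 'model_name' argument mentioned earlier in the thread.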

gordonhu608 commented 6 months ago

Do you mean we should set model_name to flamingo when training from luodian/OTTER-9B-INIT? This would cause lots of errors in the code. From your demo_training.yaml and READMEs, I thought we should set model_name to otter when training from luodian/OTTER-9B-INIT. Am I wrong?

Luodian commented 6 months ago

Yes. If using flamingo causes errors, you can switch to otter, as long as it gets through the model loading step.

Otter-9B-Init is the same as OpenFlamingo-9B in architecture and weights; the only difference is some special tokens.

In the current code, I think I've handled both situations: OtterForConditionalGeneration adds the special tokens if it does not find them in the tokenizer and config.json.

gordonhu608 commented 6 months ago

Thank you for your prompt reply! I got a "selected index > embedding length" kind of error because the LLaMA embedding size is 32002 but Otter's pad_idx is 32003. I fixed it by setting pad_idx to 0 (LLaMA's original setting). With this fix I can train the model (although I don't know why there is no GitHub issue about this, or whether I'm the only one experiencing it). Now I'm debugging another problem: after training finishes, the saved model weights are empty when using --save_hf_model. Have you encountered this before? Any suggestions would help. Thanks!
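
The index error described here can be caught before training with a simple range check. This is a minimal sketch using the numbers from the comment above; the helper name `check_token_indices` is hypothetical and the check is independent of any particular library.

```python
def check_token_indices(num_embeddings, **named_indices):
    """Return the names of indices that fall outside [0, num_embeddings)."""
    return [
        name
        for name, idx in named_indices.items()
        if not 0 <= idx < num_embeddings
    ]


# Numbers from the comment above: 32002 embedding rows, pad_idx 32003.
print(check_token_indices(32002, pad_idx=32003))  # ['pad_idx']
print(check_token_indices(32002, pad_idx=0))      # []
```

Any index reported by the check would address a row the embedding table does not have, which is the condition behind the lookup error.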

gordonhu608 commented 6 months ago

For anyone who hits the empty model weight issue in the future: I solved it by following this issue: https://github.com/microsoft/DeepSpeed/issues/4720. The simple suggestion is to use DeepSpeed ZeRO-2, because only ZeRO-3 has this problem.
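
For reference, the ZeRO stage is selected by the `zero_optimization.stage` field of a DeepSpeed JSON config. The fragment below is a sketch of that one setting only, written as a Python dict and printed as JSON; it is not a complete config, and how it is passed to the training script (for example through an Accelerate config file) depends on the setup and is not shown in this thread.

```python
import json

# Fragment only: merge into an existing DeepSpeed config.
# "zero_optimization.stage" selects the ZeRO stage (2 here, as suggested above).
zero2_fragment = {"zero_optimization": {"stage": 2}}

print(json.dumps(zero2_fragment, indent=2))
```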

iz2late commented 5 months ago

Saved my day! Thanks, Gordon!

iz2late commented 4 months ago

Hi Gordon! First, thanks for your helpful comments! I set pad_idx to 0, and training now works fine. However, when I run inference with the trained model, the output is nonsensical. Have you also encountered this problem? I'm wondering whether we also need to set the pad_id or something similar in the generate function.