TencentARC / SmartEdit

Official code of SmartEdit [CVPR-2024 Highlight]

Qformer mm_projector issue #30

Open zjutkarma opened 2 weeks ago

zjutkarma commented 2 weeks ago

Hello, thanks for your amazing work! I ran into a problem when running this code; could you help me solve it?

When training with the script DS_MLLMSD11_train.py, I encountered this error:

  File "/SmartEdit/model/DS_MLLMSD11_model.py", line 243, in load_pretrain_MLLM_alignment
    mm_projector_param = {'weight': weights.pop('mm_projector.weight'), 'bias': weights.pop('mm_projector.bias')}
KeyError: 'mm_projector.weight'

The path set for SD_QFormer_conversation_33tokens is: /SmartEdit/checkpoints/stage1_CC12M_alignment_7b/embeddings_qformer/checkpoint-150000.bin

In addition, the stage-1 inference code runs successfully for me.

The QFormer trained in the first stage is a 6-block BERT-based model. I printed the keys in the model weight dict, and it seems the checkpoint does not contain any "mm_projector" entries.
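For reference, this is roughly the snippet I used to inspect the checkpoint keys (a minimal sketch; the path is the one from my setup above):

    import torch

    # Stage-1 QFormer alignment checkpoint (path from my setup above)
    ckpt_path = "/SmartEdit/checkpoints/stage1_CC12M_alignment_7b/embeddings_qformer/checkpoint-150000.bin"
    weights = torch.load(ckpt_path, map_location="cpu")

    # Print every parameter name and shape; none of them start with "mm_projector"
    for name, tensor in weights.items():
        print(name, tuple(tensor.shape))

    print("mm_projector.weight present:", "mm_projector.weight" in weights)
    print("mm_projector.bias present:", "mm_projector.bias" in weights)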

There is another point that confuses me: as far as I know, the mm_projector module only exists in the LLaVA model, and its purpose is to project the image embeddings (produced by the ViT) from the image latent space into the text latent space. I don't understand why the QFormer module would need an mm_projector; these seem to be two completely different things.
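To illustrate what I mean (a rough sketch of my understanding, not the actual SmartEdit code; the dimensions are the usual CLIP ViT-L/14 and LLaMA-7B sizes and are my assumptions):

    import torch
    import torch.nn as nn

    # My understanding of mm_projector in LLaVA: a plain projection that maps
    # ViT image features into the LLM text-embedding space.
    vit_hidden_size = 1024   # CLIP ViT-L/14 feature dim (assumed)
    llm_hidden_size = 4096   # LLaMA-7B hidden dim

    mm_projector = nn.Linear(vit_hidden_size, llm_hidden_size)

    image_features = torch.randn(1, 256, vit_hidden_size)  # ViT patch features
    text_space_features = mm_projector(image_features)     # [1, 256, 4096]

    # The QFormer, by contrast, is the 6-block BERT-based module described above,
    # so I don't see why its stage-1 checkpoint would need mm_projector weights.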

yuzhou914 commented 2 weeks ago

Thanks for your interest in our work. You might be right; this may be a small error introduced when we pushed the code to GitHub, and it works well after the modification.

zjutkarma commented 2 weeks ago

Thanks very much for your reply! Looking forward to the updated version!

There is another question about this function: I can't align the LLaMA output with the QFormer input in the code. weights.pop('lm_head.weight') is a [33, 4096] tensor, while self.config.num_new_tokens is 35 (32 + 2 + 1 in the previous setting):

    # 1. vec2word: Linear(in_features=4096, out_features=32035, bias=False)
    LLaMA_lm_haed = weights.pop('lm_head.weight')
    LLaMA_lm_haed = LLaMA_lm_haed[-self.config.num_new_tokens:]
    self.lm_head.weight.data[-self.config.num_new_tokens:] = LLaMA_lm_haed
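
A minimal reproduction of the mismatch I mean (my own sketch, using the shapes reported above, not the repository code):

    import torch

    num_new_tokens = 35                        # 32 + 2 + 1, from the config
    saved_lm_head = torch.randn(33, 4096)      # shape of 'lm_head.weight' in my checkpoint
    lm_head_weight = torch.randn(32035, 4096)  # full lm_head of the 32035-token vocab

    sliced = saved_lm_head[-num_new_tokens:]   # slicing 35 rows from a 33-row tensor still yields 33 rows
    print(sliced.shape)                        # torch.Size([33, 4096])

    try:
        lm_head_weight[-num_new_tokens:] = sliced
    except RuntimeError as e:
        print(e)                               # size mismatch: 33 rows cannot fill a 35-row slice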

Is there anything wrong with my configuration? I set up the configuration file following the original code.