Thanks for your interest in our work. You might be right; this may be a small code error introduced when we pushed to GitHub, and it works well after modification.
Thank you very much for your reply! Looking forward to the updated version!
There is another question about this function: I can't align the LLaMA output and the Q-Former input in the code. weights.pop('lm_head.weight') gives a [33, 4096] tensor, while self.config.num_new_tokens is 35 (32 + 2 + 1 in the previous setting); a minimal repro of the mismatch is sketched after this question.
# 1. vec2word: Linear(in_features=4096, out_features=32035, bias=False)
LLaMA_lm_head = weights.pop('lm_head.weight')  # pretrained LLaMA output head
LLaMA_lm_head = LLaMA_lm_head[-self.config.num_new_tokens:]  # keep only the rows for the new tokens
self.lm_head.weight.data[-self.config.num_new_tokens:] = LLaMA_lm_head  # copy them into the new head
Is there anything wrong with my configuration steps? I set up the configuration file following the original code.
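For anyone hitting the same thing, here is a minimal sketch of the mismatch (the shapes are the ones from my report above, not read from the repo; the tensor names are stand-ins):

import torch

num_new_tokens = 35                   # 32 + 2 + 1 from the config
ckpt_lm_head = torch.randn(33, 4096)  # stands in for weights.pop('lm_head.weight')

# Python slice semantics clamp an out-of-range negative start, so this
# silently returns all 33 rows instead of the requested 35:
new_token_rows = ckpt_lm_head[-num_new_tokens:]
print(new_token_rows.shape)           # torch.Size([33, 4096])

# The copy then fails, because the left-hand side selects 35 rows of the
# [32035, 4096] head weight while the right-hand side only has 33:
lm_head = torch.nn.Linear(4096, 32035, bias=False)
try:
    lm_head.weight.data[-num_new_tokens:] = new_token_rows
except RuntimeError as err:
    print(err)                        # size mismatch along dim 0: 35 vs 33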
Hello, thanks for your amazing work! I have a problem when running this code; could you help me solve it?
When I run the training script DS_MLLMSD11_train.py, I encounter this error.
The path of the SD_QFormer_conversation_33tokens checkpoint is: /SmartEdit/checkpoints/stage1_CC12M_alignment_7b/embeddings_qformer/checkpoint-150000.bin
In addition, the stage-1 inference code runs successfully.
The Q-Former model trained in the first stage is a 6-block BERT-based model. I printed the keys of the model weight dict, and it seems the checkpoint doesn't contain any "mm_projector" entry.
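In case it helps, this is roughly how I checked the keys (assuming the checkpoint is a plain torch-saved state dict; the path is the one from my setup above):

import torch

ckpt_path = "/SmartEdit/checkpoints/stage1_CC12M_alignment_7b/embeddings_qformer/checkpoint-150000.bin"
state_dict = torch.load(ckpt_path, map_location="cpu")

# Top-level module prefixes show what the stage-1 checkpoint actually holds.
print(sorted({key.split(".")[0] for key in state_dict}))

# Direct check for the key the stage-2 training script expects:
print(any("mm_projector" in key for key in state_dict))  # False in my run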
And there is another point I'm confused about: I think the "mm_projector" module exists only in the LLaVA model, and its function is to project the image embeddings (from the ViT) from the image latent space into the text latent space. I have no idea why the Q-Former module would need an mm_projector; these seem like two completely different things.
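For reference, my understanding is that the LLaVA-style mm_projector is just a single linear layer (later LLaVA versions use a small MLP); a minimal sketch, with the 1024 -> 4096 sizes assumed from CLIP ViT-L/14 and LLaMA-7B rather than taken from this repo:

import torch
import torch.nn as nn

# Assumed sizes: CLIP ViT-L/14 patch features (1024) -> LLaMA-7B hidden (4096).
mm_projector = nn.Linear(1024, 4096)

image_features = torch.randn(1, 256, 1024)    # [batch, num_patches, vit_dim]
visual_tokens = mm_projector(image_features)  # [batch, num_patches, llm_dim]
print(visual_tokens.shape)                    # torch.Size([1, 256, 4096])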