SaMMyCHoo opened 1 month ago
Hello! Thanks for your attention.
`mm_hidden_size` is the hidden size of the visual encoder (CLIP ViT), and `hidden_size` is the hidden size of the LLM (Vicuna).
Please add these lines to the config.json of the pretrained LLM folder:

```json
"mm_hidden_size": 1024,
"mm_projector_type": "mlp2x_gelu",
"mm_resampler_type": null,
"mm_use_im_patch_token": false,
"mm_use_im_start_end": false,
"mm_vision_select_feature": "patch",
"mm_vision_select_layer": -2,
"mm_vision_tower": "./ckpt/clip-vit-large-patch14",
```
(You can set `mm_vision_tower` to any appropriate local path or URL.)
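For intuition, here is a minimal sketch of what these two sizes connect. The `"mlp2x_gelu"` projector type conventionally means a two-layer MLP with a GELU in between that maps visual features (`mm_hidden_size`, 1024 for CLIP ViT-L/14) into the LLM's embedding space (`hidden_size`, e.g. 4096 for Vicuna-7B). The function name and exact layer layout below are an assumption based on that convention, not necessarily this repo's implementation:

```python
import torch
import torch.nn as nn

def build_mlp2x_gelu_projector(mm_hidden_size: int, hidden_size: int) -> nn.Module:
    # Hypothetical sketch of an "mlp2x_gelu" projector: two linear layers
    # with a GELU activation, mapping visual-encoder features
    # (mm_hidden_size) to LLM token embeddings (hidden_size).
    return nn.Sequential(
        nn.Linear(mm_hidden_size, hidden_size),
        nn.GELU(),
        nn.Linear(hidden_size, hidden_size),
    )

# Example: CLIP ViT-L/14 features (1024-d) -> Vicuna-7B embeddings (4096-d).
projector = build_mlp2x_gelu_projector(1024, 4096)
patch_features = torch.randn(2, 256, 1024)  # (batch, num_patches, mm_hidden_size)
projected = projector(patch_features)
print(projected.shape)  # (batch, num_patches, hidden_size)
```

This is why both keys must be present in the config: the projector needs `mm_hidden_size` for its input dimension and `hidden_size` for its output dimension.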
Thanks!
Hi there, I'm very interested in your work, and I'm trying to train the model from scratch following the instructions. However, I'm having trouble running the code; it fails with "AttributeError: 'VStreamConfig' object has no attribute 'mm_hidden_size'. Did you mean: 'hidden_size'?"

After encountering this error, I followed the suggestion and changed `mm_hidden_size` to `hidden_size`. Then I ran into: "TypeError: build_vision_projector() missing 1 required positional argument: 'input_dim'". Now I have no idea how to solve this.

I'm wondering if you could provide some help. I'd appreciate it if you could reply as soon as possible. Best regards.