deepcs233 / Visual-CoT

[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Apache License 2.0

Release Model Weights after Pretrain Stage #4

Open LengSicong opened 5 months ago

LengSicong commented 5 months ago

Hi authors, congrats on this great work!

Could you release the Visual-CoT checkpoints from after the pre-training stage, so that we can fine-tune on top of them?

deepcs233 commented 5 months ago

Hi! We just use the official LLaVA-1.5 stage-1 checkpoints. The pre-training stage only fine-tunes the (MLP-based) projector, and the LLM weights come from Vicuna-1.5. You can download the projector weights here: https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md#projector-weights
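
To make the reply concrete: a minimal sketch of the component being discussed, assuming the LLaVA-1.5 `mlp2x_gelu` projector layout (Linear → GELU → Linear) and the 7B dimensions (1024-d CLIP ViT-L/336px features projected to Vicuna-1.5 7B's 4096-d hidden size). The shapes, the save/load round-trip standing in for a downloaded `mm_projector.bin`, and the key names are assumptions to verify against your actual checkpoint, which may prefix keys with e.g. `model.mm_projector.`:

```python
# Sketch (assumptions): LLaVA-1.5's "mlp2x_gelu" projector as a 2-layer MLP
# mapping vision-encoder features (1024-d) to the LLM hidden size (4096-d).
# Dimensions follow the 7B config; check them against your checkpoint.
import os
import tempfile

import torch
import torch.nn as nn


def build_projector(vision_dim: int = 1024, llm_dim: int = 4096) -> nn.Sequential:
    # mlp2x_gelu: Linear -> GELU -> Linear
    return nn.Sequential(
        nn.Linear(vision_dim, llm_dim),
        nn.GELU(),
        nn.Linear(llm_dim, llm_dim),
    )


projector = build_projector()

# Mimic a downloaded mm_projector.bin: a state_dict saved with torch.save.
# (The real file may use prefixed keys; strip the prefix before loading.)
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mm_projector.bin")
    torch.save(projector.state_dict(), path)
    state = torch.load(path, map_location="cpu")
    projector.load_state_dict(state)

# One image's worth of patch features: 576 tokens (24x24 patches at 336px).
feats = torch.randn(1, 576, 1024)
out = projector(feats)
print(out.shape)  # torch.Size([1, 576, 4096])
```

Since only these projector weights are trained during stage 1, fine-tuning can start from the official projector file plus stock Vicuna-1.5 weights, which is why no separate Visual-CoT pretrain checkpoint is needed.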