TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models
https://arxiv.org/abs/2402.14289
Apache License 2.0

Config for TinyLLaVA-OpenELM-270M-SigLIP-0.55B #70

Open eternalding opened 6 months ago

eternalding commented 6 months ago

Greetings. I'd like to ask two questions about TinyLLaVA-0.55B (with OpenELM-270M-Instruct):

1) The config.json in TinyLLaVA 0.55B's Hugging Face repo appears to use OpenELM-450M, not OpenELM-270M.
2) The repo title says it uses SigLIP as the vision encoder, but the config says clip-vit-base-patch16.

Not sure if these are typos.

Thanks.

Link to TinyLLaVA 0.55B's config: https://huggingface.co/jiajunlong/TinyLLaVA-OpenELM-270M-SigLIP-0.55B/blob/main/config.json
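For anyone who wants to check this themselves, here is a minimal sketch that downloads the hosted config.json and prints the backbone fields. The key names (`llm_model_name_or_path`, `vision_model_name_or_path`) are assumptions about the TinyLLaVA config layout, so verify them against the actual file:

```python
# Minimal sketch: fetch and inspect the hosted config.json.
# The key names below are assumptions, not confirmed field names.
import json

from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="jiajunlong/TinyLLaVA-OpenELM-270M-SigLIP-0.55B",
    filename="config.json",
)

with open(config_path) as f:
    config = json.load(f)

# Print the fields naming the LLM backbone and the vision tower.
print(config.get("llm_model_name_or_path"))     # expected: an OpenELM checkpoint
print(config.get("vision_model_name_or_path"))  # expected: a CLIP/SigLIP checkpoint
```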

jiajunlong commented 5 months ago

Thanks for the reminder. TinyLLaVA-0.55B actually uses OpenELM-450M-Instruct as the LLM and clip-vit-base-patch16 as the VisionTower, so the config.json file in the Hugging Face repository is correct. I have updated the description in the Hugging Face repository. Thank you very much for pointing out the error.
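If it helps, the checkpoint can be loaded through transformers' remote-code path and the resolved backbones read back from the loaded config. This is a sketch; whether this repo registers with AutoModelForCausalLM and which config attributes the remote code defines are assumptions, so check the model card:

```python
# Minimal sketch: load the checkpoint via trust_remote_code and read back
# the backbone names. Registration with AutoModelForCausalLM and the exact
# config attribute names are assumptions - consult the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jiajunlong/TinyLLaVA-OpenELM-270M-SigLIP-0.55B"

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)

# Backbone names as resolved from config.json (attribute names assumed).
print(getattr(model.config, "llm_model_name_or_path", None))
print(getattr(model.config, "vision_model_name_or_path", None))
```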

eternalding commented 5 months ago

Okay. Thanks for the correction.

ggcr commented 5 months ago

Is there any TinyLLaVA version trained on OpenELM-270M-Instruct by any chance? @jiajunlong

jiajunlong commented 4 months ago

> Is there any TinyLLaVA version trained on OpenELM-270M-Instruct by any chance? @jiajunlong

I'm so sorry; I only just saw your message. We tried the OpenELM-270M-Instruct model, but the results were very poor.