cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
https://cambrian-mllm.github.io/
Apache License 2.0

Support for Multiple-Image Inference #4

Closed: caichuang0415 closed this issue 5 days ago

caichuang0415 commented 1 week ago

Thanks for your great work! I've tried it and was surprised by how well it performs. However, the inference script you currently provide only supports inference on one image at a time. I hope you can upgrade it to support multiple-image inference. It would also be great if it supported in-context dialog.

ellisbrown commented 5 days ago

Hi @caichuang0415 thanks for your interest in our work!

Unfortunately, Cambrian-1 only supports single-image inference. Multiple-image support would require developing a separate model and new training data, so we do not have plans to do this immediately. This is something we may investigate down the road, though!