cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
https://cambrian-mllm.github.io/
Apache License 2.0

HF transformers support #17

Open Iven2132 opened 5 days ago

Iven2132 commented 5 days ago

It would be cool if this model had HF transformers support. Also, what is the specialty of this model? What is it especially good at?

ellisbrown commented 2 days ago

@Iven2132 do you mean integration into HF transformers itself?

Doing so while supporting both GPU & TPU may be a bit challenging, but it is something we may investigate later.

Our current code depends on HF transformers and should be usable with it.

We specifically target vision-centric capabilities, but our model is general-purpose. See more info on our site or in the paper: https://cambrian-mllm.github.io/

Iven2132 commented 1 day ago

@ellisbrown Yes, I mean in HF transformers itself. Which vision-centric capabilities do you target? Can it write code from a given UI, etc.?

ellisbrown commented 19 hours ago

We didn't target generating code from a UI specifically. You can certainly try, but no guarantees there.

As for vision-centric capabilities: have a look at the benchmarks that we classify as "vision-centric" for a better idea—MMVP, Real World QA, and the CV-Bench we introduced.

You can read more about our CV-Bench benchmark in section 3.2. We test 4 different vision-centric capabilities.
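Since results on a benchmark like this are reported per capability, a minimal sketch of tallying multiple-choice accuracy by task may help. It assumes a hypothetical record layout (`task`, `answer`, `prediction` fields) and uses task labels that are illustrative only; it is not the official CV-Bench evaluation script.

```python
from collections import defaultdict

# Hypothetical records: a task label, the gold option, and a model's predicted option.
# Field names and task labels are illustrative assumptions, not the official format.
records = [
    {"task": "Spatial Relationship", "answer": "(A)", "prediction": "(A)"},
    {"task": "Object Count", "answer": "(B)", "prediction": "(C)"},
    {"task": "Depth Order", "answer": "(A)", "prediction": "(A)"},
    {"task": "Relative Distance", "answer": "(B)", "prediction": "(B)"},
    {"task": "Object Count", "answer": "(D)", "prediction": "(D)"},
]


def per_task_accuracy(records):
    """Return {task: fraction of records whose prediction matches the answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["task"]] += 1
        # Normalize whitespace and case so "(a)" and " (A) " count as a match.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[r["task"]] += 1
    return {task: correct[task] / total[task] for task in total}


if __name__ == "__main__":
    for task, acc in per_task_accuracy(records).items():
        print(f"{task}: {acc:.2f}")
```

Reporting accuracy per task rather than one pooled number keeps a weak capability (here, the hypothetical Object Count records) from being hidden by strong ones.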