Iven2132 opened this issue 4 months ago
@Iven2132 do you mean integration into HF transformers itself?
this may be a bit challenging to support both GPU & TPU, but is something we may investigate later.
our current code depends on HF transformers and should be usable with it.
we specifically target vision-centric capabilities, but our model is general-purpose. see more info on our site or in the paper https://cambrian-mllm.github.io/
@ellisbrown Yes, I mean in HF transformers itself. What are the target vision-centric capabilities? Can it write code from a given UI, etc.?
We didn't target generating code from a UI specifically. You can certainly try, but no guarantees there.
As for vision-centric capabilities: have a look at the benchmarks that we classify as "vision-centric" for a better idea—MMVP, Real World QA, and the CV-Bench we introduced.
You can read more about our CV-Bench benchmark in section 3.2. We test 4 different vision-centric capabilities.
@ellisbrown I'm still confused. I did some visual question answering with the 34B model and it performed very badly. The only models that pass that question are Gemini 1.5 Pro/Flash, GPT-4o, and Claude.
Then how was Cambrian evaluated?
It would be cool if this model had transformers support. Also, what is this model's specialty? What is something that this model is good at?