-
TRL's SFTTrainer supports LLaVA (Large Language and Vision Assistant), as described in [Vision Language Models Explained](https://huggingface.co/blog/vlms).
Is there any plan to rele…
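For reference, vision SFT with TRL along the lines of that blog post can be sketched roughly as below. This is a minimal sketch, not the exact recipe from the post: the llava-hf checkpoint, the HuggingFaceH4/llava-instruct-mix-vsft dataset, and the specific SFTConfig flags are assumptions, and the exact arguments vary between TRL releases.

```python
import torch
from datasets import load_dataset
from transformers import AutoProcessor, LlavaForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed LLaVA checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

# Assumed dataset with "messages" (chat turns) and "images" columns.
train_dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")

def collate_fn(examples):
    # Render each conversation with the chat template and batch it with its image.
    texts = [processor.tokenizer.apply_chat_template(ex["messages"], tokenize=False) for ex in examples]
    images = [ex["images"][0] for ex in examples]
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # mask padding in the loss
    batch["labels"] = labels
    return batch

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="llava-sft",
        remove_unused_columns=False,                    # keep raw columns for the collator
        dataset_kwargs={"skip_prepare_dataset": True},  # the collator handles tokenization
    ),
    train_dataset=train_dataset,
    data_collator=collate_fn,
)
trainer.train()
```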
-
**Details of model being requested**
- Model name: Florence-2
- Source repo link: https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de
- Research paper link: https://arxiv…
-
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
https://arxiv.org/pdf/2404.16006
CONTEXTUAL: Evaluating Context-Sensitive Text-Ric…
-
https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
We introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary…
-
Hi, @YaoMarkMu
I think this is a fantastic piece of work.
My question is, when I attempted to use the provided weights [https://drive.google.com/file/d/1sBTy8oXeweJg3STbhzBR_5pLcVs1F20q/view?usp=sha…
-
Thanks for your awesome work in model merging! I'm excited about the improvements you achieved compared to other merging methods. However, I saw the individually fine-tuned models still outperform WEM…
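As background on that comparison, the simplest weight-space merging baseline is element-wise parameter averaging of fine-tuned checkpoints that share an architecture; merged models typically trade some per-task accuracy for multi-task coverage. A minimal sketch of that baseline (the checkpoint file names are hypothetical, and this is not the specific method of this repo):

```python
import torch

# Naive weight-space merging: element-wise average of two fine-tuned checkpoints
# with identical architectures (file names below are hypothetical).
state_a = torch.load("finetuned_task_a.pt", map_location="cpu")
state_b = torch.load("finetuned_task_b.pt", map_location="cpu")

merged = {name: (state_a[name] + state_b[name]) / 2.0 for name in state_a}
torch.save(merged, "merged_model.pt")
```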
-
- [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890)
- [MEDITRON-70B: Scaling Medical Pretraining for Large Language Models](http…
-
[https://arxiv.org/pdf/2404.06512.pdf](https://arxiv.org/pdf/2404.06512.pdf)
[https://github.com/InternLM/InternLM-XComposer](https://github.com/InternLM/InternLM-XComposer)
### preview
- Health checkup …
-
Thanks for the repo and models! When trying to run demo.sh with the 34b model (after commenting and uncommenting the relevant lines), I am getting nonsense output (with the example video and prompt):
```
##…
-
Hi,
Thanks for sharing the model and code with us.
I am trying to use a Vision Foundation Model for a zero-shot classification problem.
It is possible with **OpenGVLab/InternVL-14B-224px** bu…
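For context, CLIP-style zero-shot classification scores an image against a text prompt per class and takes a softmax over the image-text similarities. Below is a minimal sketch using the generic CLIPModel/CLIPProcessor classes from Transformers; the openai/clip-vit-base-patch32 checkpoint and example.jpg are placeholders, and the loading code for InternVL-14B-224px may differ (e.g. it may require trust_remote_code), so treat these details as assumptions:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative CLIP-style checkpoint; swap in the vision foundation model under test.
model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # hypothetical input image
class_prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=class_prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax gives class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(class_prompts, probs[0].tolist())))
```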