-
**FairCLIP: Harnessing Fairness in Vision-Language Learning**
Paper Link: https://arxiv.org/abs/2403.19949
Code Link: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
another paper on A…
-
LLaVA supports multiple images by default; what if we send the (T, N, D) visual tokens into the LLM without any aggregation?
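As I understand it, "no aggregation" would mean flattening the (T, N, D) frame tokens into one (T*N, D) sequence and prepending it to the text embeddings. A minimal PyTorch sketch of that idea (the shapes and the concatenation point are my own assumptions, not LLaVA's actual code):

```python
import torch

# Hypothetical shapes: T images, N patch tokens per image, D hidden size, L text tokens.
T, N, D, L = 4, 576, 4096, 32
visual_tokens = torch.randn(1, T, N, D)   # output of vision encoder + projector (assumed)
text_embeds = torch.randn(1, L, D)        # embedded text prompt (assumed)

# "No aggregation": flatten the frame axis into the sequence axis, so every
# patch token of every image becomes one LLM input token.
visual_seq = visual_tokens.flatten(1, 2)                   # (1, T*N, D)
llm_inputs = torch.cat([visual_seq, text_embeds], dim=1)   # (1, T*N + L, D)

print(llm_inputs.shape)
```

The obvious cost is that the sequence length grows linearly with T, so context length and attention cost become the bottleneck rather than any information loss from pooling.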
-
The idea of this work is very interesting!
However, I have two questions about the method:
(1) What's the ground truth caption of the image in Fig. 2? Is the word "feather" correct? (I am not sure…
-
### Feature request
Dear CogVLM's authors,
Thank you for your outstanding work on MLLMs.
In the demo, we can only query pictures. Is it possible to make the model process PDF files?
### Mot…
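As a possible interim workaround for the feature requested above, one can rasterize each PDF page to an image and query the model page by page. A minimal sketch assuming the `pdf2image` package and a hypothetical `model.chat(image, query)` interface (the actual CogVLM inference entry point will differ):

```python
from pdf2image import convert_from_path  # requires poppler installed on the system

def ask_about_pdf(pdf_path, prompt, model):
    """Rasterize each PDF page and query the model page by page."""
    pages = convert_from_path(pdf_path, dpi=200)  # list of PIL images
    answers = []
    for i, page in enumerate(pages):
        # `model.chat` is a placeholder for whatever per-image inference
        # function the repo exposes.
        answers.append((i + 1, model.chat(image=page, query=prompt)))
    return answers
```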
-
Thanks for the great effort on this repo! I see you provide zero-shot results for several MLLMs on the ScienceQA-IMG dataset. Could you please add the detailed results (i.e., NAT, SOC, LAN) of the TEST…
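In case it helps, the per-category numbers can be recomputed from per-question predictions, since each ScienceQA item has a `subject` field ("natural science", "social science", "language science") that maps to NAT/SOC/LAN. A rough sketch (the file paths and prediction format are assumptions):

```python
import json
from collections import defaultdict

# Map ScienceQA subject strings to the short category names used in the paper.
SUBJECT_TO_CAT = {
    "natural science": "NAT",
    "social science": "SOC",
    "language science": "LAN",
}

def per_category_accuracy(problems_path, preds_path):
    problems = json.load(open(problems_path))  # {qid: {"subject": ..., "answer": ...}}
    preds = json.load(open(preds_path))        # {qid: predicted answer index} (assumed format)
    hits, totals = defaultdict(int), defaultdict(int)
    for qid, pred in preds.items():
        cat = SUBJECT_TO_CAT[problems[qid]["subject"]]
        totals[cat] += 1
        hits[cat] += int(pred == problems[qid]["answer"])
    return {cat: hits[cat] / totals[cat] for cat in totals}
```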
-
Recently, some MLLMs have adopted hermes2_yi34b as the base language model, such as InternVL and [LLaVA](https://github.com/haotian-liu/LLaVA). Has your team applied it to this project, lik…
-
Hi,
thank you for this great work!
In Table 1 of your paper, an accuracy improvement is reported from adding S2 Scaling to LLaVA. As shown in Figure 1, the channel dimension with S2 Scaling is double…
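For context, my reading of why the channel dimension doubles: S2 runs the same vision encoder on the base-resolution image and on a higher-resolution version processed as crops, merges the large-scale features back to the base token grid, and concatenates the two feature maps along the channel axis. A toy shape sketch of that bookkeeping (not the authors' code; the crop merging is crudely approximated by averaging):

```python
import torch
import torch.nn.functional as F

def s2_like_features(encode, image, base=336):
    """Toy sketch of S2-style multi-scale features; `encode` maps an image to (B, N, D)."""
    # Scale 1: features from the base-resolution image -> (B, N, D)
    feat_lo = encode(F.interpolate(image, size=base))
    # Scale 2: features from a 2x-resolution image, processed as 4 crops and
    # merged/pooled back to the same N tokens (approximated here by averaging).
    big = F.interpolate(image, size=2 * base)
    crops = [big[..., i * base:(i + 1) * base, j * base:(j + 1) * base]
             for i in range(2) for j in range(2)]
    feat_hi = torch.stack([encode(c) for c in crops]).mean(dim=0)
    # Concatenating along the channel axis doubles D, which is why the
    # projector input dimension doubles when S2 is added to LLaVA.
    return torch.cat([feat_lo, feat_hi], dim=-1)   # (B, N, 2*D)
```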
-
Hi, thank you for your implementation.
While reading through your code, a question came up about the 'masked loss'.
Why do you mask out the last part of each loss using this function?
https:…
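If this is the standard causal language-modeling setup, the usual reason is that after shifting, the last position has no next-token label to predict (and prompt tokens are often masked out too). A generic sketch of that pattern, assuming HuggingFace-style `-100` ignore labels (not necessarily this repo's exact code):

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    """logits: (B, L, V), input_ids: (B, L). Standard shifted next-token loss."""
    # Position t predicts token t+1, so the final position has nothing to predict
    # and is dropped from the loss.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    # Any extra masking (e.g. of prompt tokens) would set those labels to -100,
    # which cross_entropy ignores via ignore_index.
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```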
-
```diff
--- a/cj3.txt
+++ b/cj3.txt
@@ -30598,7 +30598,7 @@ nhytg 䂌
nic 䥒
nif 㢱
nij 㚈
-nimnb 㣇
+smhhb 㣇
njbc 㣀
nkbr 㢠
nkf 㷺
@@ -33021,6 +33021,7 @@ vmfj 㛁
vmfm…
```
-
'{"object":"error","message":"Unknown part type: image","type":"BadRequestError","param":null,"code":400}'