-
The idea of this work is very interesting!
However, I have two points of confusion about the method:
(1) What's the ground-truth caption of the image in Fig. 2? Is the word "feather" correct? (I am not sure…
-
Recently, some MLLMs have adopted hermes2_yi34b as the base language model, such as [InternVL](https://github.com/OpenGVLab/InternVL) and [LLaVA](https://github.com/haotian-liu/LLaVA). Has your team applied it to the project, lik…
-
I evaluated LLaVA-1.5-7b on the MMVP dataset and found that its accuracy is 60.0%, which is significantly higher than the 24.7% reported in Table 3.
Upon comparing the evaluation code, I discovered t…
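For anyone else comparing these numbers: a minimal sketch of per-question vs. pair-level scoring, assuming the MMVP protocol where questions come in pairs and a pair only counts as correct if both of its questions are answered correctly. The input format below is illustrative, not the repo's actual one.
```python
# Sketch: per-question vs. pair-level accuracy on MMVP-style data.
# Assumes predictions are ordered so that questions 2i and 2i+1 form one pair;
# `correct` is a list of booleans, one per question (illustrative input format).

def per_question_accuracy(correct):
    return sum(correct) / len(correct)

def pair_accuracy(correct):
    # A pair counts as correct only if BOTH of its questions are correct.
    pairs = [correct[i] and correct[i + 1] for i in range(0, len(correct), 2)]
    return sum(pairs) / len(pairs)

if __name__ == "__main__":
    # Toy example: 4/6 questions right, but only 1/3 pairs fully right.
    correct = [True, True, True, False, False, True]
    print(f"per-question: {per_question_accuracy(correct):.1%}")  # 66.7%
    print(f"pair-level:   {pair_accuracy(correct):.1%}")          # 33.3%
```
This kind of gap (per-question accuracy far above pair accuracy) is consistent with the difference between 60.0% and the reported 24.7%, though I can't confirm that is the exact cause here.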
-
Thanks for the great effort on this repo! I see you provide the zero-shot results of several MLLMs on the ScienceQA-IMG dataset. Could you please add the detailed results (i.e., NAT, SOC, LAN) of the TEST…
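In case it is useful, a rough sketch of how I would aggregate those per-category numbers from the official problems.json, assuming each problem's "subject" field maps to NAT/SOC/LAN; the `predictions` dict and its format are hypothetical:
```python
import json
from collections import defaultdict

# Sketch: per-category accuracy (NAT / SOC / LAN) on the ScienceQA test split.
# Assumes the official problems.json with "split", "subject" and "answer" fields;
# `predictions` maps problem id -> predicted answer index (hypothetical format).

SUBJECT_TO_CAT = {
    "natural science": "NAT",
    "social science": "SOC",
    "language science": "LAN",
}

def per_category_accuracy(problems_path, predictions):
    problems = json.load(open(problems_path))
    hits, totals = defaultdict(int), defaultdict(int)
    for pid, prob in problems.items():
        if prob.get("split") != "test" or pid not in predictions:
            continue
        cat = SUBJECT_TO_CAT.get(prob["subject"], "OTHER")
        totals[cat] += 1
        hits[cat] += int(predictions[pid] == prob["answer"])
    return {cat: hits[cat] / totals[cat] for cat in totals}
```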
-
### Feature request
Dear CogVLM authors,
Thank you for your outstanding work on MLLMs.
In the demo, we can only query pictures. Is it possible to make the model process PDF files?
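In case a workaround helps in the meantime: a minimal sketch that renders each PDF page to an image (here with PyMuPDF, as one option) and queries the model page by page; `query_cogvlm` is a hypothetical stand-in for however the demo actually invokes the model.
```python
import fitz  # PyMuPDF; one option for rasterizing PDF pages

def pdf_to_page_images(pdf_path, dpi=150):
    """Render each page of a PDF to a PNG byte string."""
    doc = fitz.open(pdf_path)
    pages = []
    for page in doc:
        pix = page.get_pixmap(dpi=dpi)
        pages.append(pix.tobytes("png"))
    return pages

# Hypothetical usage: query the model page by page.
# `query_cogvlm(image_bytes, prompt)` stands in for the demo's real entry point.
# for i, img in enumerate(pdf_to_page_images("report.pdf")):
#     answer = query_cogvlm(img, f"Summarize page {i + 1}.")
```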
### Mot…
-
Training script
```
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type glm4v-9b-chat \
--model_id_or_path /MLLM/new_models/ZhipuAI/glm-4v-9b \
--dataset /data_archive…
-
Hi, thank you for your implementation.
While reading through your code, a question came up about the 'masked loss'.
Why do you mask out the last part of each loss using this function?
https:…
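In case the question is about the usual causal-LM shift: the logits at position i predict token i+1, so the final position has no label and its loss is normally masked out. A minimal PyTorch sketch of that shift-and-mask (not your repo's actual function, which I haven't checked line by line):
```python
import torch
import torch.nn.functional as F

def shifted_lm_loss(logits, input_ids, ignore_index=-100):
    """Cross-entropy where logits[:, i] predict input_ids[:, i + 1].

    The last logit position has no next token to predict, so it is dropped;
    equivalently, its loss is masked out.
    """
    shift_logits = logits[:, :-1, :].contiguous()   # drop last position
    shift_labels = input_ids[:, 1:].contiguous()    # drop first token
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )

# Toy check: batch of 1, sequence length 5, vocabulary of 10.
logits = torch.randn(1, 5, 10)
input_ids = torch.randint(0, 10, (1, 5))
print(shifted_lm_loss(logits, input_ids))
```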
-
Hi,
thank you for this great work!
In Table 1 of your paper, an accuracy improvement is reported when adding S2 Scaling to LLaVA. As shown in Figure 1, the channel dimension of S2 Scaling is double …
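For context on the doubling: my understanding of S2 Scaling is that the image is additionally processed at a larger scale (split into crops of the base size, encoded, stitched back, and pooled to the base feature resolution), and the two scales' features are concatenated along the channel axis, which is what doubles it. A rough sketch under that assumption, with a generic `encode` standing in for the actual vision tower:
```python
import torch
import torch.nn.functional as F

def s2_features(image, encode, base_size=336):
    """Concatenate base-scale and 2x-scale features along the channel dim.

    `encode` is a generic stand-in for the vision encoder: it maps a
    (B, 3, base_size, base_size) image to (B, C, H, W) features.
    """
    # Scale 1: encode the image at its base resolution.
    feat_1x = encode(F.interpolate(image, size=(base_size, base_size),
                                   mode="bilinear", align_corners=False))

    # Scale 2: upsample to 2x, split into four base-size crops, encode each.
    big = F.interpolate(image, size=(2 * base_size, 2 * base_size),
                        mode="bilinear", align_corners=False)
    crops = [big[:, :, i:i + base_size, j:j + base_size]
             for i in (0, base_size) for j in (0, base_size)]
    feats = [encode(c) for c in crops]

    # Stitch the four crop feature maps back into one 2x map, then pool it
    # down to the base feature resolution.
    top = torch.cat([feats[0], feats[1]], dim=3)
    bottom = torch.cat([feats[2], feats[3]], dim=3)
    feat_2x = F.interpolate(torch.cat([top, bottom], dim=2),
                            size=feat_1x.shape[-2:], mode="area")

    # Channel dimension doubles here: C -> 2C.
    return torch.cat([feat_1x, feat_2x], dim=1)
```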
-
Hello authors, I have been trying e5-v and do see fairly good results in long-text retrieval, but when I tried to reproduce the paper's numbers I found they do not match.
The experiments were run on Flickr30K; the results are below.
### Released e5-v weights
The test results are as follows:
image_retrieval_recall@1 | image_retrieval_recall@5 | image_retrieval_recall@10
-…
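For reference, a minimal sketch of how I compute image_retrieval_recall@k (text-to-image retrieval) for this comparison, assuming L2-normalized caption and image embeddings and a ground-truth image index per caption; the variable names are illustrative:
```python
import torch

def image_retrieval_recall(text_emb, image_emb, gt_image_idx, ks=(1, 5, 10)):
    """Text-to-image retrieval recall@k.

    text_emb:     (num_captions, D) L2-normalized caption embeddings
    image_emb:    (num_images, D)   L2-normalized image embeddings
    gt_image_idx: (num_captions,)   index of the correct image per caption
    """
    sims = text_emb @ image_emb.T                  # cosine similarity
    ranked = sims.argsort(dim=1, descending=True)  # best-matching images first
    gt = torch.as_tensor(gt_image_idx).unsqueeze(1)
    recalls = {}
    for k in ks:
        hit = (ranked[:, :k] == gt).any(dim=1).float()
        recalls[f"image_retrieval_recall@{k}"] = hit.mean().item()
    return recalls
```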