-
The code you provided in render_robot_pyrender.py may have some minor problems, but it executes normally. partnet_label.py, however, has many errors. First of all, the package from handal_label imp…
-
With [ml-ferret](https://github.com/apple/ml-ferret) out, it would be great to include an MLLM example in this repo, namely with ml-ferret or just LLaVA itself. Being LLaMA based, I think this would …
-
Curious whether MLLMs can work on it. I am already supposing LLaVA-1.5 can't. I would suggest checking out more efficient MLLM models like X-LLM.
-
All the question prompts are extracted from DocStruct4M, 'multi_grained_text_localization.jsonl' as below,
```
[
  "Give the bounding box of the text",
  "Predict the bounding box of the text",
  …
]
```
-
Dear CogVLM authors,
Thank you for your outstanding work on MLLMs.
Could you share an estimate of the time required to fine-tune or train the model?
```
Hardware requirement
Model In…
```
-
Hello,
As I was meticulously reading a paper, I found myself confused about the section on 'projectors.'
Background: From what I understand so far, in the case of CLIP ViT Large, despite the com…
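To make the "projector" concrete: in LLaVA-style models it is a small module that maps the vision encoder's patch features into the LLM's embedding space. A minimal NumPy sketch, where the dimensions (1024 for CLIP ViT-L features, 4096 for a 7B-scale LLM, 256 patch tokens) and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vision_feats = rng.standard_normal((256, 1024))   # 256 patch tokens from ViT-L
W1 = rng.standard_normal((1024, 4096)) * 0.01     # illustrative weights
W2 = rng.standard_normal((4096, 4096)) * 0.01

def projector(x):
    # Two-layer MLP with a GELU nonlinearity (tanh approximation),
    # mirroring the MLP projector design used in LLaVA-1.5.
    h = x @ W1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ W2

tokens = projector(vision_feats)  # one LLM-space token per image patch
```

The output rows are then concatenated with the text token embeddings before being fed to the LLM.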
-
As the title says: if pretraining already slices each image into that many sub-images, won't the training cost become hard to cover?
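The cost concern can be put in rough numbers with a token-count estimate. All figures below are illustrative assumptions (a 336px ViT-L/14 encoder yields 24×24 = 576 tokens per crop; one global view plus a 2×2 grid of sub-crops):

```python
# Back-of-envelope: vision tokens per image when slicing into sub-crops.
tokens_per_crop = 576   # 336px input, 14px patches -> 24 * 24 tokens
crops = 1 + 4           # one global view + a 2x2 grid (assumed layout)
total = tokens_per_crop * crops  # tokens the LLM must attend over per image
```

Since attention cost grows with sequence length, slicing multiplies the per-image compute roughly by the number of crops.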
-
![image](https://github.com/mini-sora/minisora/assets/8240984/0d4df698-a324-466b-911d-f561160c5a8c)
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Efficient Large Languag…
-
Do EVA-CLIP-8B and EVA-CLIP-18B support quantization? My device doesn't have such high specifications, and I'm worried I won't be able to run these models; it currently has only a little over …
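For context, the idea behind running such models on limited memory is weight quantization: storing each weight in one byte instead of four. A minimal sketch of symmetric int8 quantization (not EVA-CLIP's actual loading code, just the underlying technique):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: scale so the largest weight maps to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, but stored at 1 byte per weight
```

In practice this cuts weight memory by ~4x versus fp32 (~2x versus fp16) at a small accuracy cost.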
-
I evaluated LLaVA-1.5-7b on the MMVP dataset and found that its accuracy is 60.0%, which is significantly higher than the 24.7% reported in Table 3.
Upon comparing the evaluation code, I discovered t…
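One common source of such gaps is scoring granularity: MMVP groups its questions into pairs, and the reported metric counts a pair correct only when both questions in it are answered correctly, which is much stricter than per-question accuracy. A minimal sketch of the two metrics (the example flags are made up):

```python
def per_question_acc(correct):
    # Fraction of individual questions answered correctly.
    return sum(correct) / len(correct)

def pair_acc(correct):
    # MMVP-style scoring: consecutive questions form a pair, and a pair
    # counts only if BOTH answers are correct. Assumes an even-length list.
    pairs = [correct[i] and correct[i + 1] for i in range(0, len(correct), 2)]
    return sum(pairs) / len(pairs)

flags = [True, False, True, True]  # two pairs of questions
# per-question accuracy: 3/4 = 0.75; pair-level accuracy: 1/2 = 0.5
```

Checking which of the two an evaluation script computes is usually the first thing to compare when reproduced numbers diverge this much.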