-
Hi, we recently finished a paper "[Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond](https://arxiv.org/abs/2310.02071)", and we t…
-
Environment
GPUs: 8x4090
**Package Version**
Package Version
------------------------ ---------------------
absl-py 2.1.0
aiohttp 3.9.3
ai…
-
Thanks for your great work!
1. Have you tested _instructblip-flant5_ on _ChEF_? For the same task, why is the result of flant5 quite different from that of vicuna? For example, with "SRC/config/ChEF/sc…
-
```diff
--- a/cj3.txt
+++ b/cj3.txt
@@ -30598,7 +30598,7 @@ nhytg 䂌
nic 䥒
nif 㢱
nij 㚈
-nimnb 㣇
+smhhb 㣇
njbc 㣀
nkbr 㢠
nkf 㷺
@@ -33021,6 +33021,7 @@ vmfj 㛁
vmfm…
-
Firstly, thank you for your contributions to multi-modal large language model (MLLM) research with MiniGPT-5. I'm experiencing an issue while testing the model's image comprehension capabilities.
…
-
### Describe the issue
Issue: As shown in this [issue](https://github.com/haotian-liu/LLaVA/issues/62), the training loss at convergence should be lower than 2 for `llava-vicuna-chat-hf-pretrain`. Ho…
-
1. There is an error in "The text-only loss corresponds to training only on training only RefinedWeb": "training only" is duplicated.
2. Which dataset is used for the "text-only loss, w/o RefinedWeb" setting?
3. Why…
-
I used LoRA to fine-tune on my own dataset, but the resulting model only replies with the content I trained it on and no longer knows any other common-sense content, while Bunny-v1_0-2B-zh is OK.
Do you have any train…
-
Sorry to bother you during your busy time; I am in a hurry to run Alpha-CLIP with LLaVA-7b-clip.
I followed the instructions in [here](https://github.com/SunzeY/AlphaCLIP/issues/11#issuecomment-186264…
-
Can you provide an example of how to use `accelerate` with the [Hugging Face trainer](https://huggingface.co/transformers/master/main_classes/trainer.html#id1)?
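A minimal sketch of one common pattern (not an official answer): the Hugging Face `Trainer` handles device placement itself, so an unmodified `Trainer` script can usually be distributed with the `accelerate` CLI. The script name `train.py` and its flag below are placeholders for your own script.

```shell
# One-time setup: answer the interactive prompts
# (number of GPUs, mixed precision, etc.); this writes a config file.
accelerate config

# Launch the unmodified Trainer script on the configured devices.
# `train.py` and `--output_dir` are hypothetical placeholders here.
accelerate launch train.py --output_dir ./out
```

With this approach the training script itself needs no `accelerate`-specific code; `accelerate launch` sets the distributed environment variables that `Trainer` picks up automatically.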